NumpyBasics: Difference between revisions
(28 intermediate revisions by 2 users not shown) | |||
Line 1: | Line 1: | ||
'''Back to the [[PythonOverview]].''' | |||
==Introduction to NumPy== | ==Introduction to NumPy== | ||
The built in python collections leave something to be desired for | The built in python collections leave something to be desired for | ||
serious | serious astronomical number crunching. Almost all current astronomical data | ||
processing inside python takes advantage of the numpy | processing inside python takes advantage of the '''numpy''' | ||
libraries. | libraries. These are third party routines packaged with CASA that allow efficient definition and manipulation of matrices (or images or data cubes). This CASAguide will show you how to import this key module, build an array, do basic math, and cleverly access the contents of the array. | ||
array | |||
An overriding concern to keep in mind as you explore numpy is that ''numpy is fast'' and ''python is slow''. What we mean is that the same operation using one of numpy's built in (C) functions will run orders of magnitude faster than using python to carry out the same operation via looping. If you have worked with arrays in IDL, MatLab, or a similar language before this approach should be familiar. | |||
===Importing NumPy=== | ===Importing NumPy=== | ||
NumPy is not part of the basic python distribution. | NumPy is not part of the basic python distribution. If you are not using CASA then you will need to [http://www.scipy.org/ download and install numpy]. CASA includes numpy as pat of its basic distribution but keeps it as a separate module (similar to the "os" or "math" modules in the basic python distribution). | ||
Once it is correctly installed, you can import numpy and gain access to its functionality in the following way. | |||
import numpy and gain access to its functionality in the following way. | |||
<source lang="Python"> | <source lang="Python"> | ||
Line 21: | Line 20: | ||
</source> | </source> | ||
Note that this imports numpy as "np". Had we just typed "import numpy" you would have access to the functionality but substituting "numpy" for "np" throughout. | |||
Now we can use numpy functions like so: | |||
<source lang="Python"> | <source lang="Python"> | ||
print( np.arange(10) ) | print( np.arange(10) ) | ||
</source> | </source> | ||
"arange" is analogous to the basic python "range" but generates a numpy array. | |||
Just to see how these arrays work, type: | |||
<source lang="Python"> | <source lang="Python"> | ||
print( np.arange(10) + 5 ) | print( np.arange(10) + 5 ) | ||
</source> | </source> | ||
Notice how the 5 is | |||
Notice how the 5 is '''broadcast''' to each element in the array. | |||
Similarly, we can write a bunch of powers of 2 by: | Similarly, we can write a bunch of powers of 2 by: | ||
Line 36: | Line 42: | ||
print( 2**np.arange(10) ) | print( 2**np.arange(10) ) | ||
</source> | </source> | ||
===Your First Numpy Array=== | ===Your First Numpy Array=== | ||
Before we get going, type '''np.''' and hit | Before we get going, type '''np.''' and hit the tab key. You | ||
might be a bit freaked out by the large list of | might be a bit freaked out by the large list (~560 possibilites at time of writing), but it's good to know | ||
that's there. Also remember '''?np.median''' | that's there. Also remember that you can get help on each of these functions. Use '''?np.median''' to bring up the same | ||
content as '''help np.median''' but with some useful header info. | content as '''help np.median''' but with some useful header info. | ||
First, let's make a simple numpy array and figure out how to get its basic properties. We can do this by hand like so: | |||
basic properties. We can do this by hand like so: | |||
<source lang="Python"> | <source lang="Python"> | ||
ra = np.array([[0,1,2],[3,4,5]]) | ra = np.array([[0,1,2],[3,4,5]]) | ||
Line 52: | Line 57: | ||
</source> | </source> | ||
Or use the arange function to make the same array | Or use the '''arange''' function and the '''reshape''' method to make the same array like so: | ||
<source lang="Python"> | <source lang="Python"> | ||
ra = np.arange(6).reshape(2,3) | ra = np.arange(6).reshape(2,3) | ||
Line 58: | Line 64: | ||
</source> | </source> | ||
Let's get | We now have an array stored in the variable "ra". Let's get its basic properties. Again it may be useful to type ra. and tap <tab> first. | ||
Get the shape of the array like so: | |||
<source lang="Python"> | <source lang="Python"> | ||
ra.shape | ra.shape | ||
</source> | </source> | ||
And its total number of elements like so: | |||
<source lang="Python"> | <source lang="Python"> | ||
ra.size | ra.size | ||
</source> | </source> | ||
Its number of dimensions: | |||
<source lang="Python"> | <source lang="Python"> | ||
ra.ndim | ra.ndim | ||
Line 78: | Line 86: | ||
type(ra.shape) | type(ra.shape) | ||
</source> | </source> | ||
Now get the data type | |||
Now get the data type of our array | |||
<source lang="Python"> | <source lang="Python"> | ||
ra.dtype | ra.dtype | ||
</source> | </source> | ||
Also notice that some of the loosey-goosey casting is gone. Let's | Notice that numpy has a slightly different nomenclature for its data types than baseline python. | ||
try to shove a float into our integer array. | |||
Also notice that some of the loosey-goosey casting from basic python is gone. Let's try to shove a float into our integer array. | |||
<source lang="Python"> | <source lang="Python"> | ||
print(ra) | print(ra) | ||
Line 92: | Line 101: | ||
print(ra) | print(ra) | ||
</source> | </source> | ||
Notice that it got floored and entered as in int. | |||
Now let's do something with our array, say square every element | Now let's do something with our array, say square every element | ||
Line 103: | Line 113: | ||
</source> | </source> | ||
Notice it's doing element-by-element manipulation. | Notice it's doing ''element-by-element'' manipulation, the aforementioned '''broadcasting'''. | ||
<source lang="Python"> | <source lang="Python"> | ||
5 * ra | 5 * ra | ||
Line 193: | Line 203: | ||
</source> | </source> | ||
Of course, what we really want is to shove our astronomical data into a numpy array. We'll see several ways to do this in subsequent guides. For now just keep in mind that these 3 x 3 arrays of floats are like 4096 x 4096 CCD images or 3- or 4-dimensional radio data cubes. | |||
===Copies and Views: That Mutable/Immutable Stuff=== | |||
Numpy arrays are objects and as a result you need to be a bit careful copying them. If you simply do this | |||
<source lang="Python"> | <source lang="Python"> | ||
Line 201: | Line 213: | ||
</source> | </source> | ||
it does ''not'' make a new array. | |||
In detail try this: | In detail try this: | ||
Line 234: | Line 246: | ||
so just be aware that there can be referencing issues with the sense | so just be aware that there can be referencing issues with the sense | ||
that if you do not explicitly copy an array. | that if you do not explicitly copy an array then you can end up with two views of the same array and not two truly independent arrays. | ||
== Math With Numpy == | |||
Here we'll see do basic math with numpy arrays. As above, we need to be sure to have numpy imported. Bring it in as np to make the examples below work. | |||
<source lang="Python"> | |||
import numpy as np | |||
</source> | |||
=== Basic Math === | |||
Numpy gives you the ability to efficiently and compactly do math on whole arrays. By default this is the element-wise manipulation that we saw earlier: | |||
<source lang="Python"> | |||
ara = np.arange(10).reshape(5,2) | |||
print ara | |||
</source> | |||
<source lang="Python"> | |||
bra = ara**2 | |||
print bra | |||
</source> | |||
<source lang="Python"> | |||
bra = ara*3.0 | |||
print bra | |||
</source> | |||
<source lang="Python"> | |||
cra = ara + bra | |||
print cra | |||
</source> | |||
Note that to combine arrays in this way, you need them to have the same shape, not just the same number of elements. Because ara is two dimensional this doesn't work: | |||
<source lang="Python"> | |||
dra = np.arange(10) | |||
print ara + dra | |||
</source> | |||
But you can reshape one array to match the other if they have the same number of elements | |||
<source lang="Python"> | |||
print ara + dra.reshape(ara.shape) | |||
</source> | |||
Though of course this only works if you have the same number of elements to start with. | |||
=== Fancier Operations === | |||
Numpy can do more complex element-wise operations, for example trigonometric calculations: | |||
<source lang="Python"> | |||
from math import pi | |||
ara = np.arange(11)/10.*2.0*pi | |||
print ara | |||
print np.sin(ara) | |||
print np.cos(ara) | |||
</source> | |||
But note that you do need to be using the ''numpy'' not the ''math'' trigonometric functions. Math chokes on numpy arrays. For example try: | |||
<source lang="Python"> | |||
import math | |||
ara = np.arange(11)/10.*2.0*pi | |||
print math.sin(ara) | |||
</source> | |||
Not good! There is a way around this, using the the "map" functionality to compactly operate on each element of '''ara''': | |||
<source lang="Python"> | |||
bra = np.array(map(math.sin, ara)) | |||
bra | |||
print bra == np.sin(ara) | |||
</source> | |||
But this isn't ideal, it's still python-speed not numpy-speed. So do use the numpy functions when available. | |||
=== Collapsing arrays === | |||
Numpy also has functions to also "collapse" arrays in useful ways. For example, take the sum of all elements: | |||
<source lang="Python"> | |||
ara = np.arange(11) | |||
print np.sum(ara) | |||
</source> | |||
or the median value: | |||
<source lang="Python"> | |||
print np.median(ara) | |||
</source> | |||
You can take the mininum and maximum like so: | |||
<source lang="Python"> | |||
print np.min(ara), np.max(ara) | |||
</source> | |||
These collapsing functions can also operate selectively along one or more axes. For example, take the world's simplest data cube (z, y, x): | |||
<source lang="Python"> | |||
ara = np.arange(50).reshape(2,5,5) | |||
</source> | |||
Collapse it along the first axis (e.g., to make a moment map): | |||
<source lang="Python"> | |||
im = np.sum(ara,axis=0) | |||
print im.shape | |||
</source> | |||
Get the maximum along the first axis (e.g., to make a peak intensity map): | |||
<source lang="Python"> | |||
im = np.max(ara,axis=0) | |||
print im.shape | |||
</source> | |||
To collapse the cube along the second and third axes (e.g., to make a spectrum) you can use two successive sums: | |||
<source lang="Python"> | |||
spec = np.sum(np.sum(ara,axis=2),axis=1) | |||
print spec | |||
</source> | |||
Or you can use the '''apply_over_axes''' function, which exists for exactly this purpose: | |||
<source lang="Python"> | |||
spec2 = np.apply_over_axes(np.sum,ara,[1,2]) | |||
print spec2 | |||
</source> | |||
Note that '''apply_over_axes''' preserves the number dimensions of the original so for a totally equivalent call you'd want: | |||
<source lang="Python"> | |||
spec2 = (np.apply_over_axes(np.sum,ara,[1,2])).flatten() | |||
</source> | |||
The '''flatten''' makes the array one-d. The complementary function here is '''apply_along_axis''' | |||
Numpy can also do cumulative sums for you via | |||
<source lang="Python"> | |||
ara = np.arange(11) | |||
print ara | |||
print np.cumsum(ara) | |||
</source> | |||
=== Masking and Element-wise Logic === | |||
Truth statements also operate element-wise in numpy. For example try | |||
<source lang="Python"> | |||
ara = np.arange(11) | |||
print ara > 5 | |||
</source> | |||
<source lang="Python"> | |||
bra = np.arange(11)*2.0 - 5.0 | |||
print ara > bra | |||
</source> | |||
You can see that you get an element-wise comparison. | |||
This is very useful for data analysis. A common radio astronomy trick is to make a "mask" that considers only part of an image, defined subject by some condition. Here we mask "ara" where it is less than "bra": | |||
<source lang="Python"> | |||
mask = ara > bra | |||
</source> | |||
and we now have an array of Trues and Falses called mask. Booleans multiply as ones and zeros, so that multiply a mask times an array gives zero for values that are "False" in the mask: | |||
<source lang="Python"> | |||
print ara*mask | |||
</source> | |||
=== Linear Algebra === | |||
You have probably noticed that so far we aren't really working with matrices. We're working with n-dimensional data sets and broadcasting operations to each datum. Numpy also includes linear algebra functionality. The simplest example is a vector dot product: | |||
<source lang="Python"> | |||
avec = [0, 1, 2] | |||
bvec = [1, 3, 0] | |||
print np.dot(avec, bvec) | |||
</source> | |||
The dot function in fact does general matrix multiplication. So to multiply a matrix by a vector: | |||
<source lang="Python"> | |||
ara = [[1,3],[2,2]] | |||
vec = [1,4] | |||
print np.dot(ara,vec) | |||
</source> | |||
Or to multiply two matrices together: | |||
<source lang="Python"> | |||
ara = [[1,3],[2,2]] | |||
bra = [[1,2],[4,0]] | |||
print np.dot(ara,bra) | |||
</source> | |||
It's similarly simple to transpose a matrix (and note that transpose is very important for you mental health as you combine numpy and CASA images): | |||
<source lang="Python"> | |||
ara = np.array([[0,1],[2,3]]) | |||
print np.transpose(ara) | |||
</source> | |||
You can also take the inverse or the trace: | |||
<source lang="Python"> | |||
print ara | |||
print np.invert(ara) | |||
print np.trace(ara) | |||
</source> | |||
You can solve systems of linear equations. To solve this system of equations: | |||
<math> 3 x + y = 9 </math> | |||
<math> x + 2 y = 8 </math> | |||
you would call '''solve''' inside the linear algebra sub-module like so: | |||
<source lang="Python"> | |||
mat = np.array([[3, 1],[1,2]]) | |||
vec = np.array([9,8]) | |||
print np.linalg.solve(mat,vec) | |||
</source> | |||
== Slicing and Iteration With Numpy == | |||
Getting at parts of the array in clever ways can make the difference between lightning-fast code and taking forever to do anything (either because you have to write a million lines of code or because the program is slow). Numpy has very powerful slicing and iteration capabilities. Again make sure that we have numpy as imported as np before stepping through these examples: | |||
<source lang="Python"> | |||
import numpy as np | |||
</source> | |||
=== Slicing === | |||
(''Note that unless you copy explicitly, slicing just points - returning 'views'. Use .copy() to get a copy.'') | |||
The most basic slicing is that you can just index an array using square brackets to get the value at a particular point. Let's define a basic 5 by 5 array and have a look at different ways to access the elements. | |||
<source lang="Python"> | |||
ara = np.arange(25).reshape(5,5) | |||
print ara | |||
print ara[0,0] | |||
print ara[1,0] | |||
print ara[0,2] | |||
</source> | |||
(''Also note that last index refers to the fastest moving part of the array. This is important to keep in mind.'') | |||
The syntax '''y:x''' gets you elements y through x, excluding x. ''':x''' means "everything until x" so that | |||
<source lang="Python"> | |||
print ara[:3,0] | |||
</source> | |||
prints elements 0, 1, and 2 in column 0. Meanwhile just a ''':''' gets everything so that | |||
<source lang="Python"> | |||
print ara[:,0] | |||
</source> | |||
prints the whole column. Try | |||
<source lang="Python"> | |||
print ara[0,:] | |||
print ara[0] | |||
</source> | |||
and joining three numbers by colons, '''y:x:z''', gives you y through x not including x and stepping by z each time. So, switching to a one-d array here, we have: | |||
<source lang="Python"> | |||
bra = np.arange(16) | |||
print bra | |||
print bra[4:12] | |||
print bra[4:12:2] | |||
</source> | |||
print elements 4-12 and 4-12 stepping by twos. Remember that you never get that last element (we saw this with lists already)! | |||
Similar to lists, you can do negative indexing from the end of the column or row. | |||
<source lang="Python"> | |||
print ara | |||
print ara[-1,0] | |||
</source> | |||
You can leave off the later axes of an array and it will implictly place ''':''' in these. If you just want to get at the last axis of an array, you can use the ellipsis. For example, define a 3 dimensional array and try to index the last spot: | |||
<source lang="Python"> | |||
ara = np.arange(125).reshape(5,5,5) | |||
print ara[...,0] | |||
print ara[:,:,0] | |||
</source> | |||
the last two are the same. | |||
===Indexing With Arrays=== | |||
You can also index arrays with other arrays. For example: | |||
<source lang="Python"> | |||
ara = np.arange(15)*2 | |||
bra = np.array([1,5,12]) | |||
print ara[bra] | |||
</source> | |||
but note that the results may not be intuitive for higher dimensional arrays. For example, this: | |||
<source lang="Python"> | |||
ara = np.arange(25).reshape(5,5) | |||
bra = np.array([[1,2],[1,3]]) | |||
print ara | |||
print bra | |||
print ara[bra] | |||
</source> | |||
is filling in an array of size bra with rows pulled from ara. | |||
that is you get bra[0,0] set to hold ara[1]: | |||
<source lang="Python"> | |||
print bra[0,0] | |||
print ara[1] | |||
</source> | |||
which is a 1-d array and so the whole thing grows by a dimension | |||
<source lang="Python"> | |||
print ara[bra].shape | |||
</source> | |||
this can be useful but requires some careful thinking for high-dimension arrays. | |||
Possibly more intuitive, you can index an array with a tuple of lists (or a tuple of arrays): | |||
<source lang="Python"> | |||
ara = np.arange(25).reshape(5,5) | |||
ind = ([0,1,2], [0,3,4]) | |||
</source> | |||
and it does what you might expect, pairing the first element of each list to give you a series of values at these coordinates as a one-d numpy array. | |||
<source lang="Python"> | |||
print ara[ind] | |||
print ara[0,0], ara[1,3], ara[2,4] | |||
</source> | |||
Another trick is to index with a matched-size boolean array. This will return the values from the places where the indexing array was set to True. | |||
<source lang="Python"> | |||
ara = np.arange(25).reshape(5,5) | |||
bra = ara > 15 | |||
print bra | |||
print ara[bra] | |||
</source> | |||
This is also potentially very useful, especially in combination with astronomical masking and/or the indices function (see below). | |||
=== Finding the Elements of an Array That Meet Some Condition === | |||
There are a few really indispensible more advanced tricks for image processing stuff. First and foremost, you need to be able to define some condition and extract the elements of an array where that condition is met. At the most basic level, this means that you want to get back an array of coordinates where some user-defined condition is "True". IDL does this with "where", numpy does it most directly with the '''.nonzero()''' method (or indexing via boolean arrays). ".nonzero()" gives you back the set of indices where the array calling the method is not zero (remember that False is zero). So: | |||
<source lang="Python"> | |||
ara = np.arange(25).reshape(5,5) | |||
ara = ara*(ara > 15) | |||
ind = ara.nonzero() | |||
print ara, ind | |||
print ara[ind] | |||
</source> | |||
You can skip the middle step and just call .nonzero() on the (ara > 15) ... remember that this has made a matched-size boolean array. | |||
<source lang="Python"> | |||
ara = np.arange(25).reshape(5,5) | |||
ind = (ara > 15).nonzero() | |||
print ara, ind | |||
print ara[ind] | |||
</source> | |||
This is your IDL "where." | |||
Numpy's actual '''where''' command is a little different. It lets you do lightning-fast search and replaces. The statement: | |||
<source lang="Python"> | |||
ara = np.arange(25).reshape(5,5) | |||
bra = np.where(ara > 15, ara, 100) | |||
</source> | |||
tells numpy: "where ara > 15 use ara else use 100" and the result is returned as a new array. This new array looks like: | |||
<source lang="Python"> | |||
print bra | |||
</source> | |||
=== Indices === | |||
A final useful bit of functionality for handling data cubes is that the '''indices''' function generates arrays of indices to match a specific array shape. You call it on the shape of the array: | |||
<source lang="Python"> | |||
ara = np.ones((5,5,3)) | |||
ind = np.indices(ara.shape) | |||
</source> | |||
and notice that it has an extra dimension compared to ara. This axis is ndim long and tells you which axis you are pulling the index from (e.g., ind[0,x,y,z] is the index of x,y,z along the first dimension ind[1,x,y,z] is the index along the second dimension, and so on). So: | |||
<source lang="Python"> | |||
print ind.shape | |||
</source> | |||
And this gives the index along the first dimension: | |||
<source lang="Python"> | |||
print ind[0], ind[0].shape | |||
</source> | |||
This gives the index along second dimension: | |||
<source lang="Python"> | |||
print ind[1], ind[1].shape | |||
</source> | |||
and so on: | |||
<source lang="Python"> | |||
print ind[2], ind[2].shape | |||
</source> | |||
You could, at the cost of memory, more or less completely replicate .nonzero() with indices and boolean array indexing. |
Latest revision as of 22:51, 1 November 2011
Back to the PythonOverview.
Introduction to NumPy
The built in python collections leave something to be desired for serious astronomical number crunching. Almost all current astronomical data processing inside python takes advantage of the numpy libraries. These are third party routines packaged with CASA that allow efficient definition and manipulation of matrices (or images or data cubes). This CASAguide will show you how to import this key module, build an array, do basic math, and cleverly access the contents of the array.
An overriding concern to keep in mind as you explore numpy is that numpy is fast and python is slow. What we mean is that the same operation using one of numpy's built in (C) functions will run orders of magnitude faster than using python to carry out the same operation via looping. If you have worked with arrays in IDL, MatLab, or a similar language before this approach should be familiar.
Importing NumPy
NumPy is not part of the basic python distribution. If you are not using CASA then you will need to download and install numpy. CASA includes numpy as pat of its basic distribution but keeps it as a separate module (similar to the "os" or "math" modules in the basic python distribution).
Once it is correctly installed, you can import numpy and gain access to its functionality in the following way.
import numpy as np
Note that this imports numpy as "np". Had we just typed "import numpy" you would have access to the functionality but substituting "numpy" for "np" throughout.
Now we can use numpy functions like so:
print( np.arange(10) )
"arange" is analogous to the basic python "range" but generates a numpy array.
Just to see how these arrays work, type:
print( np.arange(10) + 5 )
Notice how the 5 is broadcast to each element in the array. Similarly, we can write a bunch of powers of 2 by:
print( 2**np.arange(10) )
Your First Numpy Array
Before we get going, type np. and hit the tab key. You might be a bit freaked out by the large list (~560 possibilites at time of writing), but it's good to know that's there. Also remember that you can get help on each of these functions. Use ?np.median to bring up the same content as help np.median but with some useful header info.
First, let's make a simple numpy array and figure out how to get its basic properties. We can do this by hand like so:
ra = np.array([[0,1,2],[3,4,5]])
print(ra)
Or use the arange function and the reshape method to make the same array like so:
ra = np.arange(6).reshape(2,3)
print(ra)
We now have an array stored in the variable "ra". Let's get its basic properties. Again it may be useful to type ra. and tap <tab> first.
Get the shape of the array like so:
ra.shape
And its total number of elements like so:
ra.size
Its number of dimensions:
ra.ndim
Notice that we're not using () here, these are attributes of the array, ndim and size are ints, shape is a tuple, as we can see from:
type(ra.shape)
Now get the data type of our array
ra.dtype
Notice that numpy has a slightly different nomenclature for its data types than baseline python.
Also notice that some of the loosey-goosey casting from basic python is gone. Let's try to shove a float into our integer array.
print(ra)
ra[0,0] = 5.75
print(ra)
Notice that it got floored and entered as in int.
Now let's do something with our array, say square every element
ra**2
Or add it to itself
ra + ra
Notice it's doing element-by-element manipulation, the aforementioned broadcasting.
5 * ra
Array Creation
Now that we've seen some array basics, let's look a bit closer at how we can make them. There's a lot here, we'll just look at a couple of approaches.
We already saw the by-hand approach
ra = np.array([[0,1,2],[3,4,5]])
print(ra)
ra = np.array([0,1,2,3,4,5])
print(ra)
See how the square brackets control the shape? If you forget them entirely you'll get an unfortunate error.
We also saw the arange approach, which generates a sequential array of integers. It takes stop, start, and step as arguments like so:
ra = np.arange(5,10,1)
print(ra)
ra = np.arange(5,10,3)
print(ra)
It stops at the last element under 10
ra = np.arange(5,10.1,1.0)
print(ra)
See how the float step made the array a float?
ra = np.arange(5,10.1,1.0)
print(ra)
We can also give an explicity type to most array creation functions like so:
ra = np.arange(0,10,1,dtype=np.float32)
print(ra)
Otherwise it would have been an int
And there are many ways of doing similar things, a more stable float version of arange is linspace:
ra = np.linspace(1.0,2.0,11)
print(ra)
Which covers 1.0 to 2.0 inclusive with 11 elements.
We can also make arrays full of ones or zeros just specifying the size:
ra = np.zeros((3,3))
print(ra)
ra = np.zeros((3,3,3),dtype=np.bool)
print(ra)
And we can mimic one array with another
ara = np.zeros((3,3))
print(ara)
bra = np.zeros_like(ara)
print(bra)
bra = np.ones_like(ara)
print(bra)
Numpy will also implictly create arrays from array math:
ara = np.ones((3,3))
bra = ara + ara
print(bra)
cra = (ara == ara)
print(cra)
Of course, what we really want is to shove our astronomical data into a numpy array. We'll see several ways to do this in subsequent guides. For now just keep in mind that these 3 x 3 arrays of floats are like 4096 x 4096 CCD images or 3- or 4-dimensional radio data cubes.
Copies and Views: That Mutable/Immutable Stuff
Numpy arrays are objects and as a result you need to be a bit careful copying them. If you simply do this
dra = ara
it does not make a new array.
In detail try this:
test = np.array([1,2,3,4,5])
b = test
print(b)
test[2] = 6
print(b)
Notice how by changing test, we also change b!
In the above examples, dra just points at ara and b just points at test. The two in fact share the same data, so that changing one changes the other. You can check this via:
dra is ara
b is test
This gets a bit complicated but we would have been okay with:
b = test.copy()
or
dra = ara*1.0
so just be aware that there can be referencing issues with the sense that if you do not explicitly copy an array then you can end up with two views of the same array and not two truly independent arrays.
Math With Numpy
Here we'll see do basic math with numpy arrays. As above, we need to be sure to have numpy imported. Bring it in as np to make the examples below work.
import numpy as np
Basic Math
Numpy gives you the ability to efficiently and compactly do math on whole arrays. By default this is the element-wise manipulation that we saw earlier:
ara = np.arange(10).reshape(5,2)
print ara
bra = ara**2
print bra
bra = ara*3.0
print bra
cra = ara + bra
print cra
Note that to combine arrays in this way, you need them to have the same shape, not just the same number of elements. Because ara is two dimensional this doesn't work:
dra = np.arange(10)
print ara + dra
But you can reshape one array to match the other if they have the same number of elements
print ara + dra.reshape(ara.shape)
Though of course this only works if you have the same number of elements to start with.
Fancier Operations
Numpy can do more complex element-wise operations, for example trigonometric calculations:
from math import pi
ara = np.arange(11)/10.*2.0*pi
print ara
print np.sin(ara)
print np.cos(ara)
But note that you do need to be using the numpy not the math trigonometric functions. Math chokes on numpy arrays. For example try:
import math
ara = np.arange(11)/10.*2.0*pi
print math.sin(ara)
Not good! There is a way around this, using the the "map" functionality to compactly operate on each element of ara:
bra = np.array(map(math.sin, ara))
bra
print bra == np.sin(ara)
But this isn't ideal, it's still python-speed not numpy-speed. So do use the numpy functions when available.
Collapsing arrays
Numpy also has functions to also "collapse" arrays in useful ways. For example, take the sum of all elements:
ara = np.arange(11)
print np.sum(ara)
or the median value:
print np.median(ara)
You can take the mininum and maximum like so:
print np.min(ara), np.max(ara)
These collapsing functions can also operate selectively along one or more axes. For example, take the world's simplest data cube (z, y, x):
ara = np.arange(50).reshape(2,5,5)
Collapse it along the first axis (e.g., to make a moment map):
im = np.sum(ara,axis=0)
print im.shape
Get the maximum along the first axis (e.g., to make a peak intensity map):
im = np.max(ara,axis=0)
print im.shape
To collapse the cube along the second and third axes (e.g., to make a spectrum) you can use two successive sums:
spec = np.sum(np.sum(ara,axis=2),axis=1)
print spec
Or you can use the apply_over_axes function, which exists for exactly this purpose:
spec2 = np.apply_over_axes(np.sum,ara,[1,2])
print spec2
Note that apply_over_axes preserves the number dimensions of the original so for a totally equivalent call you'd want:
spec2 = (np.apply_over_axes(np.sum,ara,[1,2])).flatten()
The flatten makes the array one-d. The complementary function here is apply_along_axis
Numpy can also do cumulative sums for you via
ara = np.arange(11)
print ara
print np.cumsum(ara)
Masking and Element-wise Logic
Truth statements also operate element-wise in numpy. For example try
ara = np.arange(11)
print ara > 5
bra = np.arange(11)*2.0 - 5.0
print ara > bra
You can see that you get an element-wise comparison.
This is very useful for data analysis. A common radio astronomy trick is to make a "mask" that considers only part of an image, defined subject by some condition. Here we mask "ara" where it is less than "bra":
mask = ara > bra
and we now have an array of Trues and Falses called mask. Booleans multiply as ones and zeros, so that multiply a mask times an array gives zero for values that are "False" in the mask:
print ara*mask
Linear Algebra
You have probably noticed that so far we aren't really working with matrices. We're working with n-dimensional data sets and broadcasting operations to each datum. Numpy also includes linear algebra functionality. The simplest example is a vector dot product:
avec = [0, 1, 2]
bvec = [1, 3, 0]
print np.dot(avec, bvec)
The dot function in fact does general matrix multiplication. So to multiply a matrix by a vector:
ara = [[1,3],[2,2]]
vec = [1,4]
print np.dot(ara,vec)
Or to multiply two matrices together:
ara = [[1,3],[2,2]]
bra = [[1,2],[4,0]]
print np.dot(ara,bra)
It's similarly simple to transpose a matrix (and note that transpose is very important for you mental health as you combine numpy and CASA images):
ara = np.array([[0,1],[2,3]])
print np.transpose(ara)
You can also take the inverse or the trace:
print ara
print np.invert(ara)
print np.trace(ara)
You can solve systems of linear equations. To solve this system of equations:
[math]\displaystyle{ 3 x + y = 9 }[/math]
[math]\displaystyle{ x + 2 y = 8 }[/math]
you would call solve inside the linear algebra sub-module like so:
mat = np.array([[3, 1],[1,2]])
vec = np.array([9,8])
print np.linalg.solve(mat,vec)
Slicing and Iteration With Numpy
Getting at parts of the array in clever ways can make the difference between lightning-fast code and taking forever to do anything (either because you have to write a million lines of code or because the program is slow). Numpy has very powerful slicing and iteration capabilities. Again make sure that we have numpy as imported as np before stepping through these examples:
import numpy as np
Slicing
(Note that unless you copy explicitly, slicing just points - returning 'views'. Use .copy() to get a copy.)
The most basic slicing is that you can just index an array using square brackets to get the value at a particular point. Let's define a basic 5 by 5 array and have a look at different ways to access the elements.
ara = np.arange(25).reshape(5,5)
print ara
print ara[0,0]
print ara[1,0]
print ara[0,2]
(Also note that last index refers to the fastest moving part of the array. This is important to keep in mind.)
The syntax y:x gets you elements y through x, excluding x. :x means "everything until x" so that
print ara[:3,0]
prints elements 0, 1, and 2 in column 0. Meanwhile just a : gets everything so that
print ara[:,0]
prints the whole column. Try
print ara[0,:]
print ara[0]
and joining three numbers by colons, y:x:z, gives you y through x not including x and stepping by z each time. So, switching to a one-d array here, we have:
bra = np.arange(16)
print bra
print bra[4:12]
print bra[4:12:2]
print elements 4-12 and 4-12 stepping by twos. Remember that you never get that last element (we saw this with lists already)!
Similar to lists, you can do negative indexing from the end of the column or row.
print ara
print ara[-1,0]
You can leave off the later axes of an array and it will implictly place : in these. If you just want to get at the last axis of an array, you can use the ellipsis. For example, define a 3 dimensional array and try to index the last spot:
ara = np.arange(125).reshape(5,5,5)
print ara[...,0]
print ara[:,:,0]
the last two are the same.
Indexing With Arrays
You can also index arrays with other arrays. For example:
ara = np.arange(15)*2
bra = np.array([1,5,12])
print ara[bra]
but note that the results may not be intuitive for higher dimensional arrays. For example, this:
ara = np.arange(25).reshape(5,5)
bra = np.array([[1,2],[1,3]])
print ara
print bra
print ara[bra]
is filling in an array of size bra with rows pulled from ara.
that is you get bra[0,0] set to hold ara[1]:
print bra[0,0]
print ara[1]
which is a 1-d array and so the whole thing grows by a dimension
print ara[bra].shape
this can be useful but requires some careful thinking for high-dimension arrays.
Possibly more intuitive, you can index an array with a tuple of lists (or a tuple of arrays):
ara = np.arange(25).reshape(5,5)
ind = ([0,1,2], [0,3,4])
and it does what you might expect, pairing the first element of each list to give you a series of values at these coordinates as a one-d numpy array.
print ara[ind]
print ara[0,0], ara[1,3], ara[2,4]
Another trick is to index with a matched-size boolean array. This will return the values from the places where the indexing array was set to True.
ara = np.arange(25).reshape(5,5)
bra = ara > 15
print bra
print ara[bra]
This is also potentially very useful, especially in combination with astronomical masking and/or the indices function (see below).
Finding the Elements of an Array That Meet Some Condition
There are a few really indispensible more advanced tricks for image processing stuff. First and foremost, you need to be able to define some condition and extract the elements of an array where that condition is met. At the most basic level, this means that you want to get back an array of coordinates where some user-defined condition is "True". IDL does this with "where", numpy does it most directly with the .nonzero() method (or indexing via boolean arrays). ".nonzero()" gives you back the set of indices where the array calling the method is not zero (remember that False is zero). So:
ara = np.arange(25).reshape(5,5)
ara = ara*(ara > 15)
ind = ara.nonzero()
print ara, ind
print ara[ind]
You can skip the middle step and just call .nonzero() on the (ara > 15) ... remember that this has made a matched-size boolean array.
ara = np.arange(25).reshape(5,5)
ind = (ara > 15).nonzero()
print ara, ind
print ara[ind]
This is your IDL "where."
Numpy's actual where command is a little different. It lets you do lightning-fast search and replaces. The statement:
ara = np.arange(25).reshape(5,5)
bra = np.where(ara > 15, ara, 100)
tells numpy: "where ara > 15 use ara else use 100" and the result is returned as a new array. This new array looks like:
print bra
Indices
A final useful bit of functionality for handling data cubes is that the indices function generates arrays of indices to match a specific array shape. You call it on the shape of the array:
ara = np.ones((5,5,3))
ind = np.indices(ara.shape)
and notice that it has an extra dimension compared to ara. This axis is ndim long and tells you which axis you are pulling the index from (e.g., ind[0,x,y,z] is the index of x,y,z along the first dimension ind[1,x,y,z] is the index along the second dimension, and so on). So:
print ind.shape
And this gives the index along the first dimension:
print ind[0], ind[0].shape
This gives the index along second dimension:
print ind[1], ind[1].shape
and so on:
print ind[2], ind[2].shape
You could, at the cost of memory, more or less completely replicate .nonzero() with indices and boolean array indexing.