NumpyBasics: Difference between revisions

Revision as of 16:22, 1 November 2011

Introduction to NumPy

The built in python collections leave something to be desired for serious astronomical number crunching. Almost all current astronomical data processing inside python takes advantage of the numpy libraries. These are third party routines packaged with CASA that allow efficient definition and manipulation of matrices (or images or data cubes). This CASAguide will show you how to import this key module, build an array, do basic math, and cleverly access the contents of the array.

An overriding concern to keep in mind as you explore numpy is that numpy is fast and python is slow. What we mean is that the same operation using one of numpy's built in (C) functions will run orders of magnitude faster than using python to carry out the same operation via looping. If you have worked with arrays in IDL, MatLab, or a similar language before this approach should be familiar.

Importing NumPy

NumPy is not part of the basic python distribution. If you are not using CASA then you will need to download and install numpy. CASA includes numpy as pat of its basic distribution but keeps it as a separate module (similar to the "os" or "math" modules in the basic python distribution).

Once it is correctly installed, you can import numpy and gain access to its functionality in the following way.

import numpy as np

Note that this imports numpy as "np". Had we just typed "import numpy" you would have access to the functionality but substituting "numpy" for "np" throughout.

Now we can use numpy functions like so:

print( np.arange(10) )

"arange" is analogous to the basic python "range" but generates a numpy array.

Just to see how these arrays work, type:

print( np.arange(10) + 5 )

Notice how the 5 is broadcast to each element in the array. Similarly, we can write a bunch of powers of 2 by:

print( 2**np.arange(10) )

Your First Numpy Array

Before we get going, type np. and hit the tab key. You might be a bit freaked out by the large list (~560 possibilites at time of writing), but it's good to know that's there. Also remember that you can get help on each of these functions. Use ?np.median to bring up the same content as help np.median but with some useful header info.

First, let's make a simple numpy array and figure out how to get its basic properties. We can do this by hand like so:

ra = np.array([[0,1,2],[3,4,5]])
print(ra)

Or use the arange function and the reshape method to make the same array like so:

ra = np.arange(6).reshape(2,3)
print(ra)

We now have an array stored in the variable "ra". Let's get its basic properties. Again it may be useful to type ra. and tap <tab> first.

Get the shape of the array like so:

ra.shape

And its total number of elements like so:

ra.size

Its number of dimensions:

ra.ndim

Notice that we're not using () here, these are attributes of the array, ndim and size are ints, shape is a tuple, as we can see from:

type(ra.shape)

Now get the data type of our array

ra.dtype

Notice that numpy has a slightly different nomenclature for its data types than baseline python.

Also notice that some of the loosey-goosey casting from basic python is gone. Let's try to shove a float into our integer array.

print(ra)
ra[0,0] = 5.75
print(ra)

Notice that it got floored and entered as in int.

Now let's do something with our array, say square every element

ra**2

Or add it to itself

ra + ra

Notice it's doing element-by-element manipulation, the aforementioned broadcasting.

5 * ra

Array Creation

Now that we've seen some array basics, let's look a bit closer at how we can make them. There's a lot here, we'll just look at a couple of approaches.

We already saw the by-hand approach

ra = np.array([[0,1,2],[3,4,5]])
print(ra)
ra = np.array([0,1,2,3,4,5])
print(ra)

See how the square brackets control the shape? If you forget them entirely you'll get an unfortunate error.

We also saw the arange approach, which generates a sequential array of integers. It takes stop, start, and step as arguments like so:

ra = np.arange(5,10,1)
print(ra)
ra = np.arange(5,10,3)
print(ra)

It stops at the last element under 10

ra = np.arange(5,10.1,1.0)
print(ra)

See how the float step made the array a float?

ra = np.arange(5,10.1,1.0)
print(ra)

We can also give an explicity type to most array creation functions like so:

ra = np.arange(0,10,1,dtype=np.float32)
print(ra)

Otherwise it would have been an int

And there are many ways of doing similar things, a more stable float version of arange is linspace:

ra = np.linspace(1.0,2.0,11)
print(ra)

Which covers 1.0 to 2.0 inclusive with 11 elements.

We can also make arrays full of ones or zeros just specifying the size:

ra = np.zeros((3,3))
print(ra)
ra = np.zeros((3,3,3),dtype=np.bool)
print(ra)

And we can mimic one array with another

ara = np.zeros((3,3))
print(ara)
bra = np.zeros_like(ara)
print(bra)
bra = np.ones_like(ara)
print(bra)

Numpy will also implictly create arrays from array math:

ara = np.ones((3,3))
bra = ara + ara
print(bra)
cra = (ara == ara)
print(cra)

Of course, what we really want is to shove our astronomical data into a numpy array. We'll see several ways to do this in subsequent guides. For now just keep in mind that these 3 x 3 arrays of floats are like 4096 x 4096 CCD images or 3- or 4-dimensional radio data cubes.

Copies and Views: That Mutable/Immutable Stuff

Numpy arrays are objects and as a result you need to be a bit careful copying them. If you simply do this

dra = ara

it does not make a new array.

In detail try this:

test = np.array([1,2,3,4,5])
b = test
print(b)
test[2] = 6
print(b)

Notice how by changing test, we also change b!

In the above examples, dra just points at ara and b just points at test. The two in fact share the same data, so that changing one changes the other. You can check this via:

dra is ara
b is test

This gets a bit complicated but we would have been okay with:

b = test.copy()

or

dra = ara*1.0

so just be aware that there can be referencing issues with the sense that if you do not explicitly copy an array then you can end up with two views of the same array and not two truly independent arrays.

Math With Numpy

Here we'll see do basic math with numpy arrays. As above, we need to be sure to have numpy imported. Bring it in as np to make the examples below work.

import numpy as np

Basic Math

Numpy gives you the ability to efficiently and compactly do math on whole arrays. By default this is the element-wise manipulation that we saw earlier:

ara = np.arange(10).reshape(5,2) print ara

bra = ara**2 print bra

bra = ara*3.0 print bra

cra = ara + bra print cra

to combine arrays in this way, you need them to have the same shape, not just the same number of elements.

dra = np.arange(10) print ara + dra

... but you can reshape one array to match the other if you have the same number of elements

print ara + dra.reshape(ara.shape)

... but note that this only works if you have the same number of elements to start with.

Fancier Operations

Numpy can do more complex element-wise operations

... for example trigonometric calculations:

from math import pi ara = np.arange(11)/10.*2.0*pi print ara print np.sin(ara) print np.cos(ara)

IMPORTANT: you need to be using the np functions, the math versions will choke on arrays. For example try:

import math ara = np.arange(11)/10.*2.0*pi print math.sin(ara)

but don't worry here too much. You can use the "map" functionality to compactly get around this:

bra = np.array(map(math.sin, ara)) bra print bra == np.sin(ara)

(though do use the numpy functions when available.)

Collapsing arrays

Numpy also has functions to also "collapse" arrays in useful ways

... for example take the sum

ara = np.arange(11) print np.sum(ara)

... the median

print np.median(ara)

... the mininum and maximum

print np.min(ara), np.max(ara)

These collapsing functions can also operate selectively along one or more axes

... for example take the world's simplest data cube (z, y, x)

ara = np.arange(50).reshape(2,5,5)

... collapse along the first axis (e.g., to make a moment map)

im = np.sum(ara,axis=0) print im.shape

... get the maximum along the first axis (e.g., to make a peak intensity map)

im = np.max(ara,axis=0) print im.shape

... collapse along the second and third axes (to make a spectrum)

spec = np.sum(np.sum(ara,axis=2),axis=1) print spec

... you can also do it this way:

spec2 = np.apply_over_axes(np.sum,ara,[1,2]) print spec2

... note that it preserves the number dimensions of the original so for a totally equivalent call you'd want:

spec2 = (np.apply_over_axes(np.sum,ara,[1,2])).flatten()

... flatten() makes the array one-d

... cumulative sums are also slick and frequently useful

ara = np.arange(11) print ara print np.cumsum(ara)

Masking and Element-wise Logic

Truth statements also operate elementwise. For example:

ara = np.arange(11) print ara > 5

bra = np.arange(11)*2.0 - 5.0 print ara > bra

... a common astronomy trick is to make a mask subject to some condition. Here we mask "ara" where it is less than "bra."

mask = ara > bra

... booleans multiply as ones and zeros

print ara*mask

There's a lot more here, especially the apply_over_axes() and apply_along_axis() could take a lot of our time. They offer a very general tool to write almost any operation you want. But let's move on for now.

Linear Algebra

You have have noticed that so far we aren't really working with matrices, we're working with n-dimensional data sets and broadcasting operations to each datum. Numpy also includes linear algebra functionality. You just have to ask for it.

The simplest example is a vector dot product:

avec = [0, 1, 2] bvec = [1, 3, 0] print np.dot(avec, bvec)

the dot function in fact does general matrix multiplication:

... to multiply a matrix by a vector:

ara = [[1,3],[2,2]] vec = [1,4] print np.dot(ara,vec)

... or two matrices

ara = [[1,3],[2,2]] bra = [[1,2],[4,0]] print np.dot(ara,bra)

It's simple to transpose a matrix:

ara = np.array([[0,1],[2,3]]) print np.transpose(ara)

... or take the inverse or the trace

print ara print np.inv(ara) print np.trace(ara)

You can solve systems of linear equations like so:

3 * x + y = 9 x + 2 * y = 8

mat = np.array([[3, 1],[1,2]]) vec = np.array([9,8]) print np.linalg.solve(mat,vec)

Slicing and Iteration With Numpy

Getting at parts of the array in clever ways can make the difference between lightning-fast code and taking forever to do anything (either because you have to write a million lines of code or because the program is slow). Numpy has very powerful slicing and iteration capabilities.

Make sure that we have numpy as imported as np

import numpy as np

Slicing

NOTE THAT UNLESS YOU COPY EXPLICITLY THESE JUST "POINT"! (like we saw before, they're "views"). Use .copy() to get a copy.

The most basic slicing is that you can just index an array to get the value at a particular point.

ara = np.arange(25).reshape(5,5) print ara print ara[0,0] print ara[1,0] print ara[0,2]

NOTICE! The last index refers to the fastest moving part of the array. This may (or may not) be unfamiliar. It's important to keep in mind.

The syntax :x means "everything until x" so that

print ara[:3,0]

prints elements 0, 1, and 2 in column 0

print ara[:,0]

prints the whole column

print ara[0,:] print ara[0]

and (switching to a one-d array here)

bra = np.arange(16) print bra print bra[4:12] print bra[4:12:2]

print elements 4-12 and 4-12 stepping by twos. Remember that you never get that last element!

Similar to lists, you can do negative indexing from the end of the column or row.

print ara print ara[-1,0]

And if you just want to get at the last index, you can use the ellipsis:

ara = np.arange(125).reshape(5,5,5) print ara[...,0] print ara[:,:,0]

the last two are the same.

Indexing With Arrays

You can index arrays with other arrays. For example:

ara = np.arange(15)*2 bra = np.array([1,5,12]) print ara[bra]

but note that this:

ara = np.arange(25).reshape(5,5) bra = np.array([[1,2],[1,3]]) print ara print bra print ara[bra]

is filling in an array of size bra with rows pulled from ara.

that is you get bra[0,0] set to hold ara[1]:

print bra[0,0] print ara[1]

which is a 1-d array and so the whole thing grows

print ara[bra].shape

... ugh.

Possibly more intuitive, you can index with a tuple of lists:

ara = np.arange(25).reshape(5,5) ind = ([0,1,2], [0,3,4])

print ara[ind] print ara[0,0], ara[1,3], ara[2,4]

And you can index with a matched-size boolean array and will get back the places with True.

ara = np.arange(25).reshape(5,5) bra = ara > 15 print bra print ara[bra]

... this is very very useful!

"where"

There are a few really indispensible more advanced tricks for image processing stuff. First, if you want a list of indices where the array is not zero:

ara = np.arange(25).reshape(5,5) ara = ara*(ara > 15) ind = ara.nonzero() print ara, ind print ara[ind]

skip the middle step

ara = np.arange(25).reshape(5,5) ind = (ara > 15).nonzero() print ara, ind print ara[ind]

this is your IDL "where" (though you can achieve a lot of similar functionality with the boolean array indexing mentioned above)

You can also do lightning search and replaces with the numpy where:

ara = np.arange(25).reshape(5,5) bra = np.where(ara > 15, ara, 100)

reads where ara > 15 use ara else use 100

print bra

indices

Very useful for astronomy (especially data-cubing), you can build cubes of indices to match a specific shape. Like so:

ara = np.ones((5,5,3)) ind = np.indices(ara.shape)

... notices that ind has an extra dimension ndim compared to ara ...

print ind.shape

... so now this is the index along the first dimension

print ind[0], ind[0].shape

... the second dimension

print ind[1], ind[1].shape

... and so on

print ind[2], ind[2].shape