NumpyBasics

From CASA Guides

Introduction to NumPy

The built-in python collections leave something to be desired for serious astronomical number crunching. Almost all current astronomical data processing in python takes advantage of the numpy library, a third-party package that ships with CASA and allows efficient definition and manipulation of arrays such as matrices, images, or data cubes. This CASA Guide will show you how to import this key module, build an array, do basic math, and cleverly access the contents of the array.

An overriding concern to keep in mind as you explore numpy is that numpy is fast and python is slow. What we mean is that an operation carried out by one of numpy's built-in (C) functions can run orders of magnitude faster than the same operation written as a python loop. If you have worked with arrays in IDL, MATLAB, or a similar language before, this approach should be familiar.

Importing NumPy

NumPy is not part of the basic python distribution. If you are not using CASA then you will need to download and install numpy. CASA includes numpy as part of its basic distribution but keeps it as a separate module (similar to the "os" or "math" modules in the basic python distribution).

Once it is correctly installed, you can import numpy and gain access to its functionality in the following way.

import numpy as np

Note that this imports numpy under the name "np". Had we just typed "import numpy", you would have access to the same functionality but would need to write "numpy" in place of "np" throughout.

Now we can use numpy functions like so:

print( np.arange(10) )

"arange" is analogous to the basic python "range" but generates a numpy array.

Just to see how these arrays work, type:

print( np.arange(10) + 5 )

Notice how the 5 is broadcast to each element in the array. Similarly, we can write a bunch of powers of 2 by:

print( 2**np.arange(10) )
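To make the earlier point about speed concrete, here is a rough sketch that times the same sum of squares done with a python loop and with numpy's built-in functions. The array length and the variable names are arbitrary choices for illustration, and the timings you see will depend on your machine.

```python
import time
import numpy as np

n = 200000  # arbitrary length chosen for illustration
values = np.arange(n, dtype=np.float64)

# Pure-python loop: square and add one element at a time.
t0 = time.perf_counter()
loop_total = 0.0
for v in values:
    loop_total += v * v
loop_seconds = time.perf_counter() - t0

# The same calculation using numpy's compiled routines.
t0 = time.perf_counter()
numpy_total = np.sum(values**2)
numpy_seconds = time.perf_counter() - t0

print("loop :", loop_seconds, "s")
print("numpy:", numpy_seconds, "s")
```

Both versions should agree on the answer to within floating-point rounding; only the run time should differ.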

Your First Numpy Array

Before we get going, type np. and hit the tab key. You might be a bit freaked out by the large list (~560 possibilities at time of writing), but it's good to know it's there. Also remember that you can get help on each of these functions. Use ?np.median to bring up the same content as help(np.median) but with some useful header info.

First, let's make a simple numpy array and figure out how to get its basic properties. We can do this by hand like so:

ra = np.array([[0,1,2],[3,4,5]])
print(ra)

Or use the arange function and the reshape method to make the same array like so:

ra = np.arange(6).reshape(2,3)
print(ra)

We now have an array stored in the variable "ra". Let's get its basic properties. Again it may be useful to type ra. and tap <tab> first.

Get the shape of the array like so:

ra.shape

And its total number of elements like so:

ra.size

Its number of dimensions:

ra.ndim

Notice that we're not using () here because these are attributes of the array, not methods. ndim and size are ints and shape is a tuple, as we can see from:

type(ra.shape)
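The three attributes are tied together, which a short check can show. This is only a sketch; the second array, called cube here, is a made-up example.

```python
import numpy as np

ra = np.arange(6).reshape(2, 3)

# ndim is the length of the shape tuple.
print(ra.ndim == len(ra.shape))
# size is the product of the entries in shape.
print(ra.size == ra.shape[0] * ra.shape[1])

# reshape changes the shape but has to keep the size the same.
cube = np.arange(24).reshape(2, 3, 4)
print(cube.shape, cube.ndim, cube.size)
```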

Now get the data type of our array

ra.dtype

Notice that numpy has a slightly different nomenclature for its data types than baseline python.
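As a quick illustration of that nomenclature, the sketch below builds a few small arrays and prints the type numpy picked for each. The array names are invented for this example, and the exact integer type (for instance int32 or int64) depends on your platform.

```python
import numpy as np

int_ra = np.array([1, 2, 3])          # python ints -> a numpy integer type
float_ra = np.array([1.0, 2.0, 3.0])  # python floats -> float64
bool_ra = np.array([True, False])     # python bools -> bool
mixed_ra = np.array([1, 2.5])         # mixed ints and floats -> float64

for arr in (int_ra, float_ra, bool_ra, mixed_ra):
    # itemsize is the number of bytes used per element
    print(arr.dtype, arr.itemsize)
```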

Also notice that some of the loosey-goosey casting from basic python is gone. Let's try to shove a float into our integer array.

print(ra)
ra[0,0] = 5.75
print(ra)

Notice that the fractional part was dropped (the value is truncated toward zero) and the result was stored as an int.
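A negative value makes the behavior easier to see, since dropping the fraction and rounding down then give different answers. The sketch below also shows one way to keep the fraction, by converting to a float array with astype first; the name fra is just for illustration.

```python
import numpy as np

ra = np.arange(6).reshape(2, 3)

# Assigning floats into an integer array drops the fractional part.
ra[0, 0] = 5.75
ra[0, 1] = -5.75
print(ra[0, 0], ra[0, 1])

# To keep the fraction, convert to a float array first.
# astype returns a new array; the original is left unchanged.
fra = ra.astype(np.float64)
fra[0, 0] = 5.75
print(fra.dtype, fra[0, 0])
```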

Now let's do something with our array, say square every element

ra**2

Or add it to itself

ra + ra

Notice that these operations act element by element. Multiplying by a scalar uses the broadcasting we saw earlier, with the scalar applied to every element:

5 * ra
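Broadcasting is not limited to scalars. As a small sketch, a one-dimensional row or a single column can also be stretched to match a two-dimensional array, provided the shapes are compatible. The arrays row and col are made up for this example.

```python
import numpy as np

ra = np.arange(6).reshape(2, 3)   # shape (2, 3)
row = np.array([10, 20, 30])      # shape (3,)
col = np.array([[100], [200]])    # shape (2, 1)

# The row is repeated down the rows of ra.
print(ra + row)
# The column is repeated across the columns of ra.
print(ra + col)
```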

Array Creation

Now that we've seen some array basics, let's look a bit closer at how we can make them. There's a lot here, so we'll just look at a couple of approaches.

We already saw the by-hand approach

ra = np.array([[0,1,2],[3,4,5]])
print(ra)
ra = np.array([0,1,2,3,4,5])
print(ra)

See how the square brackets control the shape? If you forget them entirely, numpy will raise an error (typically a TypeError) instead of building the array.

We also saw the arange approach, which generates a sequential array of integers. It takes start, stop, and step as arguments like so:

ra = np.arange(5,10,1)
print(ra)
ra = np.arange(5,10,3)
print(ra)

It stops at the last element under 10; the stop value itself is never included.
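Since arange follows the same argument rules as the basic python range, the two can be compared directly. This is only a sketch of that correspondence.

```python
import numpy as np

# Same arguments, same values: start, stop (excluded), step.
print(list(range(5, 10, 3)))
print(np.arange(5, 10, 3))

# With a single argument both count from 0 up to, but excluding, the stop value.
print(np.arange(4))

# A negative step counts down; the stop value is still excluded.
print(np.arange(10, 5, -2))
```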

ra = np.arange(5,10.1,1.0)
print(ra)

See how the float step made the array a float?

We can also give an explicit type to most array creation functions like so:

ra = np.arange(0,10,1,dtype=np.float32)
print(ra)

Without the dtype argument, this array would have held ints.

There are many ways of doing similar things. For floats, linspace is a more predictable alternative to arange because you set the number of elements directly:

ra = np.linspace(1.0,2.0,11)
print(ra)

This covers 1.0 to 2.0 inclusive with 11 elements.
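The difference between the two functions can be sketched as follows. With linspace you fix the number of points, while arange with a float step works the length out from the step, so rounding error can affect how many elements you get. The variable names are illustrative only.

```python
import numpy as np

# linspace: you choose the number of points; both endpoints are included.
lin = np.linspace(1.0, 2.0, 11)
print(lin.size, lin[0], lin[-1])

# arange with a float step: the length is worked out from
# (stop - start) / step, so rounding error can change it.
# Compare the length you expect with the length you get.
stepped = np.arange(1.0, 2.0, 0.1)
print(stepped.size, stepped[-1])

# endpoint=False makes linspace exclude the stop value, like arange.
half_open = np.linspace(1.0, 2.0, 10, endpoint=False)
print(half_open)
```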

We can also make arrays full of ones or zeros by just specifying the shape:

ra = np.zeros((3,3))
print(ra)
ra = np.zeros((3,3,3),dtype=bool)
print(ra)

And we can mimic one array with another

ara = np.zeros((3,3))
print(ara)
bra = np.zeros_like(ara)
print(bra)
bra = np.ones_like(ara)
print(bra)
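It is worth knowing that the "like" functions copy the data type as well as the shape. Here is a minimal sketch using a made-up int16 array called template:

```python
import numpy as np

template = np.arange(6, dtype=np.int16).reshape(2, 3)

# zeros_like and ones_like copy the shape and the dtype of the template.
zra = np.zeros_like(template)
print(zra.shape, zra.dtype)

# Pass dtype explicitly to override the type while keeping the shape.
fra = np.ones_like(template, dtype=np.float64)
print(fra.shape, fra.dtype)

# The new arrays are independent of the template.
zra[0, 0] = 7
print(template[0, 0])
```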

Numpy will also implicitly create arrays from array math:

ara = np.ones((3,3))
bra = ara + ara
print(bra)
cra = (ara == ara)
print(cra)
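The data type of an implicitly created array follows from the operation and its inputs. The sketch below prints a few cases; the array names are invented for this example, and the results shown in the comments assume Python 3 division rules.

```python
import numpy as np

ira = np.arange(3)   # integer array
fra = np.ones(3)     # float64 array

# Mixing integers and floats gives a float result.
print((ira + fra).dtype)
# Comparisons give a boolean array of the same shape.
print((ira > 0).dtype, (ira > 0).shape)
# True division of integers gives floats.
print((ira / 2).dtype)
```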

Copies and Views: That Mutable/Immutable Stuff

Numpy arrays are objects and as a result you need to be a bit careful copying them. If you simply do this

dra = ara

it does not make a new array.

In detail try this:

test = np.array([1,2,3,4,5])
b = test
print(b)
test[2] = 6
print(b)

Notice how by changing test, we also change b!

In the above examples, dra just points at ara and b just points at test. The two in fact share the same data, so that changing one changes the other. You can check this via:

dra is ara
b is test

This gets a bit complicated but we would have been okay with:

b = test.copy()

or

dra = ara*1.0

So just be aware of these referencing issues: if you do not explicitly copy an array, you can end up with two names for the same array rather than two truly independent arrays.
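The three cases worth telling apart can be sketched side by side: a second name for the same object, a slice that is a new array object sharing the same data (a view), and an explicit copy. The variable names are illustrative, and np.shares_memory is assumed to be available in your numpy version.

```python
import numpy as np

test = np.array([1, 2, 3, 4, 5])

alias = test               # same object under a second name
view = test[1:4]           # basic slice: new array object, shared data
independent = test.copy()  # new array object with its own data

test[2] = 6

print(alias is test, alias[2])
print(view is test, view[1])
print(independent[2])

# np.shares_memory reports whether two arrays overlap in memory.
print(np.shares_memory(test, view), np.shares_memory(test, independent))
```

Only the explicit copy keeps the original value after test is changed.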

Math With Numpy

Slicing and Iteration With Numpy