NumpyBasics

Revision as of 15:47, 1 November 2011

Back to the PythonOverview.

Introduction to NumPy

The built-in python collections leave something to be desired for serious astronomical number crunching. Almost all current astronomical data processing inside python takes advantage of the numpy libraries. These are third-party routines packaged with CASA that allow efficient definition and manipulation of matrices (or images or data cubes). This CASAguide will show you how to import this key module, build an array, do basic math, and cleverly access the contents of the array.

An overriding concern to keep in mind as you explore numpy is that numpy is fast and python is slow. What we mean is that an operation carried out with one of numpy's built-in (C) functions will often run orders of magnitude faster than the same operation written as a python loop. If you have worked with arrays in IDL, MATLAB, or a similar language before, this approach should be familiar.
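
As a rough illustration of the speed difference, the sketch below sums the squares of a million integers twice: once with a plain python loop and once with numpy's built-in functions. The array size and the helper names (loop_sum_squares, numpy_sum_squares) are arbitrary choices for this example, and the timings will depend on your machine and numpy version.

```python
import time
import numpy as np

def loop_sum_squares(values):
    """Sum of squares using an explicit python loop."""
    total = 0
    for v in values:
        total += v * v
    return total

def numpy_sum_squares(arr):
    """Sum of squares using numpy's compiled routines."""
    return int(np.sum(arr * arr))

n = 1000000
values = list(range(n))
arr = np.arange(n, dtype=np.int64)

t0 = time.time()
loop_result = loop_sum_squares(values)
t1 = time.time()
numpy_result = numpy_sum_squares(arr)
t2 = time.time()

print("python loop: %.4f s" % (t1 - t0))
print("numpy:       %.4f s" % (t2 - t1))
print("same answer:", loop_result == numpy_result)
```

Both versions should give the same answer; only the time taken differs.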

Importing NumPy

NumPy is not part of the basic python distribution. If you are not using CASA then you will need to download and install numpy. CASA includes numpy as part of its basic distribution but keeps it as a separate module (similar to the "os" or "math" modules in the basic python distribution).

Once it is correctly installed, you can import numpy and gain access to its functionality in the following way.

import numpy as np

Note that this imports numpy as "np". Had we just typed "import numpy", we would have access to the same functionality but would need to write "numpy" in place of "np" throughout.
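
A minimal sketch of the two import styles side by side. Both names end up pointing at the same module, so the choice is only a matter of how much you want to type.

```python
import numpy
import numpy as np

# Both names refer to the same module object.
same_module = np is numpy
print(same_module)

# So these two calls are equivalent.
print(np.arange(3))
print(numpy.arange(3))
```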

Now we can use numpy functions like so:

print( np.arange(10) )

"arange" is analogous to the basic python "range" but generates a numpy array.

Just to see how these arrays work, type:

print( np.arange(10) + 5 )

Notice how the 5 is broadcast to each element in the array. Similarly, we can write a bunch of powers of 2 by:

print( 2**np.arange(10) )
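
To see why this matters, it may help to compare with a plain python list, which does not support this kind of arithmetic. The sketch below is only an illustration; the variable names are made up for the example.

```python
import numpy as np

plain_list = list(range(5))
array = np.arange(5)

# Adding a scalar to a numpy array applies it to every element.
shifted = array + 5
print(shifted)

# A plain list does not support this; "+" means concatenation for lists.
try:
    plain_list + 5
    list_add_failed = False
except TypeError:
    list_add_failed = True
print("list + 5 raised TypeError:", list_add_failed)

# With a list, the equivalent needs an explicit loop or comprehension.
shifted_list = [x + 5 for x in plain_list]
print(shifted_list)
```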

Your First Numpy Array

Before we get going, type np. and hit the tab key. You might be a bit freaked out by the long list of possibilities (563 at the time of writing), but it's good to know that's there. Also remember that ?np.median will bring up the same content as help(np.median) but with some useful header info.

Now let's make a simple numpy array and figure out how to get its basic properties. We can do this by hand like so:

ra = np.array([[0,1,2],[3,4,5]])
print(ra)

Or use the arange function to make the same array

ra = np.arange(6).reshape(2,3)
print(ra)

Let's get some basic stats. Again it may be useful to type ra. and hit tab first.

The shape of the array

ra.shape

Its total number of elements

ra.size

The number of dimensions

ra.ndim

Notice that we're not using () here: these are attributes of the array, not methods. ndim and size are ints and shape is a tuple, as we can see from:

type(ra.shape)

Now get the data type

ra.dtype

and notice that numpy has a slightly different nomenclature for the data types.
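
The short sketch below prints these attributes for a few small arrays so the numpy type names can be compared. The array names are hypothetical, and the exact integer type reported (for example int64 or int32) depends on your platform.

```python
import numpy as np

int_ra = np.arange(6).reshape(2, 3)
float_ra = np.array([0.5, 1.5, 2.5])
bool_ra = np.array([True, False])

for name, arr in [("int_ra", int_ra), ("float_ra", float_ra), ("bool_ra", bool_ra)]:
    # shape, size, ndim and dtype are attributes, so no parentheses are needed
    print(name, arr.shape, arr.size, arr.ndim, arr.dtype)
```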

Also notice that some of the loosey-goosey casting is gone. Let's try to shove a float into our integer array.

print(ra)
ra[0,0] = 5.75
print(ra)

The value was truncated (the fractional part was dropped) and stored as an int.
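
A small sketch of this behaviour, assuming the same 2x3 integer array as above. It also tries a negative value, where dropping the fractional part is not the same as rounding down, and shows one way to keep the fraction by converting to a float array first with astype (the name fra is made up for this example).

```python
import numpy as np

ra = np.arange(6).reshape(2, 3)
ra[0, 0] = 5.75     # fractional part dropped -> 5
ra[0, 1] = -5.75    # also dropped toward zero -> -5, not -6
print(ra)

# Converting to a float array first keeps the fractional part.
fra = ra.astype(float)
fra[0, 0] = 5.75
print(fra)
```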

Now let's do something with our array, say square every element

ra**2

Or add it to itself

ra + ra

Notice it's doing element-by-element manipulation. Multiplying by a scalar works the same way:

5 * ra

Array Creation

Now that we've seen some array basics, let's look a bit closer at how we can make them. There's a lot here, so we'll just look at a couple of approaches.

We already saw the by-hand approach

ra = np.array([[0,1,2],[3,4,5]])
print(ra)
ra = np.array([0,1,2,3,4,5])
print(ra)

See how the square brackets control the shape? If you forget them entirely you'll get an unfortunate error.
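
The sketch below checks the resulting shapes and shows what happens when the brackets are left out. The exact error message differs between numpy versions, so the example only catches the exception and prints it.

```python
import numpy as np

flat = np.array([0, 1, 2, 3, 4, 5])          # one pair of brackets: 1-d
nested = np.array([[0, 1, 2], [3, 4, 5]])    # nested brackets: 2-d
print(flat.shape, nested.shape)

# Without brackets the numbers are treated as separate arguments
# to np.array rather than as the array contents.
try:
    np.array(0, 1, 2, 3, 4, 5)
    raised = False
except (TypeError, ValueError) as err:
    raised = True
    print("error:", err)
```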

We also saw the arange approach, which generates a sequential array of integers. It takes start, stop, and step as arguments like so:

ra = np.arange(5,10,1)
print(ra)
ra = np.arange(5,10,3)
print(ra)

The stop value itself is excluded, so the array ends at the last element under 10.

ra = np.arange(5,10.1,1.0)
print(ra)

See how the float arguments made this a float array? Setting the stop to 10.1 also means that 10 itself is included.

We can also give an explicit type to most array creation functions like so:

ra = np.arange(0,10,1,dtype=np.float32)
print(ra)

Otherwise it would have been an int array.
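
To make the rule concrete, the sketch below builds the same range three ways and prints the resulting data types. The variable names are hypothetical, and the default integer type depends on your platform.

```python
import numpy as np

int_ra = np.arange(0, 10, 1)                      # all-integer arguments
float_ra = np.arange(0, 10, 1.0)                  # one float argument
f32_ra = np.arange(0, 10, 1, dtype=np.float32)    # explicit dtype

print(int_ra.dtype, float_ra.dtype, f32_ra.dtype)
```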

There are many ways of doing similar things. A more stable float version of arange is linspace:

ra = np.linspace(1.0,2.0,11)
print(ra)

This covers 1.0 to 2.0 inclusive with 11 elements.
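
One way to see the difference, sketched under the assumption that you want evenly spaced floats between two known endpoints. With linspace you state the number of points directly. With a float step in arange the length is worked out from (stop - start) / step, which is subject to floating-point rounding, so it is worth checking rather than assuming.

```python
import numpy as np

# linspace: the number of points is given and both endpoints are included
lin = np.linspace(1.0, 2.0, 11)
print(len(lin), lin[0], lin[-1])

# arange with a float step: the stop value is meant to be excluded,
# and the length depends on floating-point arithmetic
stepped = np.arange(1.0, 2.0, 0.1)
print(len(stepped), stepped[0], stepped[-1])

# the spacing implied by linspace
spacing = lin[1] - lin[0]
print(spacing)
```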

We can also make arrays full of ones or zeros by just specifying the shape:

ra = np.zeros((3,3))
print(ra)
ra = np.zeros((3,3,3),dtype=bool)
print(ra)

And we can mimic one array with another

ara = np.zeros((3,3))
print(ara)
bra = np.zeros_like(ara)
print(bra)
bra = np.ones_like(ara)
print(bra)
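
As a small check, the sketch below shows that the "like" functions copy both the shape and the data type of the array they mimic, and leave it untouched. The int16 template is an arbitrary choice for the example.

```python
import numpy as np

template = np.arange(6, dtype=np.int16).reshape(2, 3)
zeros = np.zeros_like(template)
ones = np.ones_like(template)

# Shape and dtype both follow the template array.
print(zeros.shape, zeros.dtype)
print(ones.shape, ones.dtype)
print(ones)
```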

Numpy will also implicitly create arrays from array math:

ara = np.ones((3,3))
bra = ara + ara
print(bra)
cra = (ara == ara)
print(cra)

That Mutable Immutable Stuff

You need to be a bit careful copying arrays. If you simply do this

dra = ara

... it does NOT make a new array.

In detail try this:

test = np.array([1,2,3,4,5])
b = test
print(b)
test[2] = 6
print(b)

Notice how by changing test, we also change b!

In the above examples, dra just points at ara and b just points at test. The two in fact share the same data, so that changing one changes the other. You can check this via:

dra is ara
b is test

This gets a bit complicated but we would have been okay with:

b = test.copy()

or

dra = ara*1.0

So just be aware that there can be referencing issues if you do not explicitly copy an array.
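
The difference between a second name and a real copy can be sketched as follows. The names alias and copied are made up for this example, and np.shares_memory is an extra check that is not used elsewhere in this guide and needs a reasonably recent numpy.

```python
import numpy as np

test = np.array([1, 2, 3, 4, 5])
alias = test            # a second name for the same array
copied = test.copy()    # an independent array with its own data

test[2] = 6

# "is" tells us whether two names refer to the same object.
print(alias is test, copied is test)

# shares_memory tells us whether two arrays use the same underlying data.
print(np.shares_memory(alias, test), np.shares_memory(copied, test))

print(alias)     # follows the change made through test
print(copied)    # still holds the original values
```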

Math With Numpy

Slicing and Iteration With Numpy