PythonDataAccess

From CASA Guides

Latest revision as of 15:21, 2 November 2011

Back to the PythonOverview.

Preface

In addition to manipulating your data, you need some way to save and access it. Here we look at how to save and load ASCII data from disk, then how to quickly save and load more complex collections using pickle, and finally how to access astronomical FITS data (or at least CASA images) via CASA. We begin with how to get input from the user.

Input

You can accept input from the command line, or pause a script, using the raw_input command. "raw_input" prints its prompt, waits for the user, and returns what they typed as a string:

verb = raw_input("Give me a verb: ")
noun = raw_input("Give me a noun: ")

and you can then use these as you would any other string variable:

mad_lib = "More fun than "+verb+"ing a "+noun
print mad_lib

Then inserting a pause in a script is as easy as:

dummy = raw_input("Hit <Enter> to continue.")
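
Because the input always comes back as a string, a numeric answer has to be converted before you can do arithmetic with it. Here is a minimal sketch of that step. The helper name to_number is made up for illustration, and a literal string stands in for what a user might type so the example runs without a prompt.

```python
def to_number(text, default=None):
    """Convert user-typed text to a float, or return default if it is not a number."""
    try:
        return float(text.strip())
    except ValueError:
        return default

# In a session you would pass in the string returned by raw_input
# (called input in Python 3); a literal stands in for it here.
answer = " 3.5 "
value = to_number(answer)
print(value * 2)
```

Returning a default instead of raising lets a script ask the question again rather than crash on a typo.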

ASCII Files in Basic Python

Python provides easy basic file access. Grab our example_file.txt for the following example.

Python lets you open a file with the open command, like so:

a_file = open("example_file.txt", "r")

The second parameter determines how you will access the file: r means read, w means write, and a means append. You can both read and write at once if you want (for example with r+). Read up on the details in the Python tutorial (http://docs.python.org/tutorial/inputoutput.html).

Now that it's open we can read the lines in the file into a list like so:

lines = a_file.readlines()
print lines

We could also have read a single line with readline() or only a fixed number of bytes with read(n).

After we've written or read our data, we will want to close the file. Do this with the .close() method like so.

a_file.close()
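
The sketch below shows readline(), read(n) and readlines() side by side, and uses a with block so the file is closed automatically. It writes its own small stand-in file to a temporary directory (the name demo_lines.txt is arbitrary) rather than relying on example_file.txt, and it uses print() with a single argument so that it should run under either Python 2.6+ or Python 3.

```python
import os
import tempfile

# Write a small stand-in file so the example is self-contained.
path = os.path.join(tempfile.mkdtemp(), "demo_lines.txt")
with open(path, "w") as out:
    out.write("first line\nsecond line\nthird line\n")

# The with statement closes the file when the block ends, even after an error.
with open(path, "r") as a_file:
    first = a_file.readline()   # one line, including its trailing newline
    chunk = a_file.read(6)      # the next 6 characters
    rest = a_file.readlines()   # whatever remains, as a list of lines

print(repr(first))
print(repr(chunk))
print(rest)
```

Each call picks up where the previous one stopped, which is why readlines() here returns only the remainder of the file.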

We write using similar syntax. Here we open a new file for writing, write out the list of lines, and then close the new file.

a_new_file = open("new_file.txt", "w")
a_new_file.writelines(lines)
a_new_file.close()
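
To see how the w and a modes differ, here is a short sketch that writes a file and then appends to it. The file name log.txt and its contents are placeholders chosen for this example.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "log.txt")

with open(path, "w") as f:      # "w" creates the file or truncates an existing one
    f.write("run 1\n")

with open(path, "a") as f:      # "a" adds to the end, keeping what is already there
    f.write("run 2\n")

with open(path, "r") as f:
    contents = f.readlines()

print(contents)
```

Opening the same file with w a second time would have discarded the first line instead of keeping it.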

The file now exists. Pull in the standard os module and use it to print the contents of the file with the shell command cat (available on Unix-like systems):

import os
os.system('cat new_file.txt')

(Note that readlines gives you strings and writelines expects strings from you. You need to convert floats and integers to strings before writing, and writelines does not add newlines for you.)
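
As a rough illustration of that conversion, the sketch below writes a list of floats as text and reads them back. The values and the file name numbers.txt are invented for the example.

```python
import os
import tempfile

freqs = [115.27, 230.54, 345.80]   # made-up values for illustration
path = os.path.join(tempfile.mkdtemp(), "numbers.txt")

# Convert each float to a string and add the newline ourselves.
lines = ["%.2f\n" % value for value in freqs]
with open(path, "w") as f:
    f.writelines(lines)

# Reading gives strings back, so convert them to floats again.
with open(path, "r") as f:
    read_back = [float(line) for line in f.readlines()]

print(read_back)
```

The format string controls how many digits survive the round trip, so pick one that matches the precision you need.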

Pickling

It's possible to directly save and load variables from a file without making them into strings or worrying about parsing lines into discrete variables. To do this, we use Python's built-in serializer pickle. First import the pickle module:

import pickle

Now make a dictionary, which we will shortly save:

a_dict = {"field1":100,
          50:[1,2,3,5],
          3.14:"hello"}

To save the dictionary using pickle you open a file for output, initialize a "pickler" pointing at the file, and dump the dictionary to the pickler.

f = open("pickle_jar.pkl","w")
p = pickle.Pickler(f)
p.dump(a_dict)
f.close()

Go ahead and have a look at what it's doing by just listing the file.

import os
os.system("cat pickle_jar.pkl")

The file is ASCII, but it's not English (the default pickle protocol in Python 2 is text-based).

To get back the things that you just pickled, we reverse the previous sequence. We use an Unpickler in place of the original Pickler and open the file for reading instead of writing. The result looks like this:

f = open("pickle_jar.pkl","r")
u = pickle.Unpickler(f)
read_back = u.load()
f.close()

So read_back is a variable containing the output of the unpickler. Have a look; it should be identical to the input dictionary:

print a_dict
print read_back

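(There is also a more compact syntax: the module-level functions pickle.dump and pickle.load write to and read from an open file directly. An optional protocol argument selects a binary format instead of ASCII, and Python 2 also ships a faster implementation called cPickle.)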
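
A minimal sketch of the compact form, assuming a binary protocol is acceptable for your use. The file name compact.pkl is arbitrary, and the file is opened in binary mode ("wb" and "rb"), which binary protocols and Python 3 both require.

```python
import os
import pickle
import tempfile

a_dict = {"field1": 100, 50: [1, 2, 3, 5], 3.14: "hello"}
path = os.path.join(tempfile.mkdtemp(), "compact.pkl")

# Dump straight to the open file, asking for the newest protocol available.
with open(path, "wb") as f:
    pickle.dump(a_dict, f, pickle.HIGHEST_PROTOCOL)

# Load straight from the open file; no Unpickler object is needed.
with open(path, "rb") as f:
    read_back = pickle.load(f)

print(read_back == a_dict)
```

The binary file is smaller and faster to read than the ASCII one, at the cost of no longer being viewable with cat.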

What if we wanted to store many variables? Pickle gives you back the variables in the order that you put them in, so that if you have three simple variables:

a = 1
b = 2
c = 3

and save them using pickle:

f = open("another_pickle_jar.pkl","w")
p = pickle.Pickler(f)
p.dump(a)
p.dump(b)
p.dump(c)
f.close()

then you can pull them back out like so:

f = open("another_pickle_jar.pkl","r")
u = pickle.Unpickler(f)
var1 = u.load()
var2 = u.load()
var3 = u.load()
print a, b, c
print var1, var2, var3

But you can only pull out as many variables as you put in. If we go a variable too far:

var4 = u.load()

Then we run into trouble: there is nothing left to read, so load() raises an EOFError.

f.close()
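
If you do not know in advance how many variables a file holds, one common pattern is to keep loading until that EOFError appears. The sketch below assumes this approach; the file name many.pkl is a placeholder.

```python
import os
import pickle
import tempfile

path = os.path.join(tempfile.mkdtemp(), "many.pkl")

# Dump three separate objects into the same file, one after another.
with open(path, "wb") as f:
    for item in (1, 2, 3):
        pickle.dump(item, f)

# Keep loading until the unpickler reports the end of the file.
recovered = []
with open(path, "rb") as f:
    while True:
        try:
            recovered.append(pickle.load(f))
        except EOFError:
            break

print(recovered)
```

An alternative that avoids the loop is to put everything in one list or dictionary and pickle that single object.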

Of course the disadvantage of pickling is that the result can only be read back by unpickling it in Python. This is not a generic format for saving data or sharing it with other people.

FITS Access via CASA

UV and Meta-data Access via CASA

Other Approaches

You don't need to duplicate previous work on reading and writing text files. Adam Ginsburg's "readcol.py" (loosely patterned after the IDL version, linked from the page) will save you a lot of effort. The package astroasciidata also looks promising, but I have not yet had a chance to experiment with it.
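
To give a feel for the kind of bookkeeping such tools take off your hands, here is a bare-bones sketch of reading whitespace-separated columns of numbers. The function read_columns is a hypothetical helper written for this page; it is not the interface of readcol.py or of any other package.

```python
def read_columns(lines, comment="#"):
    """Split whitespace-separated lines into columns of floats.

    Hypothetical helper for illustration; this is not readcol's interface.
    """
    rows = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith(comment):
            continue                      # skip blanks and comment lines
        rows.append([float(field) for field in line.split()])
    # Transpose the list of rows into a list of columns.
    return [list(column) for column in zip(*rows)]

# A list of strings stands in for the output of readlines().
sample = ["# x  y\n", "1.0 10.0\n", "2.0 20.0\n", "\n", "3.0 30.0\n"]
x, y = read_columns(sample)
print(x)
print(y)
```

A real reader also has to cope with headers, mixed types and missing values, which is where an existing package earns its keep.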