Loading Data: Difference between revisions

From CASA Guides

Revision as of 14:56, 19 April 2012


Overview

This tutorial details the process of obtaining EVLA data from the archive in measurement set (MS) format. (A similar process for importing EVLA data into AIPS can be found here.) In addition, the implications of data averaging are described so that one may make an informed decision about whether, and how, to perform frequency or time averaging to reduce the size of the dataset. The initial flags that are generally applied to the data are also described.

Obtaining data from the archive

EVLA data are available from the NRAO Science Data Archive. Details and updates regarding the archive can be found on the EVLA/VLA/VLBA Data Archive web page. Here, we choose to download a publicly-available observation. If you are downloading proprietary data you will need to either sign into your NRAO account (using the link at the top of the archive page) or obtain the Project Access Key from the NRAO data analysts.

We want to find data associated with the project "TVER0002", so enter this into the "Project Code" field under "General Search Parameters" and submit the query. This will find two archive files; we will be downloading the first of these, so click on the checkbox next to the file name "TVER0002_sb2557689_1.55517.018916574074".

Selecting a set of scans

Although this isn't necessary, we will also choose to select only a subset of the scans in this observation. You may wish to do this with your data if you only want to retrieve data for a specific source or receiver band. In order to figure out which scans you want, you can either click on the "Scans" link in the "View Scans" column, or (if it's a long or particularly complex observation) you can download the SDM tables, which contain metadata about the observation but are not very large, and use the task listsdm to inspect the data. For demonstration purposes, we will choose the latter method.

To do this,

  • Fill in your email address and the preferred download location;
  • For the data download format, choose "SDM tables only (no visibilities)";
  • Click on "Get My Data."

Ignore the file size. In this case, it's listed as 21.76 GB, which is the entire dataset; the SDM tables are in fact only 314 MB.

Once the download is complete, start CASA and run listsdm:

# In CASA
myScans = listsdm('TVER0002.sb2568947.eb2579996.55518.22356400463')

This gives output very similar to listobs, but also provides a Python dictionary containing useful information. In this example, we've stored the dictionary as "myScans." The top-level keys are simply the scan numbers, and each scan includes the following keys:

# In CASA
myScans[1].keys()
['field',
 'nchan',
 'end',
 'chanwidth',
 'source',
 'timerange',
 'start',
 'reffreq',
 'spws',
 'intent',
 'baseband',
 'nsubs']
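
Before making a selection, it can help to see at a glance which scans belong to which field. The following is a minimal sketch of one way to regroup such a dictionary by field ID. The dictionary used here is a hypothetical stand-in with made-up values, not real listsdm output, and the helper name scans_by_field is an illustrative choice; only the 'field' and 'source' keys are taken from the list above.

```python
from collections import defaultdict

# Hypothetical stand-in for the dictionary returned by listsdm. Only the
# 'field' and 'source' keys are filled in, and all values are illustrative.
example_scans = {
    1: {'field': 0, 'source': 'dummy'},
    2: {'field': 0, 'source': 'dummy'},
    3: {'field': 1, 'source': '3C48'},
    4: {'field': 1, 'source': '3C48'},
    5: {'field': 2, 'source': 'target'},
}


def scans_by_field(scans):
    """Map each field ID to the sorted list of scan numbers that observed it."""
    grouped = defaultdict(list)
    for scan_number in sorted(scans):
        grouped[scans[scan_number]['field']].append(scan_number)
    return dict(grouped)


summary = scans_by_field(example_scans)
for field_id in sorted(summary):
    print(field_id, summary[field_id])
```

With a real listsdm dictionary, the same function could be called as scans_by_field(myScans), assuming each scan entry carries a 'field' key as shown above.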

Let's say we wish to download all scans which have a field ID of 1, which we know to be 3C48, but exclude the dummy scans (the first two, which have a field ID of 0). We can have Python create a string of scan numbers that satisfy this requirement, which we can then feed to the Archive Query page:

# In CASA
# Define an empty string to fill with selected scan numbers
scanList = ''
# Loop over the scans in ascending order (dictionary keys are
# not guaranteed to be sorted), adding to scanList when the
# field ID is equal to 1:
for key in sorted(myScans):
    if myScans[key]['field'] == 1:
        scanList = scanList + str(key) + ','

# Print out the scan list for pasting into the Archive Query
# page, removing the unnecessary trailing comma:
scanList.rstrip(',')
  Out[46]: '3,4,5,6,11,12,13,14,19,20,21,22'

With this new information, return to the Archive Query page (you can use the "back" arrow on your web browser), and enter this list into the box entitled "Select scans for MS or AIPS FITS."

Choosing to average data

It is often advantageous to average data, in frequency and / or time, in order to reduce its size and speed up processing. However, one must be cautious not to over-average the data, since this will result in an unacceptable amount of bandwidth smearing (for frequency-averaging) or amplitude loss (for time-averaging).

The Observational Status Summary (OSS) contains a more detailed description of these phenomena, as does the "white book," Synthesis Imaging in Radio Astronomy II, Chapters 2 and 17.

As can be seen in the "Telescope:config" column on the Archive Query page, our observation was performed in the C-configuration. Therefore, assuming we are happy to tolerate a 1% amplitude loss for a source at the first null of the primary beam, we can use an averaging time of around 20 seconds (see Table 9 in the OSS).

In order to determine how much frequency averaging we will want, let's say we're willing to accept a 5% reduction in peak response (see Table 8 in the OSS). Let's say we want this at the first null of the primary beam, which, for C-configuration, is around 70 times the synthesized beamwidth. Therefore,

[math]\displaystyle{ \delta{\nu} / \nu_{0} = 0.5 \times 0.95 \times 70 \approx 33 }[/math]
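
To get a feel for how much averaging reduces the download, the following rough sketch estimates the dataset size after averaging. It assumes that the data volume scales linearly with the number of integrations and channels, which ignores metadata overhead. The 21.76 GB figure is the full dataset size quoted by the archive above; the 1-second native integration time is an assumed value for illustration only, and the function name averaged_size_gb is hypothetical.

```python
def averaged_size_gb(raw_size_gb, native_int_s, avg_int_s, chan_avg_factor=1):
    """Estimate the dataset size in GB after time and channel averaging.

    Assumes the size scales linearly with the number of integrations
    and with the number of channels (metadata overhead is ignored).
    """
    if avg_int_s < native_int_s:
        raise ValueError("averaging time is shorter than the native integration")
    if chan_avg_factor < 1:
        raise ValueError("channel averaging factor must be at least 1")
    time_factor = avg_int_s / float(native_int_s)
    return raw_size_gb / (time_factor * chan_avg_factor)


# 21.76 GB is the size quoted by the archive; the 1 s native integration
# time is an assumption made purely for this illustration.
estimate = averaged_size_gb(21.76, native_int_s=1.0, avg_int_s=20.0)
print(round(estimate, 2))
```

The actual saving depends on the true integration time and channelization of the observation, so treat the result as an order-of-magnitude guide only.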

Retrieval process

, entering an email address, and selecting "Create tar file". Note that this last step makes data retrieval substantially easier, since the data will comprise multiple files within a directory if no tar bundle is requested.

When the archive process is complete, an email notification is sent out with information about the download directory. Copy the data to a convenient location, and unpack the tar file by typing "tar xvf TVER0002_sb2557689_1.55517.018916574074.tar". This will create the SDM data directory but will retain the original tar file; to conserve disk space, you will probably wish to delete the tar file afterwards.
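
If you prefer to script the unpacking step, the same thing can be done from Python with the standard tarfile module. This is only a sketch of an alternative to the "tar xvf" command; the helper name unpack_and_remove is hypothetical, and it deletes the tar file only after extraction has finished without raising an error.

```python
import os
import tarfile


def unpack_and_remove(tar_path, dest_dir="."):
    """Extract tar_path into dest_dir, then delete the tar file."""
    with tarfile.open(tar_path) as archive:
        archive.extractall(dest_dir)
    # Only reached if extraction succeeded, so the data are not lost.
    os.remove(tar_path)


# Example call, using the file name from this tutorial:
# unpack_and_remove("TVER0002_sb2557689_1.55517.018916574074.tar")
```

Only unpack tar files from a source you trust, such as the archive itself, since extracting an untrusted archive can overwrite files outside the destination directory.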

Starting CASA and initial inspection: listobs

Note that a description of importing EVLA data into AIPS can be found here.

First, be sure you have the most recent version of CASA installed. To start CASA, type "casapy"; this will start writing output to a log file called "casapy.log" as well as to the logger window, and will store any command-line input in a file called "ipython.log". (Note that a detailed description of the CASA environment, including relevant information on the Python language, can be found here.)

Initial data flagging: online flags, zero flags, and shadow flags