Loading Data: Difference between revisions

Revision as of 15:07, 19 April 2012

Overview

This tutorial details the process of obtaining EVLA data from the archive in measurement set (MS) format. (A similar process for importing EVLA data into AIPS can be found here.) It also describes the implications of data averaging, so that one may make an informed decision about whether, and how, to perform frequency- or time-averaging to reduce the size of the dataset. Finally, it covers the initial flags that are generally applied to the data.

Obtaining data from the archive

EVLA data are available from the NRAO Science Data Archive. Details and updates regarding the archive can be found on the EVLA/VLA/VLBA Data Archive web page. Here, we choose to download a publicly available observation. If you are downloading proprietary data, you will need to either sign in to your NRAO account (using the link at the top of the archive page) or obtain the Project Access Key from the NRAO data analysts.

We want to find data associated with the project "TVER0002", so enter this into the "Project Code" under the "General Search Parameters" and submit the query. This will find two archive files; we will be downloading the first of these, so click on the checkbox next to the file name "TVER0002_sb2557689_1.55517.018916574074".

Selecting a set of scans

Although this isn't necessary, we will also select only a subset of the scans in this observation. You may wish to do this with your own data if you only want to retrieve data for a specific source or receiver band. To figure out which scans you want, you can either click on the "Scans" link in the "View Scans" column or, for a long or particularly complex observation, download only the SDM tables, which contain the observation's metadata but are comparatively small, and inspect them with the CASA task listsdm. For demonstration purposes, we will use the second method.

To do this,

Fill in your email address and the preferred download location;
For the data download format, choose "SDM tables only (no visibilities)";
Click on "Get My Data."

Ignore the file size shown. In this case it is listed as 21.76 GB, which is the size of the entire dataset; the SDM tables alone are only 314 MB.

Once the download is complete, start CASA and run listsdm:

# In CASA
myScans = listsdm('TVER0002.sb2568947.eb2579996.55518.22356400463')

This gives output very similar to listobs, but also returns a Python dictionary containing useful information. In this example, we've stored the dictionary as "myScans." The top-level keys are simply the scan numbers, and each scan includes the following keys:

# In CASA
myScans[1].keys()

['field',
 'nchan',
 'end',
 'chanwidth',
 'source',
 'timerange',
 'start',
 'reffreq',
 'spws',
 'intent',
 'baseband',
 'nsubs']

Suppose we wish to download all scans with a field ID of 1, which we know to be 3C48, while excluding the dummy scans (the first two, which have a field ID of 0). We can have Python create a string of the scan numbers that satisfy this requirement, which we can then paste into the Archive Query page:

# In CASA
# Start with an empty string to fill with selected scan numbers
scanList = ''
# Loop over the scans in numerical order, adding a scan to
# scanList when its field ID is equal to 1:
for key in sorted(myScans):
    if myScans[key]['field'] == 1:
        scanList = scanList + str(key) + ','

# Print out the scan list for pasting into the Archive Query
# page, removing the unnecessary trailing comma:
scanList.rstrip(',')
  Out[46]: '3,4,5,6,11,12,13,14,19,20,21,22'
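The same selection can be written more compactly with str.join, which also avoids the trailing comma altogether. The sketch below runs outside CASA on a small, hypothetical stand-in for the listsdm dictionary: only the 'field' key is filled in, and the field IDs are invented for illustration. The helper name scans_for_field is likewise hypothetical.

```python
# Hypothetical stand-in for the dictionary returned by listsdm:
# scan number -> per-scan metadata (only 'field' is mocked here).
mock_scans = {
    1: {'field': 0},   # dummy scan
    2: {'field': 0},   # dummy scan
    3: {'field': 1},
    4: {'field': 1},
    5: {'field': 2},
    6: {'field': 1},
}


def scans_for_field(scans, field_id):
    """Return a comma-separated string of scan numbers, in numerical
    order, whose 'field' entry equals field_id."""
    return ','.join(str(scan) for scan in sorted(scans)
                    if scans[scan]['field'] == field_id)


print(scans_for_field(mock_scans, 1))  # prints 3,4,6
```

Inside CASA, calling scans_for_field(myScans, 1) on the real dictionary should give the same string as the loop above.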

With this new information, return to the Archive Query page (you can use the "back" arrow on your web browser), and enter this list into the box entitled "Select scans for MS or AIPS FITS."

Choosing to average data

It is often advantageous to average data, in frequency and / or time, in order to reduce its size and speed up processing. However, one must be cautious not to over-average the data, since this will result in an unacceptable amount of bandwidth smearing (for frequency-averaging) or amplitude loss (for time-averaging).

The Observational Status Summary (OSS) contains a more detailed description of these phenomena, as does the "white book," Synthesis Imaging in Radio Astronomy II, Chapters 2 and 17.

As can be seen in the "Telescope:config" column on the Archive Query page, our observation was performed in the C-configuration. Therefore, if we are willing to tolerate a 1% amplitude loss for a source at the first null of the primary beam, we could average the data over about 20 seconds (see Table 9 in the OSS). Since the longest averaging interval the Archive Query page allows is 10 s, choose this value from the pull-down menu "Choose online averaging for CASA MS or AIPS FITS."

To determine how much frequency averaging we can tolerate, suppose we are willing to accept a 5% reduction in peak response (see Table 8 in the OSS) at the first null of the primary beam, which, for the C-configuration, lies at around 70 times the synthesized beamwidth. Therefore,

[math]\displaystyle{ \Delta{\nu} / \nu_{0} = 0.5 \times 0.95 / 70 = 0.0068 }[/math]

For [math]\displaystyle{ \nu_{0} \approx 5\ \mathrm{GHz} }[/math], [math]\displaystyle{ \Delta{\nu} \approx 0.0068 \times 5\ \mathrm{GHz} \approx 34\ \mathrm{MHz} }[/math].
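The arithmetic above can be checked in a few lines of Python. The factors 0.5, 0.95 and 70 are copied directly from the equation; the variable names are illustrative only.

```python
# Factors copied from the bandwidth-smearing equation in the text
fractional_bandwidth = 0.5 * 0.95 / 70   # delta_nu / nu_0
nu_0_hz = 5e9                            # observing frequency, ~5 GHz
delta_nu_hz = fractional_bandwidth * nu_0_hz

print('delta_nu / nu_0 = %.4f' % fractional_bandwidth)   # 0.0068
print('delta_nu = %.1f MHz' % (delta_nu_hz / 1e6))        # 33.9 MHz
```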

The channel width in this observation was 0.5 MHz, with 128 MHz of bandwidth per spectral window (SPW). While it would therefore be possible to average down to only 4 channels per SPW, this would be detrimental to subsequent calibration, since we would lose much of the information about the bandpass structure and might smear RFI across channels. We will therefore be conservative and average by a factor of two in frequency (pairs of channels), again choosing this via the pull-down menu.
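To make the channel bookkeeping explicit, the sketch below works through the numbers in this paragraph. It assumes, as the text implies, that the 0.5 MHz channels span the full 128 MHz spectral window; the 34 MHz limit is the value derived above.

```python
import math

# Values quoted in the text
spw_bandwidth_mhz = 128.0   # bandwidth per spectral window
native_width_mhz = 0.5      # native channel width
max_width_mhz = 34.0        # bandwidth-smearing limit derived above

# Number of native channels per spectral window
native_nchan = int(spw_bandwidth_mhz / native_width_mhz)

# Fewest channels per SPW whose width stays within the smearing limit
nchan_at_limit = int(math.ceil(spw_bandwidth_mhz / max_width_mhz))

# Conservative choice made in the text: average pairs of channels
avg_factor = 2
avg_nchan = native_nchan // avg_factor
avg_width_mhz = native_width_mhz * avg_factor

print('native: %d x %.1f MHz' % (native_nchan, native_width_mhz))   # 256 x 0.5 MHz
print('limit:  %d x %.0f MHz' % (nchan_at_limit,
                                 spw_bandwidth_mhz / nchan_at_limit))  # 4 x 32 MHz
print('chosen: %d x %.1f MHz' % (avg_nchan, avg_width_mhz))          # 128 x 1.0 MHz
```

The gap between 4 channels at the smearing limit and 128 channels after conservative averaging is the margin kept for bandpass calibration and RFI flagging.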

By averaging in time by a factor of 10 and in frequency by a factor of 2, we reduce the size of the dataset by a factor of 20, from 21.76 GB to around 1 GB. Furthermore, we are requesting only around 3/5 of the data (via the scan selection), so the total size will be around 700 MB.
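A quick version of this estimate, as a sketch that assumes the data volume scales inversely with the averaging factors and linearly with the fraction of scans kept (a simplification that ignores metadata overhead):

```python
full_size_gb = 21.76      # size listed on the Archive Query page
time_factor = 10          # time-averaging factor from the text
freq_factor = 2           # channel-averaging factor from the text
scan_fraction = 3.0 / 5   # fraction of data kept by the scan selection

averaged_gb = full_size_gb / (time_factor * freq_factor)
selected_gb = averaged_gb * scan_fraction

print('after averaging: %.2f GB' % averaged_gb)                 # 1.09 GB
print('after scan selection: %.0f MB' % (selected_gb * 1000))   # 653 MB
```

This gives roughly 650 MB, close to the rough figure of around 700 MB quoted in the text.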

Retrieval process

Complete the request on the Archive Query page by entering an email address and selecting "Create tar file". This last step makes data retrieval substantially easier, since without a tar bundle the data will arrive as multiple files within a directory.

When the archive process is complete, an email notification is sent with information about the download directory. Copy the data to a convenient location and unpack the tar file by typing "tar xvf TVER0002_sb2557689_1.55517.018916574074.tar". This creates the SDM data directory but leaves the original tar file in place; to conserve disk space, you will probably wish to delete the tar file.
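The unpack-and-delete step can also be done from Python, and hence from within a CASA session. Below is a minimal sketch using the standard-library tarfile module; the function name unpack_and_remove is hypothetical. Deleting the tar file is irreversible, so the sketch removes it only after extraction has finished without raising an error.

```python
import os
import tarfile


def unpack_and_remove(tar_path, dest_dir='.'):
    """Extract tar_path into dest_dir, then delete the tar file to
    free disk space. Returns the names of the extracted members."""
    with tarfile.open(tar_path) as archive:
        names = archive.getnames()
        # Use the safer 'data' extraction filter where the running
        # Python provides it (added in 3.12 and some backports).
        if hasattr(tarfile, 'data_filter'):
            archive.extractall(path=dest_dir, filter='data')
        else:
            archive.extractall(path=dest_dir)
    # Reached only if extraction did not raise an exception
    os.remove(tar_path)
    return names
```

For example, unpack_and_remove('TVER0002_sb2557689_1.55517.018916574074.tar') would extract into the current directory and then delete the tar file.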

Starting CASA and initial inspection: listobs

Note that a description of importing EVLA data into AIPS can be found here.

First, be sure you have the most recent version of CASA installed. To start CASA, type "casapy"; this will start writing output to a log file called "casapy.log" as well as to the logger window, and will store any command-line input in a file called "ipython.log". (Note that a detailed description of the CASA environment, including relevant information on the Python language, can be found here.)

Initial data flagging: online flags, zero flags, and shadow flags
