Loading Data
Overview
This tutorial describes how to obtain EVLA data from the archive in measurement set (MS) format. (A similar process for importing EVLA data into AIPS can be found here.) It also describes the implications of data averaging, so that one may make an informed decision about whether -- and how -- to average in frequency or time to reduce the size of the dataset, and it introduces the initial flags that are generally applied to the data.
Obtaining data from the archive
EVLA data are available from the NRAO Science Data Archive. Details and updates regarding the archive can be found on the EVLA/VLA/VLBA Data Archive web page. Here, we choose to download a publicly-available observation. If you are downloading proprietary data you will need to either sign into your NRAO account (using the link at the top of the archive page) or obtain the Project Access Key from the NRAO data analysts.
We want to find data associated with the project "TVER0002", so enter this into the "Project Code" under the "General Search Parameters" and submit the query. This will find two archive files; we will be downloading the second of these, so click on the checkbox next to the file name "TVER0002.sb2568947.eb2579996.55518.22356400463".
Selecting a set of scans
Although this isn't necessary, we will also select only a subset of the scans in this observation. You may wish to do this with your data if you only want to retrieve data for a specific source or receiver band. To figure out which scans you want, you can either click on the "Scans" link in the "View Scans" column or, for a long or particularly complex observation, download the SDM tables and inspect them with the task listsdm. The SDM tables contain metadata about the observation and are not very large. For demonstration purposes, we will use the second method.
To do this,
- Fill in your email address and the preferred download location;
- For the data download format, choose "SDM tables only (no visibilities)";
- Click on "Get My Data."
Ignore the file size. In this case, it's listed as 21.76 GB, which is the entire dataset; the SDM tables are in fact only 314 MB.
Once the download is complete, start CASA and run listsdm:
# In CASA
myScans = listsdm('TVER0002.sb2568947.eb2579996.55518.22356400463')
This gives output very similar to listobs, but also provides a Python dictionary containing useful information. In this example, we've stored the dictionary as "myScans." The top-level keys are simply the scan numbers, and each scan includes the following keys:
# In CASA
myScans[1].keys()
['field', 'nchan', 'end', 'chanwidth', 'source', 'timerange', 'start', 'reffreq', 'spws', 'intent', 'baseband', 'nsubs']
Let's say we wish to download all scans with a field ID of 1, which we know to be 3C48, but not the dummy scans (the first two, which have a field ID of 0). We can have Python create a string of scan numbers that satisfy this requirement, which we can then feed to the Archive Query page:
# In CASA
# Define an empty string to fill with selected scan numbers
scanList = ''
# Loop over the scans in numerical order, adding to scanList
# when the field ID is equal to 1:
for key in sorted(myScans):
    if myScans[key]['field'] == 1:
        scanList = scanList + str(key) + ','
# Print out the scan list for pasting into the Archive Query
# page, removing the unnecessary trailing comma:
scanList.rstrip(',')
Out[46]: '3,4,5,6,11,12,13,14,19,20,21,22'
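The same selection can be written more compactly with a list comprehension and str.join, which never produces a trailing comma. The following is only a minimal sketch. It assumes the dictionary has the layout shown above (integer scan numbers as keys, each with an integer 'field' entry). The helper name select_scans and the small mock_scans dictionary, including its source names, are hypothetical stand-ins for the real listsdm output.

```python
def select_scans(scans, field_id):
    """Return a comma-separated string of the scan numbers whose
    'field' entry equals field_id, in increasing scan order."""
    selected = [str(scan) for scan in sorted(scans)
                if scans[scan]['field'] == field_id]
    return ','.join(selected)


# Hypothetical stand-in for the dictionary returned by listsdm;
# only the 'field' key is used by select_scans.
mock_scans = {
    1: {'field': 0, 'source': 'dummy'},
    2: {'field': 0, 'source': 'dummy'},
    3: {'field': 1, 'source': '3C48'},
    4: {'field': 1, 'source': '3C48'},
    7: {'field': 2, 'source': 'other'},
}

print(select_scans(mock_scans, 1))  # prints 3,4
```

With the real dictionary, select_scans(myScans, 1) should give the same string as the loop above.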
With this new information, return to the Archive Query page (you can use the "back" arrow on your web browser), and enter this list into the box entitled "Select scans for MS or AIPS FITS."
Choosing to average data
It is often advantageous to average data, in frequency and / or time, in order to reduce its size and speed up processing. However, one must be cautious not to over-average the data, since this will result in an unacceptable amount of bandwidth smearing (for frequency-averaging) or amplitude loss (for time-averaging).
The Observational Status Summary (OSS) contains a more detailed description of these phenomena, as does the "white book," Synthesis Imaging in Radio Astronomy II, Chapters 2 and 17.
As can be seen in the "Telescope:config" column on the Archive Query page, our observation was performed in the C-configuration. Therefore, assuming we are happy to tolerate a 1% amplitude loss for a source at the first null of the primary beam, we could average the data to around 20 seconds (see Table 9 in the OSS). Since the longest averaging time the Archive Query page offers is 10 s, choose this value on the pull-down menu "Choose online averaging for CASA MS or AIPS FITS."
In order to determine how much frequency averaging we will want, let's say we're willing to accept a 5% reduction in peak response (see Table 8 in the OSS). Let's say we want this at the first null of the primary beam, which, for C-configuration, is around 70 x the synthesized beamwidth. Therefore,
[math]\displaystyle{ \Delta{\nu} / \nu_{0} = 0.5 \times 0.95 / 70 = 0.0068 }[/math]
For [math]\displaystyle{ \nu_{0} \approx }[/math] 5 GHz, [math]\displaystyle{ \Delta{\nu} = }[/math] 34 MHz.
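The arithmetic above can be reproduced in a few lines of Python. This is only a sketch of the calculation as written in the text; the variable names are illustrative, and the input values are the ones quoted above.

```python
# Inputs quoted in the text above
factor = 0.5            # leading factor in the expression above
peak_response = 0.95    # accepting a 5% reduction in peak response
beamwidths = 70.0       # first null of the primary beam in C-configuration,
                        # in units of the synthesized beamwidth
nu0_hz = 5.0e9          # approximate observing frequency (5 GHz)

# Fractional bandwidth and the corresponding channel width
fractional_bw = factor * peak_response / beamwidths
delta_nu_mhz = fractional_bw * nu0_hz / 1.0e6

print('delta_nu / nu_0 = %.4f' % fractional_bw)   # prints 0.0068
print('delta_nu = %.0f MHz' % delta_nu_mhz)       # prints 34 MHz
```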
The channel width in this observation was 0.5 MHz, with 128 MHz bandwidth per spectral window. Therefore, while it would be possible to average to only 4 channels per SPW, this would be detrimental for subsequent calibration, since we will have lost a lot of information in the bandpass structure and may smear RFI across channels. We will therefore be conservative and average by two channels, again choosing this via the pull-down menu.
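As a quick check of the channel counts involved, the sketch below works through the numbers quoted above (0.5 MHz channels, 128 MHz per spectral window, and the 34 MHz limit). The variable names are illustrative only.

```python
import math

spw_bandwidth_mhz = 128.0   # bandwidth per spectral window
channel_width_mhz = 0.5     # channel width as observed
max_avg_width_mhz = 34.0    # widest channel tolerated by the smearing estimate

# Number of channels per spectral window as observed
native_channels = int(spw_bandwidth_mhz / channel_width_mhz)

# Fewest channels per spectral window if we averaged up to the 34 MHz limit
min_channels = int(math.ceil(spw_bandwidth_mhz / max_avg_width_mhz))

# The conservative choice made here: average by two channels
avg_factor = 2
channels_after = native_channels // avg_factor
width_after_mhz = channel_width_mhz * avg_factor

print('channels as observed: %d' % native_channels)             # 256
print('minimum channels at 34 MHz: %d' % min_channels)          # 4
print('channels after averaging by 2: %d' % channels_after)     # 128
print('channel width after averaging: %.1f MHz' % width_after_mhz)  # 1.0 MHz
```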
By averaging in time by a factor of 10, and frequency by a factor of 2, we will reduce the size of the dataset by a factor of 20 to around 1 GB. Furthermore, we are only requesting around 3/5 of the data (via the scan selection), so the total size will be around 700 MB.
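The size estimate can be sketched the same way. The starting value is the 21.76 GB that the archive page listed for the full dataset earlier in this tutorial. The result is only a rough guide, since it assumes the size scales linearly with the number of visibilities and ignores any fixed overhead in the MS.

```python
full_size_gb = 21.76        # full dataset size as listed on the archive page
time_factor = 10            # averaging in time by a factor of 10
freq_factor = 2             # averaging in frequency by a factor of 2
scan_fraction = 3.0 / 5.0   # approximate fraction of the data selected by scan

# Size after averaging only
averaged_gb = full_size_gb / (time_factor * freq_factor)

# Size after averaging and scan selection
selected_mb = averaged_gb * scan_fraction * 1000.0

print('after averaging: %.2f GB' % averaged_gb)        # about 1.09 GB
print('after scan selection: %.0f MB' % selected_mb)   # about 650 MB
```

This gives roughly 650 MB, in line with the estimate of around 700 MB above.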
Retrieval process
Now that we have selected scans, as well as requested data averaging, be sure to click on "CASA MS" as the download format and "Create MS or SDM tar file". Note that this last step makes data retrieval substantially easier, since the data will comprise multiple files within a directory if no tar bundle is requested. Be sure you've entered the correct email address, and click "Get My Data."
Again, it will say that the file size is 24.3712 GB; we can ignore this knowing that with the selected scans and averaging it will actually be far less. Click on "Retrieve over Internet," and wait for the email letting you know it's available.
When the archive process is complete, an email notification is sent out with information about the download directory. Copy the data to a convenient location, and unpack the tar file by typing:
tar xvf TVER0002.sb2568947.eb2579996.55518.22356400463.ms.tar
To conserve disk space, you will probably wish to delete the tar file at this point.
Note that this has likely created a directory structure starting with lustre/aoc/ftp/e2earchive/; this is a bit inconvenient, so move the measurement set to the current working directory instead:
mv lustre/aoc/ftp/e2earchive/TVER0002.sb2568947.eb2579996.55518.22356400463.ms .
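If you prefer to do the unpacking from Python (for example, from within a script), the two shell commands above can be approximated with the standard tarfile and shutil modules. This is a minimal sketch rather than part of the official procedure: the function name unpack_ms is hypothetical, and it assumes the tar file comes from a trusted source and contains exactly one directory with the given MS name.

```python
import os
import shutil
import tarfile


def unpack_ms(tar_path, ms_name, dest_dir='.'):
    """Extract tar_path into dest_dir, then move the directory called
    ms_name (wherever it sits in the extracted tree) to the top level
    of dest_dir. Returns the final path of the measurement set."""
    with tarfile.open(tar_path) as tar:
        tar.extractall(dest_dir)
    # Search the extracted tree for the measurement set directory
    for root, dirs, _files in os.walk(dest_dir):
        if ms_name in dirs:
            found = os.path.join(root, ms_name)
            target = os.path.join(dest_dir, ms_name)
            if os.path.abspath(found) != os.path.abspath(target):
                shutil.move(found, target)
            return target
    raise IOError('%s not found after extracting %s' % (ms_name, tar_path))


# Example call (commented out so the sketch can be loaded without the data):
# unpack_ms('TVER0002.sb2568947.eb2579996.55518.22356400463.ms.tar',
#           'TVER0002.sb2568947.eb2579996.55518.22356400463.ms')
```

Note that this leaves the now-empty extracted directory tree (and the tar file itself) in place; remove them by hand if desired.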
Starting CASA and initial inspection: listobs
First, be sure you have the most recent version of CASA installed. To start CASA, type "casapy"; this will start writing output to a log file called "casapy.log" as well as to the logger window, and will store any command-line input in a file called "ipython.log". (Note that a detailed description of the CASA environment, including relevant information on the Python language, can be found here.)
The best place to start with a new MS is to run listobs:
# In CASA
listobs('TVER0002.sb2568947.eb2579996.55518.22356400463.ms')
================================================================================
           MeasurementSet Name:  /Science/CASA_Guides/TVER0002.sb2568947.eb2579996.55518.22356400463.ms      MS Version 2
================================================================================
   Observer: Dr. Miriam I. Krauss     Project: T.B.D.
Observation: EVLA
Data records: 487376       Total integration time = 1613.05 seconds
   Observed from   18-Nov-2010/05:24:49.2   to   18-Nov-2010/05:51:42.3 (UTC)
   ObservationID = 0         ArrayID = 0
  Date        Timerange (UTC)          Scan  FldId FieldName    nRows   Int(s)   SpwIds      ScanIntent
  18-Nov-2010/05:24:49.2 - 05:25:16.7     3      1 3C48         13600   7.93     [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]CALIBRATE_AMPLI#UNSPECIFIED,CALIBRATE_PHASE#UNSPECIFIED,CALIBRATE_BANDPASS#UNSPECIFIED,OBSERVE_TARGET#UNSPECIFIED
              05:25:26.5 - 05:26:46.5     4      1 3C48         50544   9.99     [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]CALIBRATE_AMPLI#UNSPECIFIED,CALIBRATE_PHASE#UNSPECIFIED,CALIBRATE_BANDPASS#UNSPECIFIED,OBSERVE_TARGET#UNSPECIFIED
              05:26:56.3 - 05:28:16.2     5      1 3C48         50544   9.98     [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]CALIBRATE_AMPLI#UNSPECIFIED,CALIBRATE_PHASE#UNSPECIFIED,CALIBRATE_BANDPASS#UNSPECIFIED,OBSERVE_TARGET#UNSPECIFIED
              05:28:26.0 - 05:29:45.5     6      1 3C48         50544   9.89     [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]CALIBRATE_AMPLI#UNSPECIFIED,CALIBRATE_PHASE#UNSPECIFIED,CALIBRATE_BANDPASS#UNSPECIFIED,OBSERVE_TARGET#UNSPECIFIED
              05:36:05.3 - 05:37:14.8    11      1 3C48         44928   9.76     [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]CALIBRATE_AMPLI#UNSPECIFIED,CALIBRATE_PHASE#UNSPECIFIED,CALIBRATE_BANDPASS#UNSPECIFIED,OBSERVE_TARGET#UNSPECIFIED
              05:37:24.6 - 05:38:44.6    12      1 3C48         50544   10       [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]CALIBRATE_AMPLI#UNSPECIFIED,CALIBRATE_PHASE#UNSPECIFIED,CALIBRATE_BANDPASS#UNSPECIFIED,OBSERVE_TARGET#UNSPECIFIED
              05:38:54.3 - 05:40:14.3    13      1 3C48         50544   10       [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]CALIBRATE_AMPLI#UNSPECIFIED,CALIBRATE_PHASE#UNSPECIFIED,CALIBRATE_BANDPASS#UNSPECIFIED,OBSERVE_TARGET#UNSPECIFIED
              05:40:24.0 - 05:41:43.6    14      1 3C48         46800   9.89     [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]CALIBRATE_AMPLI#UNSPECIFIED,CALIBRATE_PHASE#UNSPECIFIED,CALIBRATE_BANDPASS#UNSPECIFIED,OBSERVE_TARGET#UNSPECIFIED
              05:47:57.3 - 05:48:42.8    19      1 3C48         28240   9.88     [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]CALIBRATE_AMPLI#UNSPECIFIED,CALIBRATE_PHASE#UNSPECIFIED,CALIBRATE_BANDPASS#UNSPECIFIED,OBSERVE_TARGET#UNSPECIFIED
              05:48:52.6 - 05:49:42.6    20      1 3C48         33696   10       [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]CALIBRATE_AMPLI#UNSPECIFIED,CALIBRATE_PHASE#UNSPECIFIED,CALIBRATE_BANDPASS#UNSPECIFIED,OBSERVE_TARGET#UNSPECIFIED
              05:49:52.5 - 05:50:42.5    21      1 3C48         33696   10       [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]CALIBRATE_AMPLI#UNSPECIFIED,CALIBRATE_PHASE#UNSPECIFIED,CALIBRATE_BANDPASS#UNSPECIFIED,OBSERVE_TARGET#UNSPECIFIED
              05:50:52.3 - 05:51:42.3    22      1 3C48         33696   10       [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]CALIBRATE_AMPLI#UNSPECIFIED,CALIBRATE_PHASE#UNSPECIFIED,CALIBRATE_BANDPASS#UNSPECIFIED,OBSERVE_TARGET#UNSPECIFIED
           (nRows = Total number of rows per scan)
Fields: 1
  ID   Code Name         RA              Decl            Epoch   SrcId   nRows
  1    Q    3C48         01:37:41.29943  +33.09.35.1330  J2000   1       487376
Spectral Windows:  (16 unique spectral windows and 2 unique polarization setups)
  SpwID  #Chans Frame Ch1(MHz)    ChanWid(kHz)  TotBW(kHz)  Corrs
  0         128 TOPO  4488.25     1000          128000      RR
  1         128 TOPO  4616.25     1000          128000      RR
  2         128 TOPO  4744.25     1000          128000      RR
  3         128 TOPO  4872.25     1000          128000      RR
  4         128 TOPO  5000.25     1000          128000      RR
  5         128 TOPO  5128.25     1000          128000      RR
  6         128 TOPO  5256.25     1000          128000      RR
  7         128 TOPO  5384.25     1000          128000      RR
  8         128 TOPO  4488.25     1000          128000      LL
  9         128 TOPO  4616.25     1000          128000      LL
  10        128 TOPO  4744.25     1000          128000      LL
  11        128 TOPO  4872.25     1000          128000      LL
  12        128 TOPO  5000.25     1000          128000      LL
  13        128 TOPO  5128.25     1000          128000      LL
  14        128 TOPO  5256.25     1000          128000      LL
  15        128 TOPO  5384.25     1000          128000      LL
Note that the only scans present are those we requested, and only field ID 1 (3C48) is included. Also, note that the channel widths are now 1 MHz instead of 0.5 MHz, since we asked for frequency averaging over two channels, and the integration times are 10 seconds. Some integration times are slightly shorter; this is because data were removed by the online flags, the zero-valued-data flags, or the shadow flags -- see the following section for more information.