Loading Data

From CASA Guides

Latest revision as of 09:53, 19 September 2019


Overview

This tutorial details the process of obtaining EVLA data from the archive in measurement set (MS) format. (A similar process for importing EVLA data into AIPS can be found here.) It also describes the implications of data averaging, so that one may make an informed decision about whether -- and how -- to perform frequency or time averaging to reduce the size of the dataset, and it covers the initial flags that are generally applied to the data.

Obtaining data from the archive

EVLA data are available from the NRAO Science Data Archive. Details and updates regarding the archive can be found on the EVLA/VLA/VLBA Data Archive web page. Here, we choose to download a publicly available observation. If you are downloading proprietary data, you will need to either sign into your NRAO account (using the link at the top of the archive page) or obtain the Project Access Key from the NRAO data analysts.

We want to find data associated with the project "TVER0002", so enter this into the "Project Code" under the "General Search Parameters" and submit the query. This will find two archive files; we will be downloading the second of these, so click on the checkbox next to the file name "TVER0002.sb2568947.eb2579996.55518.22356400463".

Choosing to download in MS or SDM format

Once the archive query is complete, the results page will allow you to "Choose download data format." Options include CASA MS, AIPS FITS, SDM-BDF dataset (all files), or SDM tables only (no visibilities). "SDM" is the ALMA Science Data Model, the native binary-format output of the telescope, whereas CASA requires the data to be in CASA MS (Measurement Set) format. For more information on the SDM, see this page; Measurement Set specifications can be found in the MeasurementSet definition version 2.0.

This Guide assumes you will request an MS, in which case the archive servers will convert the data from SDM to MS format prior to delivery. If you choose to download the SDM dataset, you will need to run importasdm to create an MS prior to processing the data in CASA. However, this may be advantageous if you wish to import only portions of the data at a time using scan selection -- if you have the SDM locally, you can simply run several iterations of importasdm rather than initiating multiple download queries.
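As a rough sketch of the "several iterations" idea, the snippet below only builds the output names and scan-selection strings for each pass; it does not call CASA itself. The helper name plan_imports, the grouping of scans, and the ".partN.ms" naming are illustrative assumptions. Inside CASA, each pair could then be handed to importasdm; the asdm, vis, and scans parameters shown in the comment are assumed to take the SDM directory, the output MS name, and a comma-separated scan list, so check the task's inline help for your CASA version.

```python
def plan_imports(sdm_name, scan_groups):
    """Return a list of (output MS name, scan selection string) pairs."""
    plan = []
    for index, scans in enumerate(scan_groups):
        # Hypothetical naming scheme: one MS per group of scans
        vis_name = '{0}.part{1}.ms'.format(sdm_name, index)
        scan_string = ','.join(str(scan) for scan in scans)
        plan.append((vis_name, scan_string))
    return plan

sdm = 'TVER0002.sb2568947.eb2579996.55518.22356400463'
# Illustrative grouping of scan numbers into three passes
groups = [[3, 4, 5, 6], [11, 12, 13, 14], [19, 20, 21, 22]]
import_plan = plan_imports(sdm, groups)

for vis_name, scan_string in import_plan:
    print('{0} <- scans {1}'.format(vis_name, scan_string))
    # Inside CASA one might then run (assumed call, not executed here):
    # importasdm(asdm=sdm, vis=vis_name, scans=scan_string)
```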

Selecting a set of scans

Although this isn't necessary, we will also choose to select only a subset of the scans in this observation. You may wish to do this with your data if you only want to retrieve data for a specific source or receiver band. In order to figure out which scans you want, you can either click on the "Scans" link in the "View Scans" column, or (if it's a long or particularly complex observation) you can download the SDM tables, which contain metadata about the observation but are not very large, and use the task listsdm to inspect them. For demonstration purposes, we will use the latter method.

To do this,

  • Fill in your email address and the preferred download location;
  • For the data download format, choose "SDM tables only (no visibilities)";
  • Click on "Get My Data."

Ignore the file size. In this case, it's listed as 21.76 GB, which is the entire dataset; the SDM tables are in fact only 314 MB.

Once the download is complete, start CASA and run listsdm:

# In CASA
myScans = listsdm('TVER0002.sb2568947.eb2579996.55518.22356400463')

This gives output very similar to listobs, but also provides a Python dictionary containing useful information. In this example, we've stored the dictionary as "myScans." The top-level keys are simply the scan numbers, and each scan includes the following keys:

# In CASA
myScans[1].keys()
['field',
 'nchan',
 'end',
 'chanwidth',
 'source',
 'timerange',
 'start',
 'reffreq',
 'spws',
 'intent',
 'baseband',
 'nsubs']

Let's say we wish to download all scans which have a field ID of 1 (which we know to be 3C48), excluding the dummy scans (the first two, which have a field ID of 0). We can have Python create a string of scan numbers that satisfy this requirement, which we can then feed to the Archive Query page:

# In CASA
# Define an empty string to fill with selected scan numbers
scanList = ''
# Loop over the scans in numerical order, adding to scanList
# when the field ID is equal to 1:
for key in sorted(myScans):
    if myScans[key]['field'] == 1:
        scanList = scanList + str(key) + ','

# Print out the scan list for pasting into the Archive Query 
# page, removing the unnecessary trailing comma:
scanList.rstrip(',')
  Out[46]: '3,4,5,6,11,12,13,14,19,20,21,22'
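The same selection can be wrapped in a small helper so that other keys (for example 'source' or 'intent') can be matched as well. This is only a sketch: the function name scans_matching is hypothetical, the mock dictionary below is a made-up stand-in for what listsdm returns, and the assumption that 'source' holds the source name as a plain string should be checked against your own output.

```python
def scans_matching(scans, key, value):
    """Return a comma-separated string of scan numbers whose `key` equals `value`."""
    selected = [scan for scan in sorted(scans) if scans[scan].get(key) == value]
    return ','.join(str(scan) for scan in selected)

# Hypothetical stand-in for the dictionary returned by listsdm;
# the 'DUMMY' source name is invented for illustration only.
mockScans = {
    1: {'field': 0, 'source': 'DUMMY'},
    2: {'field': 0, 'source': 'DUMMY'},
    3: {'field': 1, 'source': '3C48'},
    4: {'field': 1, 'source': '3C48'},
}

print(scans_matching(mockScans, 'source', '3C48'))
```

Sorting the keys before joining keeps the scan numbers in ascending order regardless of how the dictionary happens to be ordered.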

With this new information, return to the Archive Query page (you can use the "back" arrow on your web browser), and enter this list into the box entitled "Select scans for MS or AIPS FITS."

Choosing to average data

It is often advantageous to average data, in frequency and/or time, in order to reduce its size and speed up processing. However, one must be cautious not to over-average the data, since this will result in an unacceptable amount of bandwidth smearing (for frequency averaging) or amplitude loss (for time averaging).

The Observational Status Summary (OSS) contains a more detailed description of these phenomena, as does the "white book," Synthesis Imaging in Radio Astronomy II, Chapters 2 and 17.

As can be seen in the "Telescope:config" column on the Archive Query page, our observation was performed in the C-configuration. Therefore, assuming we are happy to tolerate a 1% amplitude loss for a source at the first null of the primary beam, we could average the data to around 20 seconds (see Table 9 in the OSS). Since the longest averaging time the Archive Query page offers is 10 s, choose this value on the pull-down menu "Choose online averaging for CASA MS or AIPS FITS."

In order to determine how much frequency averaging we will want, let's say we're willing to accept a 5% reduction in peak response. From Table 8 in the OSS, we find that for such a reduction the following relation applies:
\Delta{\nu} / \nu_{0} \times \theta_{0}/\theta_{HPBW}= 0.5

For the VLA C-configuration at 5 GHz, the synthesized beam \theta_{HPBW} is approximately 4.2" (uniform weighting, see the OSS). Assuming a source is at the first null of the primary beam, its angular distance from the phase center at 5 GHz (\theta_{0}) is about 9', which happens to be about the same as the Full Width Half Maximum of the primary beam. That corresponds to approximately 130 times \theta_{HPBW}, or \theta_{0}/\theta_{HPBW}=130.

Therefore \Delta{\nu} / \nu_{0} = 0.5/130 = 0.00385, which is 19 MHz for \nu_{0} = 5 GHz.
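The arithmetic above can be checked with a few lines of Python. This is a minimal sketch using only the numbers quoted in the text (5 GHz, 9', 4.2", and the product of 0.5); the function name is hypothetical. Because it does not round the beam ratio to 130, the result comes out slightly above the rounded figure in the text, but still at about 19 MHz.

```python
ARCSEC_PER_ARCMIN = 60.0

def max_channel_width_mhz(nu0_ghz, theta0_arcmin, theta_hpbw_arcsec, product=0.5):
    """Bandwidth (MHz) satisfying
    (delta_nu / nu0) * (theta0 / theta_hpbw) = product."""
    # Offset from the phase center in units of the synthesized beam
    beam_ratio = theta0_arcmin * ARCSEC_PER_ARCMIN / theta_hpbw_arcsec
    fractional_bandwidth = product / beam_ratio
    return fractional_bandwidth * nu0_ghz * 1000.0

width = max_channel_width_mhz(5.0, 9.0, 4.2)
print('Maximum channel width: {0:.1f} MHz'.format(width))
```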

The channel width in this observation was 0.5 MHz, with 128 MHz bandwidth per spectral window. Therefore, while it would be possible to average to only 7 channels per SPW, this would be detrimental for subsequent calibration, since we will have lost a lot of information in the bandpass structure and may smear RFI across channels. We will therefore be conservative and average by two channels, again choosing this via the pull-down menu. After calibrating, but before imaging, it would be possible to average over more channels.
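The channel counts in this paragraph follow from simple division. The sketch below uses only values stated in the text; the variable names are illustrative.

```python
import math

spw_bandwidth_mhz = 128.0   # bandwidth per spectral window (from the text)
native_width_mhz = 0.5      # channel width as observed (from the text)
smearing_limit_mhz = 19.0   # widest channel allowed by the 5% criterion

# Channels per spectral window before any averaging
native_channels = int(spw_bandwidth_mhz / native_width_mhz)
# Fewest channels per spectral window if we averaged up to the smearing limit
channels_at_limit = int(math.ceil(spw_bandwidth_mhz / smearing_limit_mhz))
# The conservative choice: average by two channels
channels_after_averaging = native_channels // 2
averaged_width_mhz = native_width_mhz * 2

print('native: {0}, at limit: {1}, after averaging by 2: {2} x {3} MHz'.format(
    native_channels, channels_at_limit, channels_after_averaging, averaged_width_mhz))
```

The last two values (128 channels of 1 MHz each) are what one would expect to see per spectral window once the averaged MS is inspected.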

By averaging in time by a factor of 10, and frequency by a factor of 2, we will reduce the size of the dataset by a factor of 20 to around 1 GB. Furthermore, we are only requesting around 3/5 of the data (via the scan selection), so the total size will be around 700 MB.
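The size estimate can be reproduced as follows. This is a back-of-the-envelope sketch that assumes the data volume scales linearly with the number of integrations, channels, and scans, and it takes the 21.76 GB figure listed earlier by the archive as the full dataset size; the function name is hypothetical.

```python
def averaged_size_gb(full_size_gb, time_factor, freq_factor, scan_fraction):
    """Rough dataset size after averaging and scan selection (linear scaling assumed)."""
    return full_size_gb / (time_factor * freq_factor) * scan_fraction

after_averaging = averaged_size_gb(21.76, 10, 2, 1.0)
after_selection = averaged_size_gb(21.76, 10, 2, 3.0 / 5.0)
print('After averaging: {0:.2f} GB'.format(after_averaging))
print('After scan selection: {0:.0f} MB'.format(after_selection * 1000.0))
```

This gives roughly 1.1 GB and 650 MB, in line with the rounded values quoted above.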

Retrieval process

Now that we have selected scans, as well as requested data averaging, be sure to click on "CASA MS" as the download format and "Create MS or SDM tar file". Note that requesting a tar file makes data retrieval substantially easier, since otherwise the data will be delivered as multiple files within a directory.

Note: If you are working locally at NRAO and have space on the Lustre filesystem, you can download directly to your Lustre area by entering the full directory path in the "Enter download destination" box. Be sure that this directory is world-writable first, by executing "chmod 777 ." in the directory itself. This will save you the time of copying the data over from the archive space. Also, in this case, there is no need to create a tar file.

Be sure you've entered the correct email address, then click "Get My Data."

Again, it will say that the file size is 24.3712 GB; we can ignore this knowing that with the selected scans and averaging it will actually be far less. Click on "Retrieve over Internet," and wait for the email letting you know it's available.

When the archive process is complete, an email notification is sent out with information about the download directory. Copy the data to a convenient location, and unpack the tar file by typing:

tar xvf TVER0002.sb2568947.eb2579996.55518.22356400463.ms.tar

To conserve disk space, you will probably wish to delete the tar file at this point.

Note that this has likely created a directory structure starting with lustre/aoc/ftp/e2earchive/; this is a bit inconvenient, so move the measurement set and associated flag tables to the current working directory instead:

mv lustre/aoc/ftp/e2earchive/TVER0002.sb2568947.eb2579996.55518.22356400463.ms .
mv lustre/aoc/ftp/e2earchive/TVER0002.sb2568947.eb2579996.55518.22356400463.ms.flagversions .

Starting CASA and initial inspection: listobs

First, be sure you have the most recent version of CASA installed. To start CASA, type "casapy"; this will start writing output to a log file called "casapy.log" as well as to the logger window, and will store any command-line input in a file called "ipython.log". (Note that a detailed description of the CASA environment, including relevant information on the Python language, can be found here.)

The best place to start with a new MS is to run listobs:

# In CASA
listobs('TVER0002.sb2568947.eb2579996.55518.22356400463.ms')
================================================================================
           MeasurementSet Name:  /Science/CASA_Guides/TVER0002.sb2568947.eb2579996.55518.22356400463.ms      MS Version 2
================================================================================
   Observer: Dr. Miriam I. Krauss     Project: T.B.D.  
Observation: EVLA
Data records: 487376       Total integration time = 1613.05 seconds
   Observed from   18-Nov-2010/05:24:49.2   to   18-Nov-2010/05:51:42.3 (UTC)

   ObservationID = 0         ArrayID = 0
  Date        Timerange (UTC)          Scan  FldId FieldName           nRows   Int(s)   SpwIds      ScanIntent
  18-Nov-2010/05:24:49.2 - 05:25:16.7     3      1 3C48                13600  7.93     [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]CALIBRATE_AMPLI#UNSPECIFIED,CALIBRATE_PHASE#UNSPECIFIED,CALIBRATE_BANDPASS#UNSPECIFIED,OBSERVE_TARGET#UNSPECIFIED
              05:25:26.5 - 05:26:46.5     4      1 3C48                50544  9.99     [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]CALIBRATE_AMPLI#UNSPECIFIED,CALIBRATE_PHASE#UNSPECIFIED,CALIBRATE_BANDPASS#UNSPECIFIED,OBSERVE_TARGET#UNSPECIFIED
              05:26:56.3 - 05:28:16.2     5      1 3C48                50544  9.98     [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]CALIBRATE_AMPLI#UNSPECIFIED,CALIBRATE_PHASE#UNSPECIFIED,CALIBRATE_BANDPASS#UNSPECIFIED,OBSERVE_TARGET#UNSPECIFIED
              05:28:26.0 - 05:29:45.5     6      1 3C48                50544  9.89     [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]CALIBRATE_AMPLI#UNSPECIFIED,CALIBRATE_PHASE#UNSPECIFIED,CALIBRATE_BANDPASS#UNSPECIFIED,OBSERVE_TARGET#UNSPECIFIED
              05:36:05.3 - 05:37:14.8    11      1 3C48                44928  9.76     [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]CALIBRATE_AMPLI#UNSPECIFIED,CALIBRATE_PHASE#UNSPECIFIED,CALIBRATE_BANDPASS#UNSPECIFIED,OBSERVE_TARGET#UNSPECIFIED
              05:37:24.6 - 05:38:44.6    12      1 3C48                50544  10       [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]CALIBRATE_AMPLI#UNSPECIFIED,CALIBRATE_PHASE#UNSPECIFIED,CALIBRATE_BANDPASS#UNSPECIFIED,OBSERVE_TARGET#UNSPECIFIED
              05:38:54.3 - 05:40:14.3    13      1 3C48                50544  10       [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]CALIBRATE_AMPLI#UNSPECIFIED,CALIBRATE_PHASE#UNSPECIFIED,CALIBRATE_BANDPASS#UNSPECIFIED,OBSERVE_TARGET#UNSPECIFIED
              05:40:24.0 - 05:41:43.6    14      1 3C48                46800  9.89     [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]CALIBRATE_AMPLI#UNSPECIFIED,CALIBRATE_PHASE#UNSPECIFIED,CALIBRATE_BANDPASS#UNSPECIFIED,OBSERVE_TARGET#UNSPECIFIED
              05:47:57.3 - 05:48:42.8    19      1 3C48                28240  9.88     [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]CALIBRATE_AMPLI#UNSPECIFIED,CALIBRATE_PHASE#UNSPECIFIED,CALIBRATE_BANDPASS#UNSPECIFIED,OBSERVE_TARGET#UNSPECIFIED
              05:48:52.6 - 05:49:42.6    20      1 3C48                33696  10       [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]CALIBRATE_AMPLI#UNSPECIFIED,CALIBRATE_PHASE#UNSPECIFIED,CALIBRATE_BANDPASS#UNSPECIFIED,OBSERVE_TARGET#UNSPECIFIED
              05:49:52.5 - 05:50:42.5    21      1 3C48                33696  10       [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]CALIBRATE_AMPLI#UNSPECIFIED,CALIBRATE_PHASE#UNSPECIFIED,CALIBRATE_BANDPASS#UNSPECIFIED,OBSERVE_TARGET#UNSPECIFIED
              05:50:52.3 - 05:51:42.3    22      1 3C48                33696  10       [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]CALIBRATE_AMPLI#UNSPECIFIED,CALIBRATE_PHASE#UNSPECIFIED,CALIBRATE_BANDPASS#UNSPECIFIED,OBSERVE_TARGET#UNSPECIFIED
           (nRows = Total number of rows per scan) 
Fields: 1
  ID   Code Name                RA              Decl          Epoch   SrcId nRows  
  1    Q    3C48                01:37:41.29943 +33.09.35.1330 J2000   1     487376 
Spectral Windows:  (16 unique spectral windows and 2 unique polarization setups)
  SpwID  #Chans Frame Ch1(MHz)    ChanWid(kHz)  TotBW(kHz)  Corrs
  0         128 TOPO  4488.25     1000          128000      RR  
  1         128 TOPO  4616.25     1000          128000      RR  
  2         128 TOPO  4744.25     1000          128000      RR  
  3         128 TOPO  4872.25     1000          128000      RR  
  4         128 TOPO  5000.25     1000          128000      RR  
  5         128 TOPO  5128.25     1000          128000      RR  
  6         128 TOPO  5256.25     1000          128000      RR  
  7         128 TOPO  5384.25     1000          128000      RR  
  8         128 TOPO  4488.25     1000          128000      LL  
  9         128 TOPO  4616.25     1000          128000      LL  
  10        128 TOPO  4744.25     1000          128000      LL  
  11        128 TOPO  4872.25     1000          128000      LL  
  12        128 TOPO  5000.25     1000          128000      LL  
  13        128 TOPO  5128.25     1000          128000      LL  
  14        128 TOPO  5256.25     1000          128000      LL  
  15        128 TOPO  5384.25     1000          128000      LL  
{output truncated}

Note that the only scans present are those we requested, and only field ID 1 (3C48) is included. Also, note that the channel widths are now 1 MHz instead of 0.5 MHz, since we asked for frequency averaging over two channels, and the integration times are 10 seconds. Sometimes they are a bit less; this is due to data that were deleted by the online, zero, or shadow flags -- see the following section for more information.

Initial data flagging: online flags, zero flags, and shadow flags

Online, shadow, and zero flags

When we requested the data from the archive, we left the "Apply telescope flags" box checked (the default). This meant that data marked as bad by the "online" system (when an antenna is not on source, or if there was a subreflector or focus error), as well as pure zeros (generated if there is a correlator problem) and data from shadowed antennas, were deleted.

It is useful to check what data were affected. To do this, use the flagcmd task to produce a plot:

# In CASA
flagcmd(vis='TVER0002.sb2568947.eb2579996.55518.22356400463.ms',action='plot')

From this, we can see that antenna ea08 had some subreflector issues, and lengths of time when an antenna was not "on source" (i.e., it was slewing from one object to the next) varied a bit according to antenna. This accounts for the fact that the integration times provided by listobs are not exactly 10 s: the first integration of a given scan begins when the first antenna arrives on-source; since this does not happen simultaneously for all antennas, and the given integration time is an average across antennas, we can get values like 7.93 s (as for Scan 3).

Although the shadow and clip flags are plotted as well, the exact times affected by these flags are not known, so do not be alarmed by the fact that they appear to span the entire plot -- this is rarely actually the case.

Unfortunately, since we requested time-averaged data from the archive, the flagged data are not included in the MS. If you are concerned about the possible erroneous deletion of good data, it's best to download the complete MS, inspect it, and then perform time and/or frequency averaging.