Measurement Set Contents

From CASA Guides


Overview

In order to fully understand your data and the way CASA operates on it, it's helpful to have a full picture of the way the data are stored, and what a "measurement set" really is. This CASA Guide describes the measurement set (MS) structure, and demonstrates some ways in which you can explore the information stored within an MS. This is particularly useful when exploring the lower-level CASA Toolkit functions, but is also good information to keep in mind when performing basic analysis. For a complete description of the Measurement Set specifications, please see the MeasurementSet definition version 2.0.

Throughout this Guide, we will use the same data as the CASA Guide "Carbon Star IRC+10216: high frequency (36GHz), spectral line data reduction". You can use these data as well if you would like to reproduce the results presented here.

The post-split averaged data can be downloaded from http://casa.nrao.edu/Data/EVLA/IRC10216/day2_TDEM0003_10s_norx.tar.gz (data size: 1.1GB)

Once the download is complete, unzip and unpack the file:

# in a terminal, outside of CASA:
tar -xzvf day2_TDEM0003_10s_norx.tar.gz
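If you would rather stay in Python, Python's standard tarfile module can do roughly the same job as the tar command above. The sketch below is a minimal illustration. The function name unpack_ms_tarball is hypothetical and is not part of CASA.

```python
import tarfile
from pathlib import Path


def unpack_ms_tarball(tarball, dest="."):
    """Extract a gzipped tar archive (such as the tutorial MS) into dest.

    Roughly equivalent to running `tar -xzvf tarball` inside dest.
    Returns the sorted top-level names that were extracted
    (for the tutorial data, the MS directory itself).
    """
    dest = Path(dest)
    with tarfile.open(tarball, mode="r:gz") as tar:
        members = tar.getmembers()
        if hasattr(tarfile, "data_filter"):
            # Newer Pythons: reject absolute paths, links escaping dest, etc.
            tar.extractall(path=dest, filter="data")
        else:
            tar.extractall(path=dest)
    # Keep only the first path component of each member; skip a bare "." entry
    return sorted({Path(m.name).parts[0] for m in members if Path(m.name).parts})
```

For example, `unpack_ms_tarball("day2_TDEM0003_10s_norx.tar.gz")` unpacks the archive into the current directory.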

The measurement set directory structure and contents

A measurement set is actually a directory; the data and metadata are stored in tables and subdirectories within this directory. To see these components, open a terminal window, go into the MS directory, and type ls:

cd <directory_path>/day2_TDEM0003_10s_norx
ls
ANTENNA          POLARIZATION     table.f10        table.f17_TSM1   table.f21        table.f26        table.f9
DATA_DESCRIPTION PROCESSOR        table.f11        table.f18        table.f22        table.f26_TSM1   table.info
FEED             SOURCE           table.f12        table.f18_TSM1   table.f23        table.f3         table.lock
FIELD            SPECTRAL_WINDOW  table.f13        table.f19        table.f23_TSM0   table.f4
FLAG_CMD         STATE            table.f14        table.f19_TSM1   table.f24        table.f5
HISTORY          WEATHER          table.f15        table.f2         table.f24_TSM1   table.f6
OBSERVATION      table.dat        table.f16        table.f20        table.f25        table.f7
POINTING         table.f1         table.f17        table.f20_TSM1   table.f25_TSM1   table.f8

Note that the listings in all-caps are also directories, in which more table.* files live.
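The same separation can be made programmatically. The sketch below treats a subdirectory as a table if it contains its own table.dat file, as the top-level MS directory does. It then lists the top-level table.* files separately. The function name describe_ms_layout is hypothetical, and the code relies only on the Python standard library.

```python
from pathlib import Path


def describe_ms_layout(ms_path):
    """Split the top level of an MS directory into subtables and MAIN files.

    A subdirectory counts as a subtable if it holds its own table.dat;
    top-level files named table.* are taken to belong to the MAIN table.
    """
    ms = Path(ms_path)
    entries = sorted(ms.iterdir())
    subtables = [p.name for p in entries
                 if p.is_dir() and (p / "table.dat").is_file()]
    main_files = [p.name for p in entries
                  if p.is_file() and p.name.startswith("table.")]
    return subtables, main_files
```

Run on the MS used here, `describe_ms_layout("day2_TDEM0003_10s_norx")` should separate the all-caps names from the table.* files in the listing above.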

The table.* files in the root MS directory make up the "MAIN" table, which holds the data along with their identifying information. The subdirectories are additional tables (subtables) that contain metadata and are referenced by columns in the MAIN table.

Inspecting MS contents in CASA

While it's possible to get some sense of the layout of an MS from the command line, it's much more informative to look at the contents with CASA tools. One of these is the table browser. You can start it from a terminal with casabrowser, or from within CASA with the browsetable task. It lets you inspect the information contained in a table and, if desired, edit it.

Let's have a look at the contents of the MAIN table, using the command-line version. Since we're already in the MS directory, use a . to indicate we wish to open the current working directory. If you are in a different directory, give the full path to the MS directory:

casabrowser .
(Figure: the MAIN table displayed in browsetable.)

Now we can inspect the information contained in the MAIN table. The first column is labeled UVW, and contains the associated (u,v,w) values for each row of data. If you hover your mouse over a column header, a pop-up box with additional information will appear. In the case of the UVW column, this includes the data type (a Double Array), as well as the units (in this case, all are in meters).
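Because UVW is stored in meters, anything expressed in wavelengths has to be converted using the observing frequency. The sketch below computes the projected baseline length sqrt(u² + v²) from plain (u, v, w) tuples, in meters or, optionally, in wavelengths. The names uv_distance and SPEED_OF_LIGHT are hypothetical and not CASA functions.

```python
import math

SPEED_OF_LIGHT = 299_792_458.0  # m/s


def uv_distance(uvw_rows, freq_hz=None):
    """Projected baseline length sqrt(u**2 + v**2) for each (u, v, w) row.

    uvw_rows: iterable of (u, v, w) in meters, as stored in the UVW column.
    If freq_hz is given, convert to wavelengths by dividing by the
    observing wavelength c / freq_hz (i.e. multiplying by freq_hz / c).
    """
    scale = 1.0 if freq_hz is None else freq_hz / SPEED_OF_LIGHT
    return [math.hypot(u, v) * scale for u, v, w in uvw_rows]
```

Inside CASA, a table tool's getcol('UVW') returns an array of shape (3, nrow), so you would pass its transpose, e.g. `uv_distance(uvw.T, 36e9)` for data near 36 GHz.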

Much of the information in an MS is contained in the subtables. For example, the MAIN table has an ANTENNA1 column which is an integer: you may already be familiar with this number as the antenna ID. However, the ID is not unique across MSs -- what is antenna "4" in one MS could be "6" in a different one. The information which links the antenna ID with its physical attributes is in the ANTENNA table. Let's have a look at this as well:

casabrowser ANTENNA

The row number in the ANTENNA table (counting from 0) corresponds to the antenna ID. For example, antenna 6 is actually ea08, which at the time of this observation was located on pad N01.
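A script can build the same ID-to-antenna mapping by reading the NAME and STATION columns of the ANTENNA subtable. The sketch below assumes table_tool is a CASA table tool instance, such as one created with tbtool(), that provides open(), getcol(), and close(). The function name antenna_lookup is hypothetical.

```python
def antenna_lookup(table_tool, ms_name):
    """Map antenna ID (0-based row in ANTENNA) to (name, station).

    table_tool is assumed to be a CASA table tool instance offering
    open(path), getcol(column), and close().
    """
    table_tool.open(ms_name + "/ANTENNA")
    try:
        names = table_tool.getcol("NAME")
        stations = table_tool.getcol("STATION")
    finally:
        # Always release the table, even if a column read fails
        table_tool.close()
    return {ant_id: (str(n), str(s))
            for ant_id, (n, s) in enumerate(zip(names, stations))}
```

For the data in this Guide, `antenna_lookup(myTB, msName)[6]` should correspond to ea08 on pad N01, matching what the table browser shows. Because the mapping is rebuilt from each MS, it also copes with IDs that differ between MSs.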

Other columns in the MAIN table may not be so obvious. For example, what does STATE_ID refer to? There is also a STATE table, so open it using the File -> Open Table menu in the Table Browser. Again, STATE_ID refers to rows in the STATE table. Looking at its contents, the most interesting column is OBS_MODE, which records what the data were acquired for. If you've set up observations with the Observation Preparation Tool, these should look familiar.

In this particular observation, data with a STATE_ID of 3 have an OBS_MODE which lists "CALIBRATE_PHASE.UNSPECIFIED,CALIBRATE_BANDPASS.UNSPECIFIED,UNSPECIFIED.UNSPECIFIED". Ignoring the extraneous "UNSPECIFIEDs", this information tells us that the observer intended these data to be used for bandpass calibration, as well as phase determination.
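The same reading of OBS_MODE can be automated, for example to find every STATE_ID intended for bandpass calibration. The sketch below assumes that OBS_MODE is a comma-separated list of INTENT.SUBINTENT entries, as in the example above. It keeps only the part before the first ".", so sub-intents are deliberately discarded, and it drops entries that are UNSPECIFIED. The function names intents and state_ids_for are hypothetical.

```python
def intents(obs_mode):
    """Split an OBS_MODE string into intents, dropping 'UNSPECIFIED'.

    Each comma-separated entry is assumed to look like INTENT.SUBINTENT;
    only the INTENT part is kept.
    """
    result = []
    for entry in obs_mode.split(","):
        intent = entry.strip().split(".")[0]
        if intent and intent != "UNSPECIFIED":
            result.append(intent)
    return result


def state_ids_for(obs_modes, wanted):
    """Return STATE_IDs (row numbers in STATE) whose OBS_MODE includes wanted."""
    return [row for row, mode in enumerate(obs_modes) if wanted in intents(mode)]
```

In CASA, obs_modes would come from `myTB.getcol('OBS_MODE')` after opening msName + '/STATE'. For the string quoted above, intents() yields CALIBRATE_PHASE and CALIBRATE_BANDPASS.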

While many of the relationships between the MAIN table and the subtables can be determined with a little sleuthing, a lot of useful information can also be found in the MeasurementSet definition version 2.0 document.
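As a starting point for that sleuthing, the sketch below records a partial list of MAIN-table columns whose integer values are row numbers in a subtable. The list follows the MeasurementSet v2.0 definition; FEED1/FEED2 are omitted because the FEED table is not indexed by row alone. The names MAIN_ID_COLUMNS and subtable_refs are hypothetical. Treating negative IDs as "no reference" is an assumption of this sketch; check the MS definition for which columns allow -1.

```python
# Partial list: MAIN columns whose values are 0-based row numbers in a subtable
MAIN_ID_COLUMNS = {
    "ANTENNA1": "ANTENNA",
    "ANTENNA2": "ANTENNA",
    "FIELD_ID": "FIELD",
    "DATA_DESC_ID": "DATA_DESCRIPTION",
    "STATE_ID": "STATE",
    "OBSERVATION_ID": "OBSERVATION",
    "PROCESSOR_ID": "PROCESSOR",
}


def subtable_refs(main_row):
    """Return (column, subtable, row) for each ID column present in main_row.

    main_row is a plain dict, e.g. {'ANTENNA1': 6, 'STATE_ID': 3}.
    Negative values are assumed to mean 'no reference' and are skipped.
    """
    refs = []
    for column, subtable in MAIN_ID_COLUMNS.items():
        if column in main_row and main_row[column] >= 0:
            refs.append((column, subtable, main_row[column]))
    return refs
```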

Advanced: using information about MS structure

For most CASA tasks, it's sufficient to get information about the MS from listobs. However, if you wish to delve into the CASA Toolkit, or write a CASA task, you will likely need to put some of this knowledge to use.

For example, suppose you're working on a pipeline. For each dataset, you could read the output of a task such as listobs and then enter the appropriate parameters into your script by hand. Wouldn't it be nicer if the script itself could look at the data and work this information out on its own?

Here's a snippet of code that determines the spectral windows (SPWs) present in an MS, along with the number of channels and the reference frequency of each SPW. Note that it doesn't explicitly check that data are present for these SPWs. Suppose you've split the data on an axis other than SPW, and that split also reduced the set of SPWs that actually contain data. This could then cause problems later on, because in this case split carries along extraneous SPW entries in the table information.

# It's a good habit to create a new instance of the table tool first,
# to avoid collisions with the default 'tb' instance
# (CASA 4/5 syntax; in CASA 6: from casatools import table; myTB = table())
myTB = tbtool()
# You need to define msName first (e.g., msName = 'myMS.ms'); this then
# opens the subtable with spectral window information
myTB.open(msName + '/SPECTRAL_WINDOW')
# Read the REF_FREQUENCY (in Hz) and NUM_CHAN columns into numpy arrays,
# using the new tool instance rather than the global 'tb'
refFreq = myTB.getcol('REF_FREQUENCY')
nChan = myTB.getcol('NUM_CHAN')
# Close the table
myTB.close()
# The SPW index is simply the (0-based) row number in SPECTRAL_WINDOW
spwObs = list(range(len(refFreq)))
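To address the caveat about SPWs that have no data, you can follow the references from MAIN. In MAIN, the DATA_DESC_ID column gives row numbers in the DATA_DESCRIPTION subtable, and that subtable's SPECTRAL_WINDOW_ID column gives the SPW for each data description. The sketch below assumes table_tool is a CASA table tool instance with open(), getcol(), and close(). The function name spws_with_data is hypothetical. Reading the full DATA_DESC_ID column is simple, but it may be slow for very large MSs.

```python
def spws_with_data(table_tool, ms_name):
    """Return the sorted SPW IDs that are referenced by rows of MAIN.

    MAIN.DATA_DESC_ID -> row of DATA_DESCRIPTION -> SPECTRAL_WINDOW_ID.
    table_tool is assumed to be a CASA table tool (open/getcol/close).
    """
    table_tool.open(ms_name + "/DATA_DESCRIPTION")
    try:
        dd_to_spw = list(table_tool.getcol("SPECTRAL_WINDOW_ID"))
    finally:
        table_tool.close()
    table_tool.open(ms_name)  # the MAIN table is the MS directory itself
    try:
        dd_ids = {int(i) for i in table_tool.getcol("DATA_DESC_ID")}
    finally:
        table_tool.close()
    return sorted({int(dd_to_spw[dd]) for dd in dd_ids})
```

Comparing `spws_with_data(myTB, msName)` against spwObs shows whether any SPECTRAL_WINDOW rows have no data attached.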

Now that you have the Python arrays spwObs, refFreq, and nChan, you can use them to decide how to process the data (e.g., when a processing step depends on frequency) and to build up useful strings (say, selections of the form spw:chanStart~chanEnd). Combined, the toolkit and the table structure can be very powerful for performing sophisticated, hands-off processing.
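For example, a minimal sketch that turns spwObs and nChan into spw:chanStart~chanEnd strings might look like the following. The function name spw_channel_strings and its edge_frac parameter are hypothetical. edge_frac trims a fraction of the channels from each band edge, and 0.0 selects all channels from 0 to nChan-1.

```python
def spw_channel_strings(spw_ids, n_chan, edge_frac=0.0):
    """Build 'spw:chanStart~chanEnd' strings, one per SPW.

    edge_frac (hypothetical) drops int(n * edge_frac) channels from each
    band edge; 0.0 selects every channel, 0 ~ n-1.
    """
    strings = []
    for spw, n in zip(spw_ids, n_chan):
        edge = int(n * edge_frac)
        strings.append(f"{spw}:{edge}~{int(n) - 1 - edge}")
    return strings
```

Joining the pieces with commas, as in `",".join(spw_channel_strings(spwObs, nChan, edge_frac=0.05))`, gives a single string in the usual CASA spw selection syntax.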