Measurement Set Contents
Latest revision as of 19:06, 9 August 2012
Overview
In order to fully understand your data and the way CASA operates on it, it's helpful to have a full picture of the way the data are stored, and what a "measurement set" really is. This CASA Guide describes the measurement set (MS) structure, and demonstrates some ways in which you can explore the information stored within an MS. This is particularly useful when exploring the lower-level CASA Toolkit functions, but is also good information to keep in mind when performing basic analysis. For a complete description of the Measurement Set specifications, please see the MeasurementSet definition version 2.0.
Throughout this Guide, we will be using the same data as the Carbon Star IRC+10216: high frequency (36GHz), spectral line data reduction. If you would like to reproduce the results presented here exactly, use this dataset as well.
The post-split averaged data can be downloaded from http://casa.nrao.edu/Data/EVLA/IRC10216/day2_TDEM0003_10s_norx.tar.gz (data size: 1.1 GB).
Once the download is complete, unzip and unpack the file:
# in a terminal, outside of CASA:
tar -xzvf day2_TDEM0003_10s_norx.tar.gz
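If you would rather do this step from Python (for example, inside a processing script), the standard-library tarfile module can perform the same extraction. This is a minimal sketch; the helper name extract_ms_tarball is hypothetical, and the archive name in the usage comment is the one given above.

```python
import tarfile
from pathlib import Path


def extract_ms_tarball(archive, dest="."):
    """Unpack a gzipped tar archive (e.g. a downloaded MS) into dest.

    Returns the sorted top-level names found in the archive, which for
    this dataset should be the MS directory itself.
    """
    with tarfile.open(archive, mode="r:gz") as tar:
        top_level = sorted({Path(m.name).parts[0]
                            for m in tar.getmembers() if Path(m.name).parts})
        # Use the safer "data" extraction filter when this Python provides it
        if hasattr(tarfile, "data_filter"):
            tar.extractall(path=dest, filter="data")
        else:
            tar.extractall(path=dest)
    return top_level


# Usage (roughly equivalent to `tar -xzvf` in a terminal):
# extract_ms_tarball("day2_TDEM0003_10s_norx.tar.gz")
```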
The measurement set directory structure and contents
A measurement set is actually a directory; the data and metadata are stored in tables and subdirectories within this directory. To see these components, open a terminal window, go into the MS directory, and type ls:
cd <directory_path>/day2_TDEM0003_10s_norx
ls
ANTENNA           POLARIZATION     table.f10  table.f17_TSM1  table.f21       table.f26       table.f9
DATA_DESCRIPTION  PROCESSOR        table.f11  table.f18       table.f22       table.f26_TSM1  table.info
FEED              SOURCE           table.f12  table.f18_TSM1  table.f23       table.f3        table.lock
FIELD             SPECTRAL_WINDOW  table.f13  table.f19       table.f23_TSM0  table.f4
FLAG_CMD          STATE            table.f14  table.f19_TSM1  table.f24       table.f5
HISTORY           WEATHER          table.f15  table.f2        table.f24_TSM1  table.f6
OBSERVATION       table.dat        table.f16  table.f20       table.f25       table.f7
POINTING          table.f1         table.f17  table.f20_TSM1  table.f25_TSM1  table.f8
Note that the all-caps entries are also directories, each containing its own set of table.* files.
The table.* files in the top-level MS directory make up the "MAIN" table, which holds the data along with the columns that identify each row. The subdirectories are subtables containing metadata; columns in the MAIN table refer to rows in these subtables.
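The same split between MAIN-table files and subtable directories can be checked from a script. The sketch below uses only the Python standard library and assumes, as in the listing above, that the MAIN-table files are named table.* and that every subdirectory is a subtable; the function name describe_ms_layout is hypothetical.

```python
from pathlib import Path


def describe_ms_layout(ms_path):
    """Split the entries of an MS directory into MAIN-table files and subtables.

    Assumes the layout shown by `ls`: files named table.* belong to the
    MAIN table, and each subdirectory (ANTENNA, FIELD, ...) is a subtable.
    """
    main_files, subtables = [], []
    for entry in sorted(Path(ms_path).iterdir()):
        if entry.is_dir():
            subtables.append(entry.name)
        elif entry.name.startswith("table."):
            main_files.append(entry.name)
    return main_files, subtables


# Example:
# main_files, subtables = describe_ms_layout("day2_TDEM0003_10s_norx")
# subtables then lists the all-caps subtable directories
```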
Inspecting MS contents in CASA
While it's possible to get some sense of the layout of an MS just from the command line, it's much more informative to look at the contents using CASA tasks. One such task is browsetable, which can be started from the command line with casabrowser or from within CASA as browsetable. It lets you inspect the information contained in a table and, if desired, edit it.
Let's have a look at the contents of the MAIN table, using the command-line version. Since we're already in the MS directory, use a . to indicate we wish to open the current working directory. If you are in a different directory, give the full path to the MS directory:
casabrowser .
Now we can inspect the information contained in the MAIN table. The first column is labeled UVW, and contains the associated (u,v,w) values for each row of data. If you hover your mouse over a column header, a pop-up box with additional information will appear. In the case of the UVW column, this includes the data type (a Double Array), as well as the units (in this case, all are in meters).
Much of the information in an MS is contained in the subtables. For example, the MAIN table has an ANTENNA1 column which is an integer: you may already be familiar with this number as the antenna ID. However, the ID is not unique across MSs -- what is antenna "4" in one MS could be "6" in a different one. The information which links the antenna ID with its physical attributes is in the ANTENNA table. Let's have a look at this as well:
casabrowser ANTENNA
The row number in the ANTENNA table corresponds to the antenna ID: for example, antenna 6 is actually ea08, and at the time of this observation, lived on pad N01.
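In a script, the same lookup is plain list indexing: read the NAME column of the ANTENNA subtable (for instance with the table tool's getcol method) and index it with the IDs from the ANTENNA1 column. The sketch below is pure Python so that it runs anywhere; the function name and the antenna names in the usage lines are hypothetical placeholders, not values from this dataset.

```python
def antenna_names_for_ids(antenna_ids, names):
    """Translate antenna IDs from the MAIN table into antenna names.

    antenna_ids: IDs as stored in the ANTENNA1 (or ANTENNA2) column.
    names: the NAME column of the ANTENNA subtable; row i describes antenna ID i.
    """
    return [names[int(i)] for i in antenna_ids]


# Hypothetical NAME column; in CASA it could be read with something like
#   tb.open(msName + '/ANTENNA'); names = tb.getcol('NAME'); tb.close()
names = ["ea01", "ea02", "ea03"]
print(antenna_names_for_ids([2, 0], names))  # ['ea03', 'ea01']
```

Because the mapping lives in each MS's own ANTENNA table, a script should always look the names up per MS rather than hard-coding them.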
Other fields in the MAIN table may not be so obvious. For example, what does STATE_ID refer to? There is also a STATE subtable, so open it using the File -> Open Table menu in the Table Browser. As with the antenna ID, the STATE_ID is a row number in the STATE table. Looking at the data contained within, the most interesting column is OBS_MODE, which records what the data were acquired for. If you've set up observations with the Observation Preparation Tool, these values should look familiar.
In this particular observation, data with a STATE_ID of 3 have an OBS_MODE which lists "CALIBRATE_PHASE.UNSPECIFIED,CALIBRATE_BANDPASS.UNSPECIFIED,UNSPECIFIED.UNSPECIFIED". Ignoring the extraneous "UNSPECIFIEDs", this information tells us that the observer intended these data to be used for bandpass calibration, as well as phase determination.
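When scripting, the intents can be pulled out of an OBS_MODE string by splitting on commas and discarding the UNSPECIFIED placeholders. This is a minimal sketch that assumes the comma-separated INTENT.SUBINTENT pattern seen in the value quoted above; the function name parse_obs_mode is hypothetical, and it deliberately keeps only the part before the dot.

```python
def parse_obs_mode(obs_mode):
    """Return the meaningful intents in a STATE-table OBS_MODE string.

    Assumes comma-separated entries of the form INTENT.SUBINTENT, where
    UNSPECIFIED marks an empty field. Sub-intents are dropped.
    """
    intents = []
    for entry in obs_mode.split(","):
        intent = entry.strip().split(".")[0]
        if intent and intent != "UNSPECIFIED" and intent not in intents:
            intents.append(intent)
    return intents


mode = ("CALIBRATE_PHASE.UNSPECIFIED,CALIBRATE_BANDPASS.UNSPECIFIED,"
        "UNSPECIFIED.UNSPECIFIED")
print(parse_obs_mode(mode))  # ['CALIBRATE_PHASE', 'CALIBRATE_BANDPASS']
```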
While many of the relationships between the MAIN table and the subtables can be determined with a little sleuthing, a lot of useful information can also be found in the MeasurementSet definition version 2.0 document.
Advanced: using information about MS structure
For most CASA tasks, it's sufficient to get information about the MS from listobs. However, if you wish to delve into the CASA Toolkit, or write a CASA task, you will likely need to put some of this knowledge to use.
For example, let's say you're working on a pipeline. One approach is to inspect the listobs output for each dataset and then type the appropriate parameters into your script by hand. Wouldn't it be nicer if the script itself could look at the data and work out this information on its own?
Here's a snippet of code that determines the spectral windows (SPWs) present in an MS, along with the number of channels and the reference frequency of each SPW. Note that it does not check that there are actually data present for these SPWs. If you have used split to select on an axis other than SPW, and that selection also reduced the set of SPWs containing data, this could cause problems later on, because in this case split carries along extraneous SPECTRAL_WINDOW information for SPWs that no longer have data.
# It's a good habit to create a new instance of the tool first, to
# avoid possible collisions with the default "tb" tool:
myTB = tbtool.create()
# Set msName to the path of your MS first (adjust as needed), e.g.:
msName = 'day2_TDEM0003_10s_norx'
# Open the subtable that holds the spectral window information
myTB.open(msName + '/SPECTRAL_WINDOW')
# Read the REF_FREQUENCY (in Hz) and NUM_CHAN columns into Python arrays,
# using the new tool instance (myTB) rather than the default tb
refFreq = myTB.getcol('REF_FREQUENCY')
nChan = myTB.getcol('NUM_CHAN')
# Close the table
myTB.close()
# The SPW index is simply the row number in SPECTRAL_WINDOW
spwObs = range(0, len(refFreq))
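The caveat about split can be addressed by following the chain from MAIN to DATA_DESCRIPTION to SPECTRAL_WINDOW: each row of MAIN carries a DATA_DESC_ID, and the DATA_DESCRIPTION subtable maps that ID to a SPECTRAL_WINDOW_ID. The sketch below does only the bookkeeping, in plain Python; it assumes you have already read MAIN's DATA_DESC_ID column and DATA_DESCRIPTION's SPECTRAL_WINDOW_ID column (for example with getcol), and the function name and example values are hypothetical.

```python
def spws_with_data(data_desc_ids, dd_spw_ids):
    """Return the sorted SPW IDs referenced by at least one MAIN-table row.

    data_desc_ids: DATA_DESC_ID column of the MAIN table.
    dd_spw_ids: SPECTRAL_WINDOW_ID column of DATA_DESCRIPTION; row i gives
        the SPW for data description ID i.
    """
    used_dd = set(int(d) for d in data_desc_ids)
    return sorted(set(int(dd_spw_ids[d]) for d in used_dd))


# Hypothetical columns: four data descriptions, but MAIN only uses IDs 0 and 2
dd_spw_ids = [0, 1, 2, 3]
main_dd = [0, 0, 2, 2, 0]
print(spws_with_data(main_dd, dd_spw_ids))  # [0, 2]
```

The result can then be used to drop empty SPWs from spwObs. Note that reading the full DATA_DESC_ID column of a large MAIN table costs memory proportional to the number of rows.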
Now that you have the Python arrays spwObs, refFreq, and nChan, you can use them to make decisions about how to process the data (e.g., if some step is frequency-specific), to build useful strings (say, selections of the form spw:chanStart~chanEnd), and so on. Combined with knowledge of the table structure, the toolkit makes it possible to carry out sophisticated, hands-off processing.
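As an illustration, the sketch below turns arrays like spwObs and nChan into spw:chanStart~chanEnd selection strings, and picks out SPWs whose reference frequency falls in a given range. It assumes refFreq is in Hz and that channels are numbered from 0; the helper names, the edge parameter, and the example values are hypothetical, and distinct variable names are used so that pasting it does not overwrite your real arrays.

```python
def spw_channel_strings(spw_ids, n_chan, edge=0):
    """Build 'spw:chanStart~chanEnd' strings, optionally trimming edge channels.

    edge: number of channels to drop at each end of every SPW (hypothetical
    parameter, e.g. to avoid band edges).
    """
    selections = []
    for spw, n in zip(spw_ids, n_chan):
        start, end = edge, int(n) - 1 - edge
        if start <= end:  # skip SPWs too narrow to trim
            selections.append("%d:%d~%d" % (spw, start, end))
    return selections


def spws_in_range(spw_ids, ref_freq, f_min, f_max):
    """Return SPW IDs whose reference frequency (Hz) lies in [f_min, f_max]."""
    return [spw for spw, f in zip(spw_ids, ref_freq) if f_min <= f <= f_max]


# Hypothetical values standing in for spwObs, nChan and refFreq
example_spws = [0, 1, 2]
example_nchan = [64, 64, 128]
example_freq = [36.0e9, 36.1e9, 36.4e9]
print(",".join(spw_channel_strings(example_spws, example_nchan, edge=4)))
# 0:4~59,1:4~59,2:4~123
print(spws_in_range(example_spws, example_freq, 36.0e9, 36.2e9))  # [0, 1]
```

Joining the strings with commas produces a single selection that can be passed to a task's spw parameter.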