Guide To Processing ALMA Data for Cycle 0
Using the ALMA Data Archive for Cycle 0 Data
About this Guide, and Cycle 0 ALMA Data
This guide describes the steps you can use to process ALMA data, from locating and downloading your data in the public archive to making science-ready images.
We will use a sample data set from ALMA Cycle 0 in this guide. Data for Cycle 1 and beyond will be delivered in a different format and will require a separate guide.
For Cycle 0, ALMA data are delivered with a set of calibrated data files and a sample of reference images. The data were calibrated and imaged by an ALMA scientist at one of the ALMA Regional Centers (ARCs), who developed the CASA scripts used to complete the calibration and imaging. Because the data packages include fully calibrated data sets, you can start with these data and proceed directly to the imaging steps. In many cases, the imaging can be dramatically improved by including "self-calibration" steps. Self-calibration is the process of using the detected signal in the target source itself to tune the phase (and, to a lesser extent, amplitude) calibrations as a function of time.
The data package includes the calibration scripts used by the ARC scientist to perform the initial calibration and imaging steps. In most cases, users will not need to modify the calibration, but in some cases tuning the calibration steps can improve the final images.
Typically, users interested in making science-ready images with Cycle 0 data from the ALMA archive will take the following steps:
- Download the data from the Archive
- Inspect the Quality Assessment plots and data files
- Inspect the reference images supplied with the data package
- Combine the calibrated data sets into a single calibrated measurement set
- Self-calibrate and image the combined data set
- Generate moment maps and other analysis products
Interested users may wish to review the calibration steps in detail, make modifications to the calibration script, and generate new calibrated data sets.
About the Sample Data: H2D+ in TW Hya
The data for this example come from ALMA Project 2011.0.00340.S, "Searching for H2D+ in the disk of TW Hya v1.5", for which the PI is Chunhua Qi. Part of the data for this project has been published in Qi et al. 2013.
The observation was set up with two spectral windows ... frequency/bandwidth/chan spacing/etc
The project required three executions of the scheduling block. explain ...
What other data sets are available?
A Delivery List of publicly available data sets is provided on the ALMA Science Portal, in the "Data" tab.
Prerequisites : Computing Requirements
ALMA data sets can be very large and require significant computing resources for efficient processing. The data set used in this example begins with a download of 176 GB of data files. A description of recommended computing resources is given here. Those who do not have sufficient computing power may wish to arrange a visit to one of the ARCs to use the computing facilities at these sites. To arrange a visit to an ARC, submit a ticket to the ALMA Helpdesk.
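Before starting a download of this size, it can help to confirm that the target disk has room. The following is a minimal Python sketch, not part of the ALMA tooling. The function name `has_room`, the `headroom` factor of 2 (to allow the tar files and the unpacked data to coexist), and the use of 1 GB = 10^9 bytes are all assumptions made for illustration.

```python
import shutil

# Size of the example download quoted in this guide, in gigabytes.
DOWNLOAD_GB = 176


def has_room(path, needed_gb, headroom=2.0):
    """Return True if `path` has at least needed_gb * headroom GB free.

    The headroom factor is an assumption: the tar files and the
    unpacked data both sit on disk until the tar files are deleted.
    """
    free_bytes = shutil.disk_usage(path).free
    return free_bytes >= needed_gb * headroom * 1e9


if __name__ == "__main__":
    # Check the current working directory.
    print("Enough space here:", has_room(".", DOWNLOAD_GB))
```

Run it from the directory where you plan to place the download script. If it reports False, choose another disk or consider a visit to an ARC as described above.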
Getting the Data: The ALMA Data Archive
The ALMA data archive is part of the ALMA Science Portal. A copy of the archive is stored at each of the ARCs, and you can connect to the nearest archive through these links:
The ALMA Archive Query page.
Upon entry into the ALMA Archive Query page, set the "Results View" option to "project" (see the red highlight #1 in the figure) and set the Project Code to 2011.0.00340.S (red highlight #2). Note that if you leave the "Results View" set to "raw data", you will see three rows of data sets in the results page. These correspond to three executions of the observing script. In fact, for Cycle 0 data these rows contain copies of the same data set, so take care not to download the (large!) data set three times. By setting "Results View" to "project", you see just one entry, and that is the one you'd like to download.
You can download the data through the Archive GUI. For more control over the download process, you can use the Unix shell script provided on the Request Handler page. This script has a name like "downloadRequest84998259script.sh". Put this file into a directory that has ample disk space, and execute it in your shell. For example, in bash:
% chmod +x downloadRequest84998259script.sh
% ./downloadRequest84998259script.sh
Unpacking the data
The data you have downloaded includes 17 tar files. Unpack these using the following command:
% for i in *.tar; do echo "Untarring $i"; tar xvf "$i"; done
At this point you will have a directory called "2011.0.00340.S" with the full data distribution.
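If you prefer to unpack from Python rather than the shell, the same loop can be sketched with the standard `tarfile` module. This is an illustrative alternative, not the procedure used by the archive. The function name `unpack_all` and its defaults are hypothetical, and the sketch assumes the tar files come from a trusted source (here, the ALMA archive).

```python
import glob
import tarfile


def unpack_all(pattern="*.tar", dest="."):
    """Extract every tar file matching `pattern` into `dest`.

    Returns the sorted list of archives that were unpacked.
    """
    archives = sorted(glob.glob(pattern))
    for name in archives:
        print("Untarring", name)
        # The context manager closes each archive after extraction.
        with tarfile.open(name) as tar:
            tar.extractall(path=dest)
    return archives


if __name__ == "__main__":
    done = unpack_all()
    print("Unpacked", len(done), "tar files")
```

For this data set the count printed at the end should match the 17 tar files mentioned above.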
Overview of Delivered Data and Products
To get to the top-level directory that contains the data package, do:
% cd 2011.0.00340.S/sg_ouss_id/group_ouss_id/member_ouss_2012-12-05_id
Here you will find the following entries:
% ls
calibrated calibration log product qa raw README script
The README file describes the files in the distribution and includes notes from the ALMA scientist who performed the initial calibration and imaging.
The directories contain the following files:
-- raw/
   |-- uid___A002_X554543_X207.ms.split/
   |-- uid___A002_X554543_X3d0.ms.split/
   |-- uid___A002_X554543_X667.ms.split/
-- calibrated/
   |-- uid___A002_X554543_X207.ms.split.cal/
   |-- uid___A002_X554543_X3d0.ms.split.cal/
   |-- uid___A002_X554543_X667.ms.split.cal/
-- calibration/
   |-- uid___A002_X554543_X207.calibration/
   |-- uid___A002_X554543_X207.calibration.plots/
   |-- uid___A002_X554543_X3d0.calibration/
   |-- uid___A002_X554543_X3d0.calibration.plots/
   |-- uid___A002_X554543_X667.calibration/
   |-- uid___A002_X554543_X667.calibration.plots/
-- log/
   |-- uid___A002_X554543_X207.calibration.log
   |-- uid___A002_X554543_X3d0.calibration.log
   |-- uid___A002_X554543_X667.calibration.log
   |-- Imaging.log
   |-- 340.log
-- qa/
   |-- uid___A002_X554543_X207__textfile.txt
   |-- uid___A002_X554543_X207__qa2_part1.png
   |-- uid___A002_X554543_X207__qa2_part2.png
   |-- uid___A002_X554543_X207__qa2_part3.png
   |-- uid___A002_X554543_X3d0__textfile.txt
   |-- uid___A002_X554543_X3d0__qa2_part1.png
   |-- uid___A002_X554543_X3d0__qa2_part2.png
   |-- uid___A002_X554543_X3d0__qa2_part3.png
   |-- uid___A002_X554543_X667__textfile.txt
   |-- uid___A002_X554543_X667__qa2_part1.png
   |-- uid___A002_X554543_X667__qa2_part2.png
   |-- uid___A002_X554543_X667__qa2_part3.png
-- script/
   |-- uid___A002_X554543_X207.ms.scriptForCalibration.py
   |-- uid___A002_X554543_X3d0.ms.scriptForCalibration.py
   |-- uid___A002_X554543_X667.ms.scriptForCalibration.py
   |-- scriptForImaging.py
   |-- import_data.py
   |-- scriptForFluxCalibration.py
-- product/
   |-- TWHya.continuum.fits
   |-- TWHya.N2H+.fits
   |-- TWHya.continuum.mask/
   |-- TWHya.H2D+.mask/
   |-- TWHya.N2H+.mask/
- calibrated: Calibrated data sets, ready to be combined and imaged.
- calibration: Auxiliary measurement sets generated in the calibration process.
- The "calibration.plots" directories contain (a few hundred) plots generated during the calibration process. These can be useful for the expert user to assess the quality of the calibration at each step.
- log: Log files written during processing: one calibration log for each execution, plus Imaging.log and 340.log.
- product: The final data products from the calibration and imaging process. The directory contains "reference" images that are used to determine the quality of the observation, but they are not necessarily science-ready. The data files here are useful for initial inspections.
- qa: The result of Quality Assessment tests. The data from each scheduling block goes through such a quality assessment, and all data delivered to the public ALMA Archive have passed the quality assessment. It is worthwhile to review the plots and text files contained here. You will find plots of the antenna configuration, UV coverage, calibration results, Tsys, and so on.
- raw: The "raw" data files. If you would like to tune or refine the calibration, these files would be your starting point. In fact, these data files do have certain a priori calibrations applied, amounting to steps 0-6 of the calibration scripts provided in the data package. These steps include a priori flagging and application of WVR, Tsys, and antenna position corrections.
- script: These are the scripts developed and applied by the ALMA scientist, to calibrate the data and generate reference images. These scripts cannot be applied directly to the raw data provided in this data distribution, but they can serve as a valuable reference to see the steps that need to be taken to reprocess the data, if you so choose.
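As a quick sanity check after unpacking, you can confirm that the top-level entries described above are all present. The sketch below is illustrative only. The helper name `missing_entries` is hypothetical, and the expected names are simply those listed in this guide for the example package.

```python
import os

# Top-level entries listed in this guide for the example package.
EXPECTED = ["calibrated", "calibration", "log", "product",
            "qa", "raw", "README", "script"]


def missing_entries(package_dir, expected=EXPECTED):
    """Return the expected entries that are absent from package_dir."""
    if os.path.isdir(package_dir):
        present = set(os.listdir(package_dir))
    else:
        # A nonexistent directory is treated as containing nothing.
        present = set()
    return [name for name in expected if name not in present]


if __name__ == "__main__":
    # Run from the top-level directory of the data package.
    gone = missing_entries(".")
    print("Missing entries:", gone if gone else "none")
```

An empty result suggests that the unpacking completed. It does not verify the contents of each directory.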
What if I need to redo the Calibration?
In most cases, users should not need to redo the calibration. However, you should consult this knowledgebase article.
The data package includes a calibration script for each execution of the scheduling block. Each script performs all the initial calibrations required before imaging the data set, and represents the exact steps used to generate the packaged, calibrated data. However, for Cycle 0 data sets the script cannot be applied directly by the user, for two reasons:
- The "raw" data supplied with the data package have had several a priori calibrations applied. These include calibration steps 0-6. So the user should begin with "step 7" in the packaged data reduction script.
- The data processing script was generated using CASA 3.4, but users are advised to use CASA 4.2 for all processing. There are a number of differences between these versions. The CASA 3.4 script should therefore be used as a guide to which steps need to be done, and in what order, but it must be modified to work under CASA 4.2.
Refining the Calibration
uid___A002_X554543_X207.ms.scriptForCalibration.py
uid___A002_X554543_X3d0.ms.scriptForCalibration.py
uid___A002_X554543_X667.ms.scriptForCalibration.py
Flux Calibration
Note that the scripts refer to the combined data set. The flux calibration script performs the flux calibration and then combines the three calibrated executions:
concat(vis = ['uid___A002_X554543_X207.ms.split.cal',
              'uid___A002_X554543_X3d0.ms.split.cal',
              'uid___A002_X554543_X667.ms.split.cal'],
       concatvis = 'calibrated.ms')
The imaging script will then work with calibrated.ms.
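Rather than typing the three file names by hand, the input list for the combination step can be built from the contents of the calibrated directory. The following is a hedged sketch, not the packaged script. The helper names `calibrated_sets` and `combine` are hypothetical. The import of `concat` from `casatasks` assumes a modular CASA installation, which is not what this guide's CASA 4.2 provides. Inside an interactive CASA session, `concat` is already defined and can be called directly with the list.

```python
import glob
import os


def calibrated_sets(calibrated_dir="calibrated"):
    """Return the sorted paths of the *.ms.split.cal data sets."""
    pattern = os.path.join(calibrated_dir, "*.ms.split.cal")
    return sorted(glob.glob(pattern))


def combine(vis_list, concatvis="calibrated.ms"):
    """Call CASA's concat task on vis_list, if CASA is importable.

    Assumption: a modular CASA install that provides `casatasks`.
    Returns True if concat was called, False otherwise.
    """
    try:
        from casatasks import concat
    except ImportError:
        print("CASA not available; would combine:", vis_list)
        return False
    concat(vis=vis_list, concatvis=concatvis)
    return True


if __name__ == "__main__":
    vis = calibrated_sets()
    print("Found", len(vis), "calibrated data sets")
    # Only attempt the combination when there is something to combine.
    if vis:
        combine(vis)
```

For this data set, `calibrated_sets()` should return three paths when run from the top-level directory of the data package.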