ALMA Imaging Pipeline Reprocessing Tool

From CASA Guides
Contributors: Rloomis, Swood

Latest revision as of 19:48, 30 October 2024

Cycle Compatibility

In Cycle 9, a new nomenclature was adopted for measurement sets within the ALMA pipeline: uid*targets.ms for the continuum + line (non-continuum-subtracted) target-only data, and uid*targets_line.ms for the continuum-subtracted data. Data restored with a scriptForPI.py from before Cycle 9 will use the incompatible uid*target.ms naming, and must be renamed to uid*targets.ms to work with the scripts provided here.
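
For data restored with a pre-Cycle 9 scriptForPI.py, the renaming could be scripted along the following lines. This is a minimal sketch rather than part of the delivered tools: the function name rename_legacy_targets is hypothetical, and it assumes that the legacy measurement sets differ from the new naming only in the target.ms / targets.ms suffix.

```python
import glob
import os


def rename_legacy_targets(directory="."):
    """Rename uid*target.ms to uid*targets.ms inside `directory`.

    Hypothetical helper; returns a list of (old, new) paths that were renamed.
    Companion directories (e.g. *.flagversions), if any, are not handled.
    """
    renamed = []
    # "uid*target.ms" does not match names already ending in "targets.ms"
    for old in sorted(glob.glob(os.path.join(directory, "uid*target.ms"))):
        new = old[: -len("target.ms")] + "targets.ms"
        if os.path.exists(new):
            print(f"Skipping {old}: {new} already exists")
            continue
        os.rename(old, new)  # measurement sets are directories; rename works on them
        renamed.append((old, new))
    return renamed
```

Calling rename_legacy_targets() from the directory holding the restored measurement sets renames them in place and leaves any file that already follows the Cycle 9 naming untouched.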

In Cycle 10, the CASA task 'uvcontsub' was changed so that it no longer uses uvcont tables and instead takes only a cont.dat file. The '--contsub_fast' option has therefore been removed from this tool. The new CASA task is much faster.

Preparing Data

The North American ARC now provides restored calibrated data as a value-added product in a calibrated_final/ directory structure. In value-added deliveries from previous cycles, the continuum + line data were split and concatenated into the delivered calibrated_final.ms, but this measurement set could no longer be used with pipeline tasks, and the spectral window numbering was lost, which made it difficult to compare against the pipeline weblog.

In the new delivery structure, detailed below, all uid*targets.ms files are held within calibrated_final/measurement_sets, allowing the use of pipeline tasks and making for easier comparison with the delivered ALMA calibration + imaging pipeline weblog. An equivalent calibrated_final.ms can be created via 'concat' in CASA if this is desired.

   calibrated_final/                         # downloaded as calibrated_final.tgz
       caltables/                            # holds relevant calibration information for continuum subtraction
           - cont.dat                        # contains the identified continuum ranges from 'findcont' in the ALMA pipeline
       measurement_sets/                     # holds all measurement sets
           - uid*targets.ms                  # the non-continuum-subtracted measurement sets, per execution block
       - scriptForReprocessing.py            # the tool described on this page
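
The 'concat' step mentioned above could look roughly like the sketch below, run from inside calibrated_final/ in a CASA 6 session. The helper names collect_target_ms and make_calibrated_final are hypothetical, and only the vis and concatvis parameters of concat are set; the options that scriptForReprocessing.py itself passes to concat are not reproduced here.

```python
import glob
import os


def collect_target_ms(ms_dir="measurement_sets", line=False):
    """Return the sorted per-execution-block measurement sets in `ms_dir`."""
    pattern = "uid*targets_line.ms" if line else "uid*targets.ms"
    return sorted(glob.glob(os.path.join(ms_dir, pattern)))


def make_calibrated_final(ms_dir="measurement_sets", line=False):
    """Concatenate the per-execution-block MSs into a single measurement set."""
    from casatasks import concat  # only importable inside a CASA 6 environment

    vis = collect_target_ms(ms_dir, line=line)
    if not vis:
        raise FileNotFoundError(f"no matching measurement sets in {ms_dir}")
    name = "calibrated_final_line.ms" if line else "calibrated_final.ms"
    out = os.path.join(ms_dir, name)
    concat(vis=vis, concatvis=out)
    return out
```

Because the glob pattern uid*targets.ms does not match names ending in _line.ms, the continuum + line and continuum-subtracted data are never mixed in one concatenation.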


Data downloaded from the ALMA Science Archive must first be restored using scriptForPI.py and then placed into a compatible directory structure to work with the scriptForReprocessing.py imaging tool. Running the script reprocessing_prep.py (https://github.com/ryanaloomis/ALMA_image_reprocessing/blob/main/reprocessing_prep.py), reproduced below, does this.

# run this script within the working/ directory to create a calibrated_final/ directory, mirroring the NA added value delivery structure.
# once calibrated_final/ is created, place scriptForReprocessing.py in calibrated_final/ and follow the scriptForReprocessing.py instructions

import glob
import os
import sys

# Check if calibrated_final/ already exists:
if glob.glob("calibrated_final"):
    print("ERROR: calibrated_final/ already exists; will not overwrite")
    sys.exit()
else:
    os.mkdir("calibrated_final")

# Fill the caltables
os.mkdir("calibrated_final/caltables")
os.system("cp -rf cont.dat calibrated_final/caltables")

# Fill the measurement_sets
os.mkdir("calibrated_final/measurement_sets")
# First try just uid*targets.ms
os.system("cp -rf uid*targets.ms calibrated_final/measurement_sets/")
# Then try uid*targets_line.ms
os.system("cp -rf uid*targets_line.ms calibrated_final/measurement_sets/")

print("Generated calibrated_final/ and filled caltables/ and measurement_sets/. Please place scriptForReprocessing.py in calibrated_final/ and follow README instructions.")
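
The copy commands in the script above do not stop execution if cont.dat or the measurement sets are missing, so it can be worth checking the result before continuing. The following is a minimal sketch of such a check; the function name check_calibrated_final is hypothetical and it only tests for the files named in the directory structure above.

```python
import glob
import os


def check_calibrated_final(root="calibrated_final"):
    """Return a list of problems found in the layout (empty list if none)."""
    problems = []
    if not os.path.isfile(os.path.join(root, "caltables", "cont.dat")):
        problems.append("missing caltables/cont.dat")
    ms_dir = os.path.join(root, "measurement_sets")
    if not glob.glob(os.path.join(ms_dir, "uid*targets.ms")):
        problems.append("no uid*targets.ms found in measurement_sets/")
    if not os.path.isfile(os.path.join(root, "scriptForReprocessing.py")):
        problems.append("scriptForReprocessing.py not yet placed in " + root)
    return problems
```

Run from the working/ directory, for example with print(check_calibrated_final()), an empty list indicates that the expected files are in place.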

About the Imaging Pipeline Reprocessing Tool - scriptForReprocessing.py

scriptForReprocessing.py (https://github.com/ryanaloomis/ALMA_image_reprocessing/blob/main/scriptForReprocessing.py) is intended to be a convenient wrapper for many of the ALMA imaging pipeline functions (including continuum subtraction) that users may wish to use on their NA-delivered value-added products. See the ALMA Pipeline Users Guide and Reference Manual (http://almascience.org/processing/science-pipeline) for a full description of the ALMA pipeline.

The script can be launched with any version of CASA that includes the ALMA pipeline from Cycle 11 or later; the page linked above gives the mapping between ALMA Cycle, CASA version, and Pipeline version. Launch it as:

$ casa --pipeline -c scriptForReprocessing.py [options]

optional arguments:

 -h, --help            show this help message and exit
 --contsub             Fit and subtract continuum using the channel ranges from the local
                       cont.dat file. Generates new uid*targets_line.ms in measurement_sets/
 --image [IMAGE]       Run the imaging pipeline and place images in the specified directory
                       (default='images'). NOTE: unless cont.dat or the imaging options in this
                       script are modified, the images produced will be identical to those on the
                       ALMA Science Archive
 --cleanup             Remove working_reprocess/ directory and log files after any other options
                       are executed. WARNING: removes weblogs inside of working_reprocess/
 --weblog [WEBLOG]     Launches a browser to view weblog after other tasks are run. By default
                       ('latest'), displays the latest weblog generated locally. Other options
                       are to use the specific pipeline folder name (e.g.
                       'pipeline-20221010T192458')
 --calibrated_final    Concatenate uid*targets.ms to produce calibrated_final.ms in
                       measurement_sets/
 --calibrated_final_line
                       Concatenate uid*targets_line.ms (if they exist) to produce
                       calibrated_final_line.ms in measurement_sets/
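
The bracketed [IMAGE] and [WEBLOG] values are optional: giving the flag alone selects the stated default. The sketch below shows how an interface with this behavior could be declared with Python's argparse. It is an illustration written from the help text above, not the tool's actual source, and the function name build_parser is hypothetical.

```python
import argparse


def build_parser():
    """Declare options mirroring the help text above (illustrative only)."""
    parser = argparse.ArgumentParser(prog="scriptForReprocessing.py")
    parser.add_argument("--contsub", action="store_true",
                        help="fit and subtract continuum using cont.dat")
    # nargs="?" makes the value optional; const is used when the flag is bare
    parser.add_argument("--image", nargs="?", const="images", default=None,
                        help="run the imaging pipeline into this directory")
    parser.add_argument("--cleanup", action="store_true",
                        help="remove working_reprocess/ and log files")
    parser.add_argument("--weblog", nargs="?", const="latest", default=None,
                        help="view a weblog ('latest' or a pipeline folder name)")
    parser.add_argument("--calibrated_final", action="store_true",
                        help="concatenate uid*targets.ms")
    parser.add_argument("--calibrated_final_line", action="store_true",
                        help="concatenate uid*targets_line.ms")
    return parser


args = build_parser().parse_args(
    ["--contsub", "--image", "modified_images_folder", "--weblog"])
print(args.image, args.weblog)
```

In this sketch, "--image" with no value resolves to 'images' and "--weblog" with no value resolves to 'latest', matching the defaults described in the help text.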

Suggested Workflows

A number of workflows are supported with the data organized in the directory structure detailed above:

- You can proceed with your scientific analysis starting with the uid*targets.ms files and supply them to CASA tasks such as tclean, uvcontsub, or gaincal as a list (vis=['MS1.ms', 'MS2.ms', ...]). The CASA commands listed for each stage of the delivered ALMA calibration + imaging weblog give examples of this (e.g. you can get the tclean command for any image that was made by clicking within the relevant hif_makeimages() stage).

- You can use scriptForReprocessing.py to restore the continuum-subtracted data, re-image the data in the ALMA pipeline using new imaging parameters, or view the weblog (see below for usage). Here you can also easily modify cont.dat and rerun the continuum subtraction and/or imaging with a different continuum selection.

- You can generate the old-style calibrated_final.ms either using scriptForReprocessing.py or by hand via concat(). If you use scriptForReprocessing.py, there is also an option to generate an analogous calibrated_final_line.ms.
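
As an illustration of the first workflow, the sketch below gathers the per-execution-block measurement sets into a list and passes it to tclean to make a dirty image. The helper names and all parameter values (spw, image size, cell size, robust, image name) are placeholders chosen for this example; the tclean command shown in the hif_makeimages() stage of your weblog is the better starting point for real parameters.

```python
import glob
import os


def dirty_image_kwargs(ms_dir="measurement_sets", imagename="my_target_spw17"):
    """Build tclean keyword arguments for a quick dirty image (placeholder values)."""
    vis = sorted(glob.glob(os.path.join(ms_dir, "uid*targets.ms")))
    return dict(
        vis=vis,                 # list of measurement sets, one per execution block
        imagename=imagename,
        spw="17",                # placeholder spectral window
        specmode="mfs",
        imsize=[120, 120],       # placeholder image size in pixels
        cell=["0.5arcsec"],      # placeholder cell size
        weighting="briggs",
        robust=0.5,
        niter=0,                 # niter=0 produces a dirty image only
    )


def run_tclean(**kwargs):
    """Call tclean; only works inside a CASA 6 environment."""
    from casatasks import tclean
    tclean(**kwargs)
```

Inside CASA, run_tclean(**dirty_image_kwargs()) would then image all execution blocks together.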

Example Usage

Continuum subtract to get uid*targets_line.ms, then clean up:

$ casa --pipeline -c scriptForReprocessing.py --contsub --cleanup 

Continuum subtract with a modified cont.dat, reimage, and view the resultant weblog:

# Modify cont.dat in caltables/
$ casa --pipeline -c scriptForReprocessing.py --contsub --image modified_images_folder --weblog 

Make new mfs and aggregate continuum images with a different robust value and view the resultant weblog:

# Modify robust parameter in scriptForReprocessing.py 
$ casa --pipeline -c scriptForReprocessing.py --image new_robust_images_folder --weblog 

Continuum subtract and generate all pipeline images with no mitigation:

# Set mitigate = False in scriptForReprocessing.py
$ casa --pipeline -c scriptForReprocessing.py --contsub --image no_mitigation_folder

Make the old-style calibrated_final.ms, then clean up:

$ casa --pipeline -c scriptForReprocessing.py --calibrated_final --cleanup

Editing scriptForReprocessing.py Imaging/Processing Parameters

If you open the script in a text editor, you will find a block of user-editable options at the top (shown below). These options modify the imaging pipeline in various ways. Some of the most useful changes are adjusting the mitigation parameters so that all of your science targets and spectral windows are imaged, reimaging a portion of your data with a different robust value, and reimaging with a uvtaper applied.

make_mfs_images = True                  # generate mfs (per spw) images
make_cont_images = True                 # generate aggregate continuum images
make_cube_images = True                 # generate cube images 
make_repBW_images = True                # generate images corresponding to the requested representative bandwidth
mitigate = True                         # run hif_checkproductsize() and mitigate created products if necessary.
                                        # Set to False if you want all spws and all targets imaged at full resolution.
                                        # WARNING: turning off mitigation may result in very large disk usage. Consider
                                        # adjusting the other mitigation parameters first, or manually selecting the
                                        # target/spw combinations you want.

For all values below, see the ALMA Pipeline Users Guide and Reference Manual for detailed descriptions.

maxproductsize = 350.                   # for mitigation; in GB
maxcubesize = 40.                       # for mitigation; in GB
maxcubelimit = 60.                      # for mitigation; in GB 

field = None                            # String specifying fields to be imaged; default is all (pending mitigation)
                                        #     Example: '3C279, M82'
phasecenter = None                      # Direction measure or field id of the image center. The default phase center is
                                        # set to the mean of the field directions of all fields that are to be imaged together.
                                        #     Examples: 'ICRS 13:05:27.2780 -049.28.04.458', "TRACKFIELD" (for ephemeris)
spw = None                              # Spw(s) to image; default is all spws
                                        #     Example: '17, 23'
uvrange = None                          # Select a set of uv ranges to image; default is all
                                        #     Examples: '0~1000klambda', ['0~100klambda', '300~1000klambda']
hm_imsize = None                        # Image x and y size in pixels or PB level; default is automatically determined
                                        #     Examples: '0.3pb', [120, 120]
hm_cell = None                          # Image cell size; default is automatically determined
                                        #     Examples: '3ppb', ['0.5arcsec', '0.5arcsec']
nbins = None                            # Channel binning factor for each spw; default is none
                                        #     Format: 'spw1:nb1,spw2:nb2,...' with optional wildcards: '*:nb'
                                        #     Examples: '9:2,11:4,13:2,15:8', '*:2'
robust = None                           # Robust value to image with; default is automatically determined
                                        #     Example: 0.5
uvtaper = None                          # Uvtaper to apply to data; default is none
                                        #     Example: ['1arcsec']
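
As a worked example, the fragment below shows how the editable block might look when reimaging only two spectral windows with a fixed robust value and a uvtaper. The specific values are taken from the examples in the comments above and are placeholders; switching off the cube and representative-bandwidth images is an assumption about what a continuum-only rerun would want, not a recommendation from the tool's documentation.

```python
# Hypothetical edit of the user-editable block in scriptForReprocessing.py
make_mfs_images = True        # keep per-spw mfs images
make_cont_images = True       # keep aggregate continuum images
make_cube_images = False      # assumption: skip cubes for a continuum-only rerun
make_repBW_images = False     # assumption: skip representative-bandwidth images

spw = '17, 23'                # image only these spectral windows
robust = 0.5                  # override the automatically determined robust value
uvtaper = ['1arcsec']         # apply a 1 arcsec taper
```

After saving the edited script, a run such as "casa --pipeline -c scriptForReprocessing.py --image tapered_images --weblog" (the folder name is arbitrary) would place the new images in their own directory and open the resulting weblog.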