CASA Parallel Processing

== Overview ==


This is meant to be a general guide for testing out the early capabilities in CASA for parallel processing.  As of the time of this writing (23 August 2011), the tasks {{flagdata}}, {{clearcal}}, {{applycal}}, and <tt>partition</tt> are all parallelized.  In addition, there are two versions of parallel imaging available at the task level, <tt>pcont</tt> and <tt>pcube</tt> (for continuum and spectral line imaging, respectively).


Feedback at this stage about what works, as well as what doesn't or could use improvement, will be very helpful.  Please send comments, bug reports, and questions to ...

More information may also be found in Jeff Kern's presentation, posted [here].

== Setting up for parallel processing ==

Before you can run tasks with parallelization, you must first set up the machine on which CASA will be running to use SSH keys for password-free login.  See [the SSH section of the Gold Book] for instructions.

Parallel processing in CASA is set up to take advantage of both multiple-core machines (as most standard workstations are) as well as shared memory access (as is available in a cluster).  However, the NRAO cluster in Socorro also has the distinction of a very fast connection to the Lustre filesystem, which will boost I/O performance by around 2 orders of magnitude of the standard desktop SATA disk.  Therefore, I/O-limited operations are unlikely to see much improvement with parallel processing.


== Parallelized calibration tasks ==


The set of tasks described here is current as of 23 August 2011.


=== partition ===
In order to perform parallel processing, the original Measurement Set must be subdivided into smaller chunks that can then be farmed out to multiple cores for processing.  This new, subdivided MS (called a "multi-MS", or MMS) is created by a task called <tt>partition</tt>.  In many ways, <tt>partition</tt> resembles {{split}}, in that it can also time-average or select only a subset of data.  However, frequency averaging is not currently available.   


Here is the current set of input parameters for <tt>partition</tt>:


<pre>
#  partition :: Experimental extension of split to produce multi-MSs
vis                 =         ''        #  Name of input measurement set
outputvis           =         ''        #  Name of output measurement set
createmms           =       True        #  Should this create a multi-MS output
     separationaxis =     'scan'        #  Axis to do parallelization across
     numsubms       =         64        #  The number of SubMSs to create

calmsselection      =     'none'        #  Cal Data Selection ('none', 'auto', 'manual')
datacolumn          =     'data'        #  Which data column(s) to split out
field               =         ''        #  Select field using ID(s) or name(s)
spw                 =         ''        #  Select spectral window/channels
antenna             =         ''        #  Select data based on antenna/baseline
timebin             =       '0s'        #  Bin width for time averaging
timerange           =         ''        #  Select data by time range
scan                =         ''        #  Select data by scan numbers
scanintent          =         ''        #  Select data by scan intent
array               =         ''        #  Select (sub)array(s) by array ID number
uvrange             =         ''        #  Select data by baseline length
async               =      False        #  If true the taskname must be started using partition(...)
</pre>


The parameters which are specific to parallel processing are 'createmms', 'separationaxis', 'numsubms', and 'calmsselection'.
 
It is currently recommended that 'separationaxis' be set to 'default', which should create sub-MSs optimized for parallel processing by dividing the data along scan and spectral window boundaries.  Other options include 'spw' and 'scan', which force separation along only one of these axes.
 
The optimal number of sub-MSs to create will depend on the processing environment; namely, the number of available cores.  A reasonable rule of thumb is to create twice as many sub-MSs as there are available cores.
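
For example, on a machine with 16 available cores, a <tt>partition</tt> call might look like the following. This is only an illustrative sketch; the MS names are hypothetical, and only parameters from the list above are used.

<source lang="python">
# In CASA. A minimal sketch; 'mydata.ms' and 'mydata.mms' are hypothetical names.
partition(vis='mydata.ms',
          outputvis='mydata.mms',
          createmms=True,
          separationaxis='default',  # divide along scan and spw boundaries
          numsubms=32,               # roughly twice the 16 available cores
          datacolumn='data')
</source>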
 
Finally, there is the option to split out data for calibration sources into one "calibration" MS.  If 'calmsselection' is set to 'manual', several more options are available:
 
<pre>
calmsselection      =   'manual'        #  Cal Data Selection ('none', 'auto', 'manual')
     calmsname      =         ''        #  Name of output measurement set
     calfield       =         ''        #  Field Selection for calibration ms
     calscan        =         ''        #  Select data by scan numbers
     calintent      =         ''        #  Select data by scan intent
</pre>
 
Since the standard calibration tasks are not currently parallelized, creating a dedicated calibration MS will likely result in a speed improvement for this part of processing.
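
For instance, a run that also splits the calibrator data into a dedicated MS might look like the following sketch; the file names and calibrator field selection are hypothetical.

<source lang="python">
# In CASA. A sketch; file names and the calibrator field selection are hypothetical.
partition(vis='mydata.ms',
          outputvis='mydata.mms',
          createmms=True,
          numsubms=32,
          calmsselection='manual',
          calmsname='mydata_cal.ms',
          calfield='0,1')
</source>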
 
=== flagdata, clearcal, and applycal ===
 
{{flagdata}}, {{clearcal}}, and {{applycal}} are parallelized "under the hood": before running, they check whether they are operating on an MMS and, if so, automatically take advantage of the available parallel resources.  No special parameter inputs are necessary to enable this behavior.
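
For example, applying a calibration to an MMS uses exactly the same call as for a regular MS. This is a sketch; the file and calibration table names are hypothetical.

<source lang="python">
# In CASA. Hypothetical file and calibration table names; the call is the same
# for a regular MS and a multi-MS, but on an MMS the work is automatically
# distributed over the sub-MSs.
applycal(vis='mydata.mms', gaintable=['mydata.G0', 'mydata.B0'])
</source>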
 
== Parallelized Clean (task pclean) ==
 
Eventually, {{clean}} will be able to run on clusters and desktops with many CPU cores. This development is ongoing and a first parallelized version of clean is provided by the experimental task {{pclean}}.
 
The inputs of {{pclean}} are:
 
<pre>
#  pclean :: Invert and deconvolve images with parallel engines
vis                =        ''        #  Name of input visibility file
imagename          =        ''        #  Pre-name of output images
imsize              = [256, 256]        #  Image size in pixels (nx,ny), symmetric for single value
cell                = ['1.0arcsec', '1.0arcsec'] #  The image cell size in arcseconds.
phasecenter        =        ''        #  Image center: direction or field index
stokes              =        'I'        #  Stokes params to image (eg I,IV,IQ,IQUV)
mask                =        ''        #  mask image
field              =        ''        #  Field Name or id
spw                =        ''        #  Spectral windows e.g. '0~3', '' is all
ftmachine          =      'ft'        #  Fourier Transform Engine ('ft', 'sd', 'mosaic' or
                                        #  'wproject')
alg                =  'hogbom'        #  Deconvolution algorithm ('clark', 'hogbom', 'multiscale')
majorcycles        =          1        #  Number of major cycles
niter              =        500        #  Maximum number of iterations
gain                =        0.1        #  Gain to use in deconvolution
threshold          =    '0.0Jy'        #  Flux level to stop cleaning, must include units: '1.0mJy'
weighting          =  'natural'        #  Type of weighting
mode                = 'continuum'      #  Clean mode ('continuum', 'cube')
interactive        =      False        #  Interactive clean
overwrite          =      True        #  Overwrite an existing model image
uvtaper            =      False        #  Apply additional uv tapering of visibilities
selectdata          =      False        #  Other data selection parameters
pbcorr              =      False        #  Correct for the primary beam post deconvolution
clusterdef          =        ''        #  File that contains cluster definition
async              =      False        #  If true the taskname must be started using pclean(...)
</pre>
 
 
They differ somewhat from those of the standard {{clean}} task, but reflect to some extent the future parameter naming conventions. Stay tuned.
 
{{pclean}} can process both continuum and spectral line data in parallelized mode.
 
Let's start with the cluster setup file. By default, {{pclean}} uses all cores on the current host. This can be changed by supplying a cluster definition file via the ''clusterdef'' parameter; such a file may look like:
 
<pre>
 
############################
hal9000, 10,  /home/ptest
sal9000, 12,  /home/ptest
nearstar, 6,    /home/ptest
############################
 
</pre>
Each line of the cluster definition file has three comma-separated fields: computer name, number of cores to use, and working directory.
 
'''IMPORTANT: the user must have passwordless ssh access to all of the computers listed in the cluster definition file, and the working directories must be cross-mounted on all of the computers under the same names.'''
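
Pointing {{pclean}} at such a file is then just a matter of setting ''clusterdef''. This is a sketch; the file names are hypothetical, and the remaining imaging parameters are omitted for brevity.

<source lang="python">
# In CASA. 'mycluster.txt' is a hypothetical cluster definition file like the one above.
pclean(vis='mydata.ms', imagename='myimage', clusterdef='mycluster.txt')
</source>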
 
Many of the other parameters are the same as for regular {{clean}}. Those that differ are:
 
* '''mode''' can be "continuum" for multi-scale clean (nterms>1 is not yet supported) or "cube" for spectral line imaging; "cube" expands further into nchan, start, and step (the latter two should be specified in LSRK velocity units), and a restfrequency
 
* '''alg'''  to define the deconvolution algorithm, either "clark", "hogbom", or "multiscale" (if "multiscale", the scales are specified in a submenu)
 
* '''majorcycles''' to set the number of Cotton-Schwab/Clark major cycles
 
* '''ftmachine''' can be 'ft' for standard interferometric gridding, 'sd' for standard single dish gridding, 'mosaic' for gridding using the primary beam as the convolution function, or 'wproject' for the w-projection gridder that corrects for wide-field 'w'-term errors.
 
 
nterms>2 and outlier fields are not yet supported.
 
When the task runs, every process prints its status to the screen, which can look a little messy with many cores. You will also see many temporary files as well as multiple logger files. At the end, {{pclean}} will construct the final image or cube and clean up the temporary files.
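
As an end-to-end illustration, a simple continuum run of {{pclean}} on a single multi-core host might look like the following sketch; the file names and imaging values are hypothetical, and only parameters from the list above are used.

<source lang="python">
# In CASA. A sketch with hypothetical file names and imaging values.
pclean(vis='mydata.ms',
       imagename='myimage',
       imsize=[1024, 1024],
       cell=['2.0arcsec', '2.0arcsec'],
       ftmachine='ft',
       alg='clark',
       majorcycles=3,
       niter=1000,
       threshold='1.0mJy',
       weighting='natural',
       mode='continuum')
</source>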


== Parallel cleaning (toolkit) ==
As of this writing, parallel imaging is only available at the tool level.  Ultimately, the plan is to make these into standard CASA tasks.


=== pcont ===
<tt>pcont</tt> is the parallelized continuum imaging tool.  In order to access this functionality, one must first import two items into CASA:
<source lang="python">
# In CASA
from parallel.pimager import pimager
from parallel.parallel_task_helper import ParallelTaskHelper
</source>
Then, to initialize an imaging run:
<source lang="python">
imgObj = pimager()
</source>
This should launch the parallel engines, and output will be printed to the terminal window to notify that a set of engines has been started.  Once this completes, the pcont tool may be called with the following parameters (here is a representative set):
<source lang="python">
imgObj.pcont(msname='/full/path/to/MS',
            imagename='/full/path/to/output/image',
            imsize=[1280, 1280],
            pixsize=['6arcsec', '6arcsec'],
            phasecenter='J2000 13h30m08 47d10m25',
            field='2',
            spw='0~7:1~62',
            stokes='I',
            ftmachine='ft',
            wprojplanes=128,
            maskimage='/full/path/to/mask',
            majorcycles=3,
            niter=100,
            threshold='0.0mJy',
            alg='clark',
            weight='natural',
            robust=0.0,
            contclean=False,
            visinmem=False)
</source>
Although this does not include the full suite of options available in {{clean}}, it includes a basic set of commonly-used parameters. 


=== pcube ===
