Transient reduction pipeline: Difference between revisions

From CASA Guides
Jump to navigationJump to search
Jlazio (talk | contribs)
Jlazio (talk | contribs)
 
(17 intermediate revisions by the same user not shown)
Line 3: Line 3:
= Transient Reduction Pipeline =
= Transient Reduction Pipeline =


2010 June 16 - T. Joseph W. Lazio
2010 June 20 - T. Joseph W. Lazio


There are a class of observations for which only relatively simple data reduction steps are needed.  One such example is that of transient observations, which are typically conducted in continuum mode and for which one is merely trying to determine a flux density of an essentially unresolved or only partially resolved source.  This guide describes the steps in such a pipeline.
There are a class of observations for which only relatively simple data reduction steps are needed.  One such example is that of transient observations, which are typically conducted in continuum mode and for which one is merely trying to determine a flux density of an essentially unresolved or only partially resolved source.  This guide describes the steps in such a pipeline.
<strong>Warning: This guide was written at the time when the EVLA was still in its commissioning phase.  As the instrument matures, specific steps taken here may need to be adjusted.  Caveat emptor.</strong>


In order to process these data in a semi-automatic fashion, certain assumptions are made
In order to process these data in a semi-automatic fashion, certain assumptions are made
* Data stored as an ALMA Science Data Model (ASDM) file on disk.
* Data stored as an Archive Science Data Model (ASDM) file on disk.
* Sources, listed in order of appearance, are structured as Option A or Option B
* Sources, listed in order of appearance, are structured as Option A or Option B
** Option A: phase calibrator, target source, amplitude calibrator
** Option A: phase calibrator, target source, amplitude calibrator
Line 17: Line 15:
This guide should be set up so that a pipeline script can be [[Extracting_scripts_from_these_tutorials | extracted]] from it.
This guide should be set up so that a pipeline script can be [[Extracting_scripts_from_these_tutorials | extracted]] from it.


 
<strong>Warning: This guide was written at the time when the EVLA was still in its commissioning phase.  As the instrument matures, specific steps taken here may need to be adjusted.  Caveat emptor.</strong>
Possible issues include  
Possible issues include  
* <tt>calwt=F</tt> because at the time of writing the EVLA is not reporting accurate weights (though I believe that this has a stronger effect on extended sources rather than compact sources);
* <tt>calwt=F</tt> because at the time of writing the EVLA is not reporting accurate weights (though I believe that this has a stronger effect on extended sources rather than compact sources);
* the pipeline assumes the same number of channels per spectral window;
* the pipeline assumes the same number of channels per spectral window;
* the pipeline assumes that polarization information is not required;
* the pipeline assumes that the first scan is a dummy scan, as such is required at the time of writing.
* the pipeline assumes that the first scan is a dummy scan, as such is required at the time of writing.


Further, to the best of the author's knowledge, the infrastructure does not yet exist within CASA to determine if there are [http://www.vla.nrao.edu/astro/archive/baselines/ antenna position corrections] that should be applied (via <tt>gencal</tt>).


I've restructured the document to describe the processing steps (conceptually) first, then list the actual script itself.


== User Input ==
== User Input ==


0. Initial set up, advice and warnings, and sanity checks.
I am attempting to migrate the script to become a "pipeline," in the sense of not requiring any user input.  However, there are some pieces of information that it is useful or required to know in order to process the data.
 
* What is the name of the initial Archive Science Data Model (ASDM) file?  It would be possible to assume that the script is being run in a directory that contains only a single file, which is in ASDM format, but that also seems a bit limiting.
 
* What antenna should be used as a reference antenna?  At the time of writing, it is not clear that a robust algorithm exists within CASA for choosing a reference antenna if the user has not specified one.
 
* How much flagging should be done?  The script does some simple clipping, designed to remove any egregious RFI or horribly performing antenna or baseline.  This clipping is done in terms of the rms visibility amplitude in the science target data, flagging data above some large threshold (e.g., 50<math>\sigma</math>).  This threshold is under user control, but the current flagging in this script is certainly not equal to a human lovingly massaging the visibility data.
 
These must be done before invoking the script/pipeline.  For example
<pre>
CASA <1> ASDM_name='AL007_sb123456789_1.56000.1234567890'
CASA <2> refant='ea21'
CASA <3> clip_sigma=50.0
CASA <4> execfile('transient_pipeline.py')
</pre>
A sensible, large value is adopted for <tt>clip_sigma</tt> if it is omitted.
 
== Processing Steps ==
 
=== Initial Stuff ===
 
* Read the data from the ASDM file converting it to a Measurement Set with <tt>importevla</tt>.  Apply basic flagging operations (zeros, shadowing) here.
 
* Flag the first (dummy) scan.  (This step is required at the time of development, but it may be relaxed in the future.)
 
* Flag (quack) first 10 seconds of each scan.
 
* Having constructed the initial measurement set, pause to extract various useful items from it, such as the frequency of observation, number of spectral channels, etc.  Use these to then calculate various quantities such as the primary beam, synthesized beam, tolerable amount of bandwidth smearing, and tolerable amount of time-average smearing.
 
* Reduce the data in size for faster processing downstream by averaging in time and frequency.  Also, reject the edge channels.
 
=== Calibration ===
 
* Set the flux density of the amplitude calibrator, if one exists.  The pipeline uses the calibration codes stored in the measurement set to determine which, if any, sources are suitable for calibrating the flux density scale.  If it finds no source that is suitable, it sets all sources that are suitable for calibrating the gain phases and the spectral bandpass to have a flux density of 1 Jy.  There is a potential issue, at the time of writing, if the frequency setting is such that one should use a model image at a different band than one is observing, e.g., observing near the top of the C band where an X band model might be more appropriate.  Presumably once all of the new receivers have been installed, new calibrator models will follow at some point.
 
* Make a quick-n-dirty bandpass, if a spectral bandpass calibrator exists.
 
* Amplitude and phase calibration.  This is done in two steps, first the amplitude calibrator if one exists, then the phase calibrator.  The two steps could be combined, but it is not clear that doing so would provide much benefit.
 
* Apply the calibration, from the phase calibrator to the science target.
 
* Form target source measurement set.
 
* Clip data above some a threshold.  The user can select this threshold by specifying <tt>clip_sigma</tt> prior to invoking the pipeline.  If no value is given, a value of 50<math>\sigma</math> is assumed.
 
=== Imaging ===
 
* The source is imaged and a light CLEANing is applied.
 
== Pipeline ==


<source lang="python">
<source lang="python">
# In CASA
# In CASA
#!/bin/env python
#
# A first step toward an EVLA transient reduction pipeline.
#
# Warning: This script was written at the time that the
# EVLA was undergoing commissioning.  Details may change.
# Caveat emptor!
#
# In particular, current version does not calibrate
# weights within applycal as EVLA does not currently
# report accurate weights.
#
#
########################################################################
#
import os.path, math
import os.path, math
#
#
Version='v. 0.0 TJWL  2010-05-11'
#Version='v. 0.0 TJWL  2010-05-11'
#Version='v. 0.1 TJWL  2010-06-16'
#Version='v. 0.2 TJWL  2010-06-19'
#Version='v. 0.3 TJWL  2010-06-20'
Version='v. 0.4 TJWL  2010-06-20'
# adopt L. Chiomuk's suggestion of incorporating spectral window
# information
# add flagging information
# start moving away from user input
#
CalModels='/home/casa/data/nrao/VLA/CalModels'
CalModels='/home/casa/data/nrao/VLA/CalModels'
c=2.99792458E8
c=2.99792458E8
Line 49: Line 125:
print "Caveat emptor!"
print "Caveat emptor!"
print " "
print " "
print "Assumption: Data stored as an ALMA Science Data Model (ASDM)"
print "Assumption: Data stored as an Archive Science Data Model (ASDM)"
print "            file on disk."
print "            file on disk."
print " "
print " "
Line 68: Line 144:
     raise IOError('Calibrator model files missing?  Stopping.')
     raise IOError('Calibrator model files missing?  Stopping.')
#
#
print " "
 
########################################################################
# User specified parameters
try: clip_sigma
except NameError:
    clip_sigma=50.0
 
try: ASDM_name
except NameError:
    raise NameError('ASDM_name variable not specified')
 
if not os.path.exists(ASDM_name):
    raise IOError('ASDM file does not exist.')
else:
    msname=ASDM_name+'.ms'
 
try: refant
except NameError:
    raise NameError('Reference antenna not specified in variable refant.')
 
########################################################################
# definitions
#
acal_std_name = {}
acal_std_name = {}
acal_std_name['3C286'] = frozenset(['3C286','3C 286',
acal_std_name['3C286'] = frozenset(['3C286','3C 286',
Line 82: Line 180:
                                   '0134+32','0134+329','0134+3254','B0134+32','B0134+329','B0134+3254',
                                   '0134+32','0134+329','0134+3254','B0134+32','B0134+329','B0134+3254',
                                   '0137+331','0137+3309','J0137+331','J0137+3309'])
                                   '0137+331','0137+3309','J0137+331','J0137+3309'])
</source>
#
 
########################################################################
1. Obtain the name of the ASDM file, use it to construct the Measurement Set name.
#
 
# Initial stuff.
<source lang="python">
#
# In CASA
# Read the data from the ASDM file converting it to a Measurement Set
ASDM_name=raw_input('ASDM file name:')
# with importevla.
msname=ASDM_name+'.ms'
# Apply basic flagging operations (zeros, shadowing) here.
</source>
#
 
2. Determine the order in which sources appear in the observation.
 
<source lang="python">
# In CASA
obs_structure=raw_input('Structure of observations [A|B]:')
if obs_structure == 'A':
    sources = {'acal':'2', 'pcal':'0', 'target':'1'}
elif obs_structure == 'B':
    sources = {'acal':'0', 'pcal':'1', 'target':'2'}
</source>
 
3. Reference antenna.  (Should be able to determine this from the measurement set!!)
 
<source lang="python">
# In CASA
refant=raw_input('Enter desired reference antenna [e.g., ea21]: ')
</source>
 
== Processing Steps ==
 
1. Read the data from the ASDM file converting to a Measurement Set with ''importevla''. Apply basic flagging operations (zeros, shadowing) here.
 
<source lang="python">
importevla(asdm=ASDM_name, vis=msname,
importevla(asdm=ASDM_name, vis=msname,
           flagzero=True, flagpol=True, shadow=True)
           flagzero=True, flagpol=True, shadow=True)
</source>
#
 
# Flag the first (dummy) scan.
2. Flag first (dummy) scan.
# Required at the time of development, this step may be relaxed
 
# in the future.
<source lang="python">
#
print "Flagging first (dummy) scan ..."
flagdata(vis=msname,
flagdata(vis=msname,
         mode='manualflag',selectdata=T, scan='1')
         mode='manualflag',selectdata=T, scan='1')
</source>


3. Flag first 10 seconds of each scan.
#
 
# Flag (quack) first 10 seconds of each scan.
<source lang="python">
#
print "Quack data ..."
flagdata(vis=msname,
flagdata(vis=msname,
         mode='quack',quackinterval=10,quackmode='beg',
         mode='quack',quackinterval=10,quackmode='beg',
         selectdata=F)
         selectdata=F)
</source>


4. Extract various useful quantities from the data.
#
# Extract various useful quantities ....
#
tb.open(msname)
uvw=tb.getcol('UVW')
interval=tb.getcol('INTERVAL')
tb.close(msname)


<source lang="python">
source_table=msname + '/SOURCE'
hdvalue=vishead(vis=msname, mode='get',
tb.open(source_table)
                hdkey='field', hdindex=sources['acal'])
calcode=tb.getcol("CODE")
acal_name=hdvalue[0]
source_id=tb.getcol("SOURCE_ID")
source_name=tb.getcol("NAME")
tb.close(source_table)


spw_table=msname + '/SPECTRAL_WINDOW'
spw_table=msname + '/SPECTRAL_WINDOW'
Line 147: Line 229:
num_chan=tb.getcol("NUM_CHAN")
num_chan=tb.getcol("NUM_CHAN")
tb.close(spw_table)
tb.close(spw_table)
freq=freq_list[0]
freq=freq_list[0]
if (1E9 < freq) and (freq < 2E9):
if (1E9 < freq) and (freq < 2E9):
Line 165: Line 248:
wavelength=c/freq
wavelength=c/freq
primary_beam=(1.02*wavelength/D)*degrad  # HPBW, from Napier (1999)
primary_beam=(1.02*wavelength/D)*degrad  # HPBW, from Napier (1999)
 
FoV=(primary_beam/math.sqrt(2))*arcsecond_deg
tb.open(msname)
uvw=tb.getcol('UVW')
interval=tb.getcol('INTERVAL')
tb.close(msname)


b=[]
b=[]
Line 179: Line 258:
synthesized_beam=(1.2/bmax)*degrad*arcsecond_deg
synthesized_beam=(1.2/bmax)*degrad*arcsecond_deg


 
#
##############################
# How much time-average smearing can be tolerated?
# How much time-average
# smearing can be tolerated?
# Assume no more than time_smearing_loss peak
# Assume no more than time_smearing_loss peak
# intensity loss over half of
# intensity loss over half of primary beam.
# primary beam.
# Follow Section 2 of Bridle & Schwab (1999).
# Follow Section 2 of Bridle
# Note undocumented option that the amount of time-average
# & Schwab (1999).
# smearing loss can be specified by user.
#
#
time_smearing_loss=0.01
try: time_smearing_loss
FoV=(primary_beam/math.sqrt(2))*arcsecond_deg
except NameError:
    time_smearing_loss=0.01
 
tau=math.sqrt(time_smearing_loss*1E9)*(synthesized_beam/FoV)
tau=math.sqrt(time_smearing_loss*1E9)*(synthesized_beam/FoV)


Line 197: Line 276:
     tau=arbitrary_maximum
     tau=arbitrary_maximum


dt=interval[0]
dt=min(interval)
for dtprime in interval:
#
    if dtprime < dt:
        dt=dtprime
 
# tau is allowed value, dt is actual (minimum)
# tau is allowed value, dt is actual (minimum)
# (could be an issue if baseline-dependent
# (could be an issue if baseline-dependent
#  correlator accumulation used)
#  correlator accumulation used)
# make sure that tau is an integer
# make sure that tau is an integer multiple of dt
# multiple of dt
#  
#  
tau=dt*math.floor(tau/dt)
tau=dt*math.floor(tau/dt)
Line 217: Line 292:
print " "
print " "


##############################
#
# How much bandwidth
# How much bandwidth smearing can be tolerated?
# smearing can be tolerated?
# Assume no more than band_smearing_loss peak
# Assume no more than band_smearing_loss peak
# intensity loss over half of
# intensity loss over half of primary beam.
# primary beam.
# Follow Section 1 of Bridle & Schwab (1999).
# Follow Section 1 of Bridle
# Assume square bandpass, no taper, expand resulting
# & Schwab (1999).
# Assume square bandpass, no
# taper, expand resulting
# sine integral to lowest order
# sine integral to lowest order
# Note undocumented option that the amount of time-average
# smearing loss can be specified by user.
#
#
band_smearing_loss=0.01
try: band_smearing_loss
except NameError:
    band_smearing_loss=0.01
 
eta_band=3.79
eta_band=3.79
delta_nu=freq*(2/eta_band)*(synthesized_beam/FoV)*math.sqrt(18*band_smearing_loss)
delta_nu=freq*(2/eta_band)*(synthesized_beam/FoV)*math.sqrt(18*band_smearing_loss)
 
#
# delta_nu is allowed value,
# delta_nu is allowed value,
# figure out even divisibles of the actual
# figure out even divisibles of the actual
Line 254: Line 330:
print "Averaged channel width [kHz]: ", (nchav*min(channel_width[0]))/1E3
print "Averaged channel width [kHz]: ", (nchav*min(channel_width[0]))/1E3
print "Number of channels: ", nchav
print "Number of channels: ", nchav
</source>


5. Compress data for faster processing.
#
# Compress data for faster processing.
#    Throw away edge channels.
#
 
bchan=num_chan[0]*0.1
echan=num_chan[0]*0.9
chans='*:%d~%d'%(bchan, echan)


<source lang="python">
outmsname=ASDM_name+'_split.ms'
outmsname=ASDM_name+'_split.ms'
tave='%.fs'%tau
tave='%.fs'%tau
split(vis=msname,outputvis=outmsname,
split(vis=msname,outputvis=outmsname,
       datacolumn='data',
       datacolumn='data',
      spw=chans,
       timebin=tave,width=int(nchav))
       timebin=tave,width=int(nchav))
</source>


6. Set flux density of amplitude calibrator.
########################################################################
  Need some heuristics here.
#
   Also could be an issue if the frequency setting is such that one should use a model image at a different band than one is observing, e.g., observing near the top of the C band where an X band model might be more appropriate.
# Calibration
#
#   The could be an issue if the frequency
#    setting is such that one should use a
#    model image at a different band than one
#    is observing, e.g., observing near the
#    top of the C band where an X band model
#    might be more appropriate.


<source lang="python">
# New style EVLA (indicators of scan intents):
for tstname in acal_std_name.keys():
#    D = calibrator useful to determine Complex Gains
     if acal_name in acal_std_name[tstname]:
#    E = calibrator useful to determine the absolute Flux Density
         modimage=tstname
#        Scale; typically only used for 3C48, 3C147, 3C286 (and 3C138,
modimage=CalModels + '/' + modimage + '_' + band + '.im'
#        3C295)
setjy(vis=outmsname,field=sources['acal'],modimage=modimage)
#    F = calibrator useful to determine the spectral Bandpass response
</source>
#    G = calibrator useful to determine Polarization Angle
#     H = calibrator useful to determine the instrumental Polarization
#         Leakage
#    I = calibrator for Complex Gain and Bandpass
#    J = calibrator for Complex Gain and Polarization Leakage
#    K = calibrator for Flux Density Scale and Bandpass
#    L = calibrator for Flux Density Scale and Polarization Angle
#    M = calibrator for Bandpass and Polarization Angle
#    N = calibrator for Flux Density Scale, Bandpass and Polarization
#        Angle
#    Y = recognised as calibrator source in the VLA catalog by the
#        pipeline; only appears when no other calcodes are present and
#        NOT an indication that this calibrator is useful at this
#        frequency or in this array
#    Z = any other non-Target combination of intents
#
# Code  Gain Flux  BP Pang  P%
#  D      X
#  E          X
#  F              X
#  G                    X
#  H                        X
#  I      X        X
#  J      X                X
#  K          X  X
#  L          X        X
#  M              X    X
#  N          X  X    X
#
print " Beginning calibration ..."
clearstat()


flux_cal=F
for i in range(0, len(calcode)):
    if calcode[i]=='E' or calcode[i]=='K' or calcode[i]=='L' or calcode[i]=='N':
        flux_cal=T
        break


7. Quick-n-dirty bandpass
if flux_cal==T:
 
    iflux_cal=i
<source lang="python">
    for tstname in acal_std_name.keys():
Bcaltable=ASDM_name + '.B1'
        if source_name[i] in acal_std_name[tstname]:
bandpass(vis=outmsname,caltable=Bcaltable,
            modimage=CalModels + '/' + tstname + '_' + band + '.im'
        field=sources['acal'],
            break
        solint='inf',combine='scan',refant=refant,
        #
        solnorm=T,
        setjy(vis=outmsname,field=source_name[i],modimage=modimage)
        bandtype='B',
else:
        append=F)
    for i in range(0, len(calcode)):
</source>
        if calcode[i]=='D' or calcode=='F' or calcode[i]=='I' or calcode[i]=='J' or calcode=='M':
 
            setjy(vis=outmsname,field=source_name[i],modimage='',fluxdensity=[1.0,0.0,0.0,0.0])
 
#
8. amplitude and phase calibration
# Quick-n-dirty bandpass
#
for i in range(0, len(calcode)):
    if calcode[i]=='F' or calcode[i]=='I' or calcode=='K' or calcode=='M' or calcode=='N':
        print " Finding bandpass ..."
        band_cal=T
        #
        Bcaltable=ASDM_name + '.B1'
        Gcaltable=ASDM_name + '.G0'
        gaincal(vis=outmsname,caltable=Gcaltable,
                field=source_name[i],       
                solint='int',refant=refant,
                solnorm=T,calmode='p',minsnr=5)
        bandpass(vis=outmsname,caltable=Bcaltable,
                field=source_name[i],
                solint='inf',combine='scan',refant=refant,
                solnorm=T,
                gaintable=[Gcaltable])
        break
#
# Amplitude and phase calibration
#
print " Amplitude and phase calibration ..."


<source lang="python">
Gcaltable=ASDM_name + '.G1'
Gcaltable=ASDM_name + '.G1'
gaincal(vis=outmsname,caltable=Gcaltable,
if band_cal==T:
        field=sources['acal'],
    gaintable=[Bcaltable]
        solint='inf',refant=refant,
else:
        minsnr=5,
    gaintable=[]
        solnorm=F,
        gaintype='G',calmode='ap',
        append=F,
        gaintable=Bcaltable)


gaincal(vis=outmsname,caltable=Gcaltable,
if flux_cal==T:
        field=sources['pcal'],
    gaincal(vis=outmsname,caltable=Gcaltable,
        solint='inf',refant=refant,
            field=source_name[iflux_cal],
        minsnr=5,
            solint='inf',combine='scan',refant=refant,
        solnorm=F,
            minsnr=5,
        gaintype='G',calmode='ap',
            gaintable=gaintable)
        append=T,
        gaintable=Bcaltable)
</source>


for i in range(0, len(calcode)):
    if calcode[i]=='D' or calcode[i]=='I' or calcode=='J':
        gaincal(vis=outmsname,caltable=Gcaltable,
                field=source_name[i],
                solint='inf',refant=refant,
                minsnr=3,
                append=T,
                gaintable=gaintable)
#
# Apply calibration.
#
if flux_cal==T:
    fluxtable=ASDM_name + '.flux1'
    for i in range(0, len(calcode)):
        if calcode[i]=='D' or calcode[i]=='I' or calcode=='J':
            fluxscale(vis=outmsname,caltable=Gcaltable,fluxtable=fluxtable,
                      reference=source_name[iflux_cal],
                      transfer=source_name[i])


9. Apply calibration.
if flux_cal==T and band_cal==T:
    gaintable=[Bcaltable,fluxtable]
    gainfield=['',source_name[iflux_cal]]
elif flux_cal==T and band_cal==F:
    gaintable=[fluxtable]
    gainfield=[source_name[iflux_cal]]
elif flux_cal==F and band_cal==T:
    gaintable=[Bcaltable,Gcaltable]
    gainfield=[]
else:
    gaintable=[Gcaltable]
    gainfield=[]


<source lang="python">
for i in range(0, len(calcode)):
fluxtable=ASDM_name + '.flux1'
    if calcode[i]=='D' or calcode[i]=='I' or calcode=='J':
fluxscale(vis=outmsname,caltable=Gcaltable,fluxtable=fluxtable,
        applycal(vis=outmsname,
          reference=sources['acal'],
                field=''
          transfer=sources['pcal'],
                gaintable=gaintable,
          append=F)
                gainfield=gainfield,
                calwt=F)
        break
#
# Form target source measurement set.
#
print " Forming target source measurement set ..."


applycal(vis=outmsname,
for i in range(0, len(calcode)):
        field=sources['pcal'],
    if calcode[i]=='Z' or calcode[i]=='NONE':
        gaintable=[Bcaltable, Gcaltable])
        targetms=source_name[i] + '.ms'
        split(vis=outmsname,outputvis=targetms,
              datacolumn='corrected',
              field=source_name[i])
#
# Clip any egregious data.
#
Vstat=visstat(vis=outmsname,selectdata=F)
Vclip=clip_sigma*Vstat['DATA']['rms']


applycal(vis=outmsname,
flagdata(vis=outmsname,
         field=sources['target'],
         mode='manualflag',
         gaintable=[Bcaltable, Gcaltable])
        clipexpr='ABS RR',clipminmax=[0.0,Vclip],
</source>
        selectdata=F)
flagdata(vis=outmsname,
        mode='manualflag',
         clipexpr='ABS LL',clipminmax=[0.0,Vclip],
        selectdata=F)


########################################################################
#
# Imaging and CLEANing.
#
print " Image the data ..."
clearstat()


10. Form target source measurement set.
<source lang="python">
hdvalue=vishead(vis=msname, mode='get',
        hdkey='field', hdindex=sources['target'])
target_name=hdvalue[0]
targetms=target_name + '.ms'
split(vis=outmsname,outputvis=targetms,
      datacolumn='corrected',
      field=sources['target'])
</source>
11. Image and CLEAN.
<source lang="python">
imagename=target_name + '.im'
imagename=target_name + '.im'
cells='%.3farcsec'%(synthesized_beam/4)
cells='%.3farcsec'%(synthesized_beam/4)

Latest revision as of 20:02, 20 June 2010


Transient Reduction Pipeline

2010 June 20 - T. Joseph W. Lazio

There are a class of observations for which only relatively simple data reduction steps are needed. One such example is that of transient observations, which are typically conducted in continuum mode and for which one is merely trying to determine a flux density of an essentially unresolved or only partially resolved source. This guide describes the steps in such a pipeline.

In order to process these data in a semi-automatic fashion, certain assumptions are made

  • Data stored as an Archive Science Data Model (ASDM) file on disk.
  • Sources, listed in order of appearance, are structured as Option A or Option B
    • Option A: phase calibrator, target source, amplitude calibrator
    • Option B: amplitude calibrator, phase calibrator, target source

This guide should be set up so that a pipeline script can be extracted from it.

Warning: This guide was written at the time when the EVLA was still in its commissioning phase. As the instrument matures, specific steps taken here may need to be adjusted. Caveat emptor. Possible issues include

  • calwt=F because at the time of writing the EVLA is not reporting accurate weights (though I believe that this has a stronger effect on extended sources rather than compact sources);
  • the pipeline assumes the same number of channels per spectral window;
  • the pipeline assumes that polarization information is not required;
  • the pipeline assumes that the first scan is a dummy scan, as such is required at the time of writing.

Further, to the best of the author's knowledge, the infrastructure does not yet exist within CASA to determine if there are antenna position corrections that should be applied (via gencal).


I've restructured the document to describe the processing steps (conceptually) first, then list the actual script itself.

User Input

I am attempting to migrate the script to become a "pipeline," in the sense of not requiring any user input. However, there are some pieces of information that it is useful or required to know in order to process the data.

  • What is the name of the initial Archive Science Data Model (ASDM) file? It would be possible to assume that the script is being run in a directory that contains only a single file, which is in ASDM format, but that also seems a bit limiting.
  • What antenna should be used as a reference antenna? At the time of writing, it is not clear that a robust algorithm exists within CASA for choosing a reference antenna if the user has not specified one.
  • How much flagging should be done? The script does some simple clipping, designed to remove any egregious RFI or horribly performing antenna or baseline. This clipping is done in terms of the rms visibility amplitude in the science target data, flagging data above some large threshold (e.g., 50[math]\displaystyle{ \sigma }[/math]). This threshold is under user control, but the current flagging in this script is certainly not equal to a human lovingly massaging the visibility data.

These must be done before invoking the script/pipeline. For example

CASA <1> ASDM_name='AL007_sb123456789_1.56000.1234567890'
CASA <2> refant='ea21'
CASA <3> clip_sigma=50.0
CASA <4> execfile('transient_pipeline.py')

A sensible, large value is adopted for clip_sigma if it is omitted.

Processing Steps

Initial Stuff

  • Read the data from the ASDM file converting it to a Measurement Set with importevla. Apply basic flagging operations (zeros, shadowing) here.
  • Flag the first (dummy) scan. (This step is required at the time of development, but it may be relaxed in the future.)
  • Flag (quack) first 10 seconds of each scan.
  • Having constructed the initial measurement set, pause to extract various useful items from it, such as the frequency of observation, number of spectral channels, etc. Use these to then calculate various quantities such as the primary beam, synthesized beam, tolerable amount of bandwidth smearing, and tolerable amount of time-average smearing.
  • Reduce the data in size for faster processing downstream by averaging in time and frequency. Also, reject the edge channels.

Calibration

  • Set the flux density of the amplitude calibrator, if one exists. The pipeline uses the calibration codes stored in the measurement set to determine which, if any, sources are suitable for calibrating the flux density scale. If it finds no source that is suitable, it sets all sources that are suitable for calibrating the gain phases and the spectral bandpass to have a flux density of 1 Jy. There is a potential issue, at the time of writing, if the frequency setting is such that one should use a model image at a different band than one is observing, e.g., observing near the top of the C band where an X band model might be more appropriate. Presumably once all of the new receivers have been installed, new calibrator models will follow at some point.
  • Make a quick-n-dirty bandpass, if a spectral bandpass calibrator exists.
  • Amplitude and phase calibration. This is done in two steps, first the amplitude calibrator if one exists, then the phase calibrator. The two steps could be combined, but it is not clear that doing so would provide much benefit.
  • Apply the calibration, from the phase calibrator to the science target.
  • Form target source measurement set.
  • Clip data above some a threshold. The user can select this threshold by specifying clip_sigma prior to invoking the pipeline. If no value is given, a value of 50[math]\displaystyle{ \sigma }[/math] is assumed.

Imaging

  • The source is imaged and a light CLEANing is applied.

Pipeline

# In CASA

#!/bin/env python
#
# A first step toward an EVLA transient reduction pipeline.
#
# Warning: This script was written at the time that the
# EVLA was undergoing commissioning.  Details may change.
# Caveat emptor!
#
# In particular, current version does not calibrate
# weights within applycal as EVLA does not currently
# report accurate weights.
#
#
########################################################################
#
import os.path, math
#
#Version='v. 0.0 TJWL  2010-05-11'
#Version='v. 0.1 TJWL  2010-06-16'
#Version='v. 0.2 TJWL  2010-06-19'
#Version='v. 0.3 TJWL  2010-06-20'
Version='v. 0.4 TJWL  2010-06-20'
# adopt L. Chiomuk's suggestion of incorporating spectral window
# information
# add flagging information
# start moving away from user input
#
CalModels='/home/casa/data/nrao/VLA/CalModels'
c=2.99792458E8
D=24.5
degrad=180/pi
arcsecond_deg=3600
#
print " "
print "EVLA Transient Reduction Pipeline"
print " "
print Version
print " "
print "Warning: This script was written at the time that the"
print "EVLA was undergoing commissioning.  Details may change."
print "Caveat emptor!"
print " "
print "Assumption: Data stored as an Archive Science Data Model (ASDM)"
print "            file on disk."
print " "
print "Assumption: Sources, listed in order of appearance, are structured"
print "            as Option A or Option B"
print "            Option A: phase calibrator"
print "                      target source"
print "                      amplitude calibrator"
print " "
print "            Option B: amplitude calibrator"
print "                      phase calibrator"
print "                      target source"
print " "
#
print "Assumption: Calibrator model files stored in "
print CalModels
if not os.path.isdir(CalModels):
    raise IOError('Calibrator model files missing?  Stopping.')
#

########################################################################
# User specified parameters
try: clip_sigma
except NameError:
    clip_sigma=50.0

try: ASDM_name
except NameError:
    raise NameError('ASDM_name variable not specified')

if not os.path.exists(ASDM_name):
    raise IOError('ASDM file does not exist.')
else:
    msname=ASDM_name+'.ms'

try: refant
except NameError:
    raise NameError('Reference antenna not specified in variable refant.')

########################################################################
# definitions
#
acal_std_name = {}
acal_std_name['3C286'] = frozenset(['3C286','3C 286',
                                    '1328+30','1328+307','B1328+307','B1328+30',
                                    '1331+305','J1331+305','J1331+3030'])
acal_std_name['3C138'] = frozenset(['3C138','3C 138',
                                    '0518+16','0518+165','0518+1635','B0518+16','B0518+165','B0518+1635',
                                    '0521+166','0521+1638','J0521+166','J0521+1638'])
acal_std_name['3C147'] = frozenset(['3C147','3C 147',
                                    '0538+49','0538+498','0538+4949','B0538+49','B0538+498','B0538+4949',
                                    '0542+498','0542+4951','J0542+498','J0542+4951'])
acal_std_name['3C48'] = frozenset(['3C48','3C 48',
                                   '0134+32','0134+329','0134+3254','B0134+32','B0134+329','B0134+3254',
                                   '0137+331','0137+3309','J0137+331','J0137+3309'])
#
########################################################################
#
# Initial stuff.
#
# Read the data from the ASDM file converting it to a Measurement Set
# with importevla.
# Apply basic flagging operations (zeros, shadowing) here.
#
importevla(asdm=ASDM_name, vis=msname,
           flagzero=True, flagpol=True, shadow=True)
#
# Flag the first (dummy) scan.
# Required at the time of development, this step may be relaxed
# in the future.
#
print "Flagging first (dummy) scan ..."
flagdata(vis=msname,
         mode='manualflag',selectdata=T, scan='1')

#
# Flag (quack) first 10 seconds of each scan.
#
print "Quack data ..."
flagdata(vis=msname,
         mode='quack',quackinterval=10,quackmode='beg',
         selectdata=F)

#
# Extract various useful quantities ....
#
tb.open(msname)
uvw=tb.getcol('UVW')
interval=tb.getcol('INTERVAL')
tb.close(msname)

source_table=msname + '/SOURCE'
tb.open(source_table)
calcode=tb.getcol("CODE")
source_id=tb.getcol("SOURCE_ID")
source_name=tb.getcol("NAME")
tb.close(source_table)

spw_table=msname + '/SPECTRAL_WINDOW'
tb.open(spw_table)
freq_list=tb.getcol("REF_FREQUENCY")
channel_width=tb.getcol("CHAN_WIDTH")
num_chan=tb.getcol("NUM_CHAN")
tb.close(spw_table)

freq=freq_list[0]
if (1E9 < freq) and (freq < 2E9):
    band = 'L'
elif (2E9 < freq) and (freq < 4E9):
    band = 'S'
elif (4E9 < freq) and (freq < 8E9):
    band='C'
elif (8E9 < freq) and (freq < 12E9):
    band='X'
elif (12E9 < freq) and (freq < 40E9):
    band = 'K'
elif freq > 40E9:
    band = 'Q'
print "Observations are determined to be in the ", band, " band.\n"


wavelength=c/freq
primary_beam=(1.02*wavelength/D)*degrad   # HPBW, from Napier (1999)
FoV=(primary_beam/math.sqrt(2))*arcsecond_deg

b=[]
for u, v, w in zip(uvw[0], uvw[1], uvw[2]):
    b.append(math.sqrt(u*u+v*v)/wavelength)
bmax=max(b)
                                          # HPBW, synthesized beam,
                                          # from Bridle & Schwab (1999)
synthesized_beam=(1.2/bmax)*degrad*arcsecond_deg

#
# How much time-average smearing can be tolerated?
# Assume no more than time_smearing_loss peak
# intensity loss over half of primary beam.
# Follow Section 2 of Bridle & Schwab (1999).
# Note undocumented option that the amount of time-average
# smearing loss can be specified by user.
#
try: time_smearing_loss
except NameError:
    time_smearing_loss=0.01

tau=math.sqrt(time_smearing_loss*1E9)*(synthesized_beam/FoV)

arbitrary_maximum=30
if tau > arbitrary_maximum:
    tau=arbitrary_maximum

dt=min(interval)
#
# tau is allowed value, dt is actual (minimum)
# (could be an issue if baseline-dependent
#  correlator accumulation used)
# make sure that tau is an integer multiple of dt
# 
tau=dt*math.floor(tau/dt)
if tau < dt:
    tau=dt

print "Data will be averaged in time."
print "Original time sampling [s]: ", dt
print "Averaging time [s]:         ", tau
print " "

#
# How much bandwidth smearing can be tolerated?
# Assume no more than band_smearing_loss peak
# intensity loss over half of primary beam.
# Follow Section 1 of Bridle & Schwab (1999).
# Assume square bandpass, no taper, expand resulting
# sine integral to lowest order
# Note undocumented option that the amount of time-average
# smearing loss can be specified by user.
#
try: band_smearing_loss
except NameError:
    band_smearing_loss=0.01

eta_band=3.79
delta_nu=freq*(2/eta_band)*(synthesized_beam/FoV)*math.sqrt(18*band_smearing_loss)
#
# delta_nu is allowed value,
# figure out even divisibles of the actual
# value, stored in num_chan that is smaller than
# delta_nu
# potential bug if uneven channel widths
# used in different spectral windows

dnu=channel_width[0][0]
nchan_log2=math.log(num_chan[0],2)
for i in range(int(nchan_log2-1), 0, -1):
    if (dnu*math.pow(2,i)) < delta_nu:
        nchav=math.pow(2,i)
        break

if nchav < 1:
    nchav=1

print "Data will be averaged in frequency."
print "Original channel width [kHz]: ", min(channel_width[0])/1E3
print "Averaged channel width [kHz]: ", (nchav*min(channel_width[0]))/1E3
print "Number of channels: ", nchav

#
# Compress data for faster processing.
#    Throw away edge channels.
#

bchan=num_chan[0]*0.1
echan=num_chan[0]*0.9
chans='*:%d~%d'%(bchan, echan)

outmsname=ASDM_name+'_split.ms'
tave='%.fs'%tau
split(vis=msname,outputvis=outmsname,
      datacolumn='data',
      spw=chans,
      timebin=tave,width=int(nchav))

########################################################################
#
# Calibration
#
#    The could be an issue if the frequency
#    setting is such that one should use a
#    model image at a different band than one
#    is observing, e.g., observing near the
#    top of the C band where an X band model
#    might be more appropriate.

# New style EVLA (indicators of scan intents):
#     D = calibrator useful to determine Complex Gains
#     E = calibrator useful to determine the absolute Flux Density
#         Scale; typically only used for 3C48, 3C147, 3C286 (and 3C138,
#         3C295)
#     F = calibrator useful to determine the spectral Bandpass response
#     G = calibrator useful to determine Polarization Angle
#     H = calibrator useful to determine the instrumental Polarization
#         Leakage
#     I = calibrator for Complex Gain and Bandpass
#     J = calibrator for Complex Gain and Polarization Leakage
#     K = calibrator for Flux Density Scale and Bandpass
#     L = calibrator for Flux Density Scale and Polarization Angle
#     M = calibrator for Bandpass and Polarization Angle
#     N = calibrator for Flux Density Scale, Bandpass and Polarization
#         Angle
#     Y = recognised as calibrator source in the VLA catalog by the
#         pipeline; only appears when no other calcodes are present and
#         NOT an indication that this calibrator is useful at this
#         frequency or in this array
#     Z = any other non-Target combination of intents
# 
# Code   Gain Flux  BP Pang  P%
#   D      X
#   E           X
#   F               X
#   G                    X
#   H                        X
#   I      X        X
#   J      X                 X
#   K           X   X
#   L           X        X
#   M               X    X
#   N           X   X    X
#
print " Beginning calibration ..."
clearstat()

flux_cal=F
for i in range(0, len(calcode)):
    if calcode[i]=='E' or calcode[i]=='K' or calcode[i]=='L' or calcode[i]=='N':
        flux_cal=T
        break

if flux_cal==T:
    iflux_cal=i
    for tstname in acal_std_name.keys():
        if source_name[i] in acal_std_name[tstname]:
            modimage=CalModels + '/' + tstname + '_' + band + '.im'
            break
        #
        setjy(vis=outmsname,field=source_name[i],modimage=modimage)
else:
    for i in range(0, len(calcode)):
        if calcode[i]=='D' or calcode=='F' or calcode[i]=='I' or calcode[i]=='J' or calcode=='M':
            setjy(vis=outmsname,field=source_name[i],modimage='',fluxdensity=[1.0,0.0,0.0,0.0])
#
# Quick-n-dirty bandpass
#
for i in range(0, len(calcode)):
    if calcode[i]=='F' or calcode[i]=='I' or calcode=='K' or calcode=='M' or calcode=='N':
        print " Finding bandpass ..."
        band_cal=T
        #
        Bcaltable=ASDM_name + '.B1'
        Gcaltable=ASDM_name + '.G0'
        gaincal(vis=outmsname,caltable=Gcaltable,
                field=source_name[i],        
                solint='int',refant=refant,
                solnorm=T,calmode='p',minsnr=5)
        bandpass(vis=outmsname,caltable=Bcaltable,
                 field=source_name[i],
                 solint='inf',combine='scan',refant=refant,
                 solnorm=T,
                 gaintable=[Gcaltable])
        break
#
# Amplitude and phase calibration
#
print " Amplitude and phase calibration ..."

Gcaltable=ASDM_name + '.G1'
if band_cal==T:
    gaintable=[Bcaltable]
else:
    gaintable=[]

if flux_cal==T:
    gaincal(vis=outmsname,caltable=Gcaltable,
            field=source_name[iflux_cal],
            solint='inf',combine='scan',refant=refant,
            minsnr=5,
            gaintable=gaintable)

for i in range(0, len(calcode)):
    if calcode[i]=='D' or calcode[i]=='I' or calcode=='J':
        gaincal(vis=outmsname,caltable=Gcaltable,
                field=source_name[i],
                solint='inf',refant=refant,
                minsnr=3,
                append=T,
                gaintable=gaintable)
#
# Apply calibration.
#
if flux_cal==T:
    fluxtable=ASDM_name + '.flux1'
    for i in range(0, len(calcode)):
        if calcode[i]=='D' or calcode[i]=='I' or calcode=='J':
            fluxscale(vis=outmsname,caltable=Gcaltable,fluxtable=fluxtable,
                      reference=source_name[iflux_cal],
                      transfer=source_name[i])

if flux_cal==T and band_cal==T:
    gaintable=[Bcaltable,fluxtable]
    gainfield=['',source_name[iflux_cal]]
elif flux_cal==T and band_cal==F:
    gaintable=[fluxtable]
    gainfield=[source_name[iflux_cal]]
elif flux_cal==F and band_cal==T:
    gaintable=[Bcaltable,Gcaltable]
    gainfield=[]
else:
    gaintable=[Gcaltable]
    gainfield=[]

for i in range(0, len(calcode)):
    if calcode[i]=='D' or calcode[i]=='I' or calcode=='J':
        applycal(vis=outmsname,
                 field=''
                 gaintable=gaintable,
                 gainfield=gainfield,
                 calwt=F)
        break
#
# Form target source measurement set.
#
print " Forming target source measurement set ..."

for i in range(0, len(calcode)):
    if calcode[i]=='Z' or calcode[i]=='NONE':
        targetms=source_name[i] + '.ms'
        split(vis=outmsname,outputvis=targetms,
              datacolumn='corrected',
              field=source_name[i])
#
# Clip any egregious data.
#
Vstat=visstat(vis=outmsname,selectdata=F)
Vclip=clip_sigma*Vstat['DATA']['rms']

flagdata(vis=outmsname,
         mode='manualflag',
         clipexpr='ABS RR',clipminmax=[0.0,Vclip],
         selectdata=F)
flagdata(vis=outmsname,
         mode='manualflag',
         clipexpr='ABS LL',clipminmax=[0.0,Vclip],
         selectdata=F)

########################################################################
#
# Imaging and CLEANing.
#
print " Image the data ..."
clearstat()

imagename=target_name + '.im'
cells='%.3farcsec'%(synthesized_beam/4)

clean(vis=targetms,imagename=imagename,
      mode='mfs',
      niter=300,
      psfmode='clark',
      imagermode='csclean',
      imsize=[1024,1024],cell=cells,
      stokes='IV',
      weighting='briggs',robust=-1)