MG0414+0534 P-band Spectral Line Tutorial - CASA 5.0.0
This CASA Guide is for Version 5.0.0 of CASA. THIS CASA GUIDE IS CURRENTLY IN PREPARATION (LAST ACTIVITY 18 AUG 2017)
This tutorial describes how to use CASA 5.0.0 to reduce spectral-line data in the low-frequency P-band of the VLA (230 – 470 MHz). The goal is to make an image cube containing HI 21cm absorption against the strong radio continuum of gravitationally lensed radio galaxy MG0414+0534. As a results of the high redshift of z=2.6365, the HI absorption signal in MG0414+0534 is redshifted to an observed frequency of 390.597 GHz. The HI absorption in MG0414+0534 was previously imaged with the VLA by Moore, Carilli & Menten 1999 (ApJ, 510, 87), before the upgrade to the WIDAR system.
To perform P-band spectrsocopy, there are three important considerations for planning the observations:
- Use a bandpass calibrator that is strong enough to accurately calibrate the frequency-dependent gain variations. This is particularly important for most HI 21cm absorption projects, which are typically performed against strong radio continuum sources. As a rule-of-thumb, use tcal > tobj × (Sobj / Scal)2, with “t” the exposure time and “S” the source flux density. This requirement is to avoid introducing excessive additional noise in the spectrum. In addition, if very high spectral dynamic range is needed (i.e., when expecting a ratio between the detection limit and the radio continuum of about 1:10,000 or more), consider observing a bandpass calibrator several times during your run to be able to correct for time-varying bandpass changes, which scale with the continuum emission in the target field. See this page for more information.
- Use a bandwidth that is wide enough to perform accurate self calibration. Using a wide bandwidth for self-calibration this is important for fields with relatively weak continuum sources. For strong continuum sources, a narrower bandwidth can be used to avoid excessive RFI.
The P-band test-data on MG0414+0534 that we use in this tutorial were obtained using a large bandwidth. This to ensure that good bandpass solutions could be obtained, and that self-calibration could accurately be performed. However, due to the strong radio continuum of MG0414+0534 (3.3 Jy at 390m MHz) and the large amounts of RFI across the entire band, we only use a small fraction of the total band for data reduction and analysis in this tutorial.
Obtaining the raw data
We will use test-data that was taken in a hybrid configuration when the VLA array was moved from B-config to A-config. The data set can be downloaded from the NRAO archive by searching for the following Project Code:
This returns a long list with test-data that are publicly available. Our observations were performed on 14 Sept 2016:
Note that this observation was duplicated on 15 September 2016 . This second day of observations can be found under filename: TSUB0001.sb32720781.eb32773507.57646.24443702547
For the purpose of this tutorial, we only reduce and image the first data set. The second data set can be reduced in an identical way, pending the flagging, as the RFI conditions most likely changed. During the imaging stage you can then combine both reduced data sets.
To download the data, fill in your email, select either the "SDM-BDF (all files)" or “MS” option, and check the box next to the data sets that you want. The practical difference between selecting the MeasurementSet (MS) and Science Data Model (SDM) is that for MS the first step in the data reduction described below, namely reading in the SDM data to save it as a MeasurementSet, is already done by the archival engine. Note that you can also opt to download the data as a tar-file by clicking the appropriate box.
Optionally, you can request to discard data marked as ‘bad’ by clicking the box “Apply Telescope Flags”, which gets rid of data taken during times of instrument calibration, shadowing, slewing, etc. However, it also applies all flags on the science data that were automatically created during the observations. Therefore, a safer option is to not apply any flags before downloading the data, but first inspect and subsequently apply the flags generated during the observations using “flagcmd”, as part of the data reduction plan.
Click "Get My Data" will forward you to the next page, where you should choose the delivery method (either downloading over the internet, or sending home a hard-drive with the data). If you opt to retrieve the data over the internet, wait until you get an email confirming that the data is ready for download.
Loading data into CASA
Start CASA by typing
on the command line. This should start a CASA interactive python (iPython) session, and open a separate log window. To guarantee that the below mentioned procedure for data reduction and imaging works, make sure you are using CASA version 5.0.0. While older version may work as well for the purpose of this tutorial, it is good to visit this page for instruction on download and installation of the latest version of CASA. We will begin by importing our data from the binary format (SDM-BDF) into the MeasurementSet format, which is the standard for CASA data. For this, we use importevla:
# In CASA importevla(asdm='TSUB0001.sb32720781.eb32763188.57645.263958564814',vis='MG0414_d1.ms',flagpol=False,applyflags=False,savecmds=True,outfile='flagfile.txt')
In this case, we do not apply the flags that were created as part of the observations, but we will write them out to a flagfile.txt file. We will inspect and apply the flags as follows:
# In CASA flagcmd(vis='MG0414_d1.ms',action='plot') flagcmd(vis='MG0414_d1.ms',action='apply')
As an alternative to flagcmd, because we saved the flags to flagfile.txt, we can also use
# In CASA flagdata(vis='MG0414_d1.ms',mode='list',inpfile='flagfile.txt',action='apply')
to apply the flags.
One of the flag commands is to clip all data points that have a 0-value, as these most likely have not recorded any science data. This is normally done with the settings correlation=`ABS_RR’ and correlation=`ABS_LL’. Because the VLA P-band and 4-band systems use linear XX and YY polarisation, rather than the circular RR and LL polarisation of the other VLA bands, you will encounter an error message when running flagcmd. We can get around this issue by manually clipping the 0-data for the XX and LL polarization with flagdata:
# In CASA flagdata(vis='MG0414_d1.ms',mode='clip',correlation='ABS_XX,ABS_YY',clipzeros=True,action='apply')
Note: It is good practice to also carefully read the Operator Log that was created by the operator on duty during the observations. This can provide additional information on data that should be flagged manually during the data reduction stage. These Operator logs can be found here.
Now that the data is read in and the first flagging is performed, we will inspect the content of our data set using listobs
# In CASA listobs(vis='MG0414_d1.ms',listfile='listobs.txt')
This normally plots an overview of the data in the CASA logger, but with the commmand listfile=`listobs.txt’ this information is written out to the file listobs.txt.
Initial flagging: getting clean calibrator data
When plotting the data in plotms (plotting amplitude against frequency), it is immediately clear that our observing band contains lots of Radio Frequency Interference (RFI), which is unavoidable at these low frequencies. The situation actually looks worse than it is, because ‘Gibbs ringing’ causes the blending of strong RFI signal into adjacent channels (Fig.1). To reduce the effect of Gibbs ringing, we first Hanning smooth the data (please note that this can take a while to run on the entire 142 Gb data set):
# In CASA hanningsmooth(vis='MG0414_d1.ms',datacolumn='data',outputvis='MG0414_d1_hanning.ms')
Upon further inspection in “plotms” , it is clear that there is a lot of RFI across most of the 64 GHz (or 43,000 km/s) band that we used for the observations. Luckily, the region around the expected HI absorption line at 390.6 GHz is relatively clean of RFI. Because our target is strong enough in the continuum to perform a good self-calibration, we select this relatively clean part of the band centered around the expected HI line. Note that there were two IFs with different spectral resolution included in the observations. We need only one of the two IFs, and choose the one with the highest spectral resolution. Moreover, we only need the sources 3C48 (which we will use for bandpass, gain and flux calibration) and our target MG0414+0534, hence we select field=’3,4’. This approach has the additional advantage that we reduce our data volume a lot, bringing it down to about 15 GHz.
# In CASA split(vis='MG0414_d1_hanning.ms',outputvis='MG0414_d1_data.ms',datacolumn='data',field='3,4',spw='17:20~485',keepflags=False)
We now have to perform a more in-depth flagging our bad data. Our strategy is to first focus on the calibrator source (in this case only 3C48). If the calibrator data are flagged well, we should be able to perform an accurate calibration of the target data. Later in this document we will then flag the calibrated target-data using an automated flagger.
When plotting frequency against aplitude in plotms, we see that there are still two ranges of channels filled with RFI throughout the observing run. We will flag those data with the task “flagdata”. Note that, because we are flagging channels on all scans of the bandpass calibrator 3C48, these channels cannot be calibrated accurately, so we flag these channels also in the data of the target!
# In CASA flagdata(vis='MG0414_d1_data.ms',mode='manual',spw='0:148~155',action='apply') flagdata(vis='MG0414_d1_data.ms',mode='manual',spw='0:51~59',action='apply')
The data on our bandpas+gain+flux calibrator 3C48 now look clean, with no dead or misbehaving antennas. We therefore now start with the calibration. Many of the steps below are based on the CASA tutorial on reducing VLA P-band continuum data.
We start the calibration by obtaining the latest set of antenna-position corrections compared to the stored values that were derived at the start of the observing period. For this you need to run gencal with caltype=`antpos’ while being connected to the internet:
# In CASA gencal(vis='MG0414_d1_data.ms',caltype='antpos',caltable='antpos.cal')
The output is a list of antenna position corrections that are written to the calibration table antpos.cal:
2017-07-25 19:54:51 INFO gencal Determine antenna position offests from the baseline correction database 2017-07-25 19:54:52 INFO gencal offsets for antenna ea01 : 0.00200 0.00000 0.00200 2017-07-25 19:54:52 INFO gencal offsets for antenna ea02 : 0.00210 0.01800 0.00430 2017-07-25 19:54:52 INFO gencal offsets for antenna ea03 : -0.00700 0.01600 -0.02000 2017-07-25 19:54:52 INFO gencal offsets for antenna ea06 : 0.00200 -0.00200 0.00500 2017-07-25 19:54:52 INFO gencal offsets for antenna ea08 : -0.01290 0.01280 0.02920 2017-07-25 19:54:52 INFO gencal offsets for antenna ea09 : -0.01200 -0.01300 0.02400 2017-07-25 19:54:52 INFO gencal offsets for antenna ea15 : -0.00500 -0.01000 0.01100 2017-07-25 19:54:52 INFO gencal offsets for antenna ea16 : 0.01100 -0.01500 -0.01200 2017-07-25 19:54:52 INFO gencal offsets for antenna ea17 : -0.01300 0.00000 0.05000 2017-07-25 19:54:52 INFO gencal offsets for antenna ea18 : -0.00600 0.02200 -0.02000 2017-07-25 19:54:52 INFO gencal offsets for antenna ea22 : 0.00600 0.00000 0.00000 2017-07-25 19:54:52 INFO gencal offsets for antenna ea25 : -0.00140 0.00140 -0.00140 2017-07-25 19:54:52 INFO gencal offsets for antenna ea27 : -0.00400 -0.00500 0.03100 2017-07-25 19:54:52 INFO calibrater Beginning specifycal----------------------- 2017-07-25 19:54:52 INFO Creating KAntPos Jones table from specified parameters. 2017-07-25 19:54:52 WARN NB: This EVLA dataset appears to fall within the period 2017-07-25 19:54:52 WARN + of semester 16B during which the online tropospheric 2017-07-25 19:54:52 WARN + delay model was mis-applied. 2017-07-25 19:54:52 WARN A correction for the online tropospheric delay model error WILL BE APPLIED! 2017-07-25 19:54:52 WARN Marking antpos caltable to turn ON the trop delay correction.
Note: these observations were taken during a period in which the atmospheric delay terms were calculated incorrectly. In CASA 5.0.0 (and in fact in CASA versions 4.7.1 and up), a for details correction for this is taken into account automatically when running gencal with caltype=`antpos’.
Next we correct for ionospheric effects, which are important at frequencies below 5 GHz. For this, you need to be online. CASA’s strategy is to obtain information on the total electron content (TEC) for the date of observations. This information is based on data from the global navigation satellite system (GNSS). A series of CASA images, in the form of a 24h movie of the TEC as function of longitude and latitude, is then generated and stored as filename.IGS_TEC.im, with a corresponding TEC error movie named filename.IGS_RMS_TEC./im (these movies can be viewed in CASA’s viewer):
# In CASA from recipes import tec_maps tec_image, tec_rms_image = tec_maps.create(vis='MG0414_d1_data.ms',doplot=True)
Note that the information regarding the TEC is sparse, and either active or direction-dependent ionospheric conditions may not be corrected very well. Also, the online TEC information improves with time, hence the quality of these TEC-data is best about two weeks after the observations.
Subsequently, an ionosphere correction table is generated using gencal, in which projected line-of-sight TEC (which depends on the zenith angle) is sampled for all times in the observation and stored in a standard CASA caltable. Fig.3 shows a plot that is generated for this caltable. Note: if you previously ran plotants, this TAC plot is not generated. To get around this issue, exit CASA before calculating the ionospheric effects.
See the CASA Cookbook for a detailed description on ionosphere corrections in CASA (thanks are due to Jason Kooi, UIowa for his contributions).
Next we will calibrate the requantizer gain levels, which are the visibility amplitudes that were set as the input of the WIDAR correlator. Requantizer scans are added to a P-band observing schedule to optimize the digital power in each spectral window, which maximizes the signal-to-noise in each window. Because there is a significant variation in power across the 240 MHz of the P-band, and because some spectral windows may experience a high power due to strong RFI, setting the requantizer levels during the observations may improve the quality of the data, including the shape of the bandpass across multiple spectral windows. As part of the test-observations of MG0414+0534, the very strong source Cygnus-A was observed at the start of the run. Re-setting the requantizer levels both before observing Cygnus-A and before targeting the other (much weaker) sources was essential to optimize the digital power and avoid correlation errors. Although we do not use Cygnus-A as part of this tutorial, and we use only a single spectral window, performing calibration of the requantizer gains is still good practice for P-band spectral line reduction:
# In CASA gencal(vis='MG0414_d1_data.ms',caltype='rq',caltable='rq.cal')
The next step is to set the absolute flux levels of our flux calibrator 3C48, for which a well-known model exists:
# In CASA setjy(vis='MG0414_d1_data.ms',field='0',scalebychan=True,standard='Scaife-Heald 2012',listmodels=False,usescratch=False)
Note: it is crucial for spectral-line work to set scalebychan=True, to ensure that the absolute flux level is calculated per channel and correctly interpolated across the observing band. If scalebychan=False, then only a single value per spectral window is calculated, resulting in a step-function in flux between spectral windows.
We then determine the delay of each antenna, for each polarization and each spectral window. Doing this on a short scan of the primary calibrator is generally sufficient, and will correct for internal (e.g., electronics, cables) and external (e.g., ionosphere) effects. We choose ea04 as reference antenna, as it produced good data, has baselines to all other used antennas in the array, and is located in the center of the array. We also need to apply ‘on the fly’ the calibration tables for the antenna positions, the total electron content and the requantizer levels.
# In CASA gaincal(vis='MG0414_d1_data.ms',caltable='delays.cal',field='0',selectdata=True,timerange='06:43:48~06:48:42',solint='inf',refant='ea04',gaintype='K',gaintable=['antpos.cal','tecim.cal','rq.cal'])
The caliberation solutions can be plotted using:
# In CASA plotcal(vis='MG0414_d1_data.ms',xaxis=’antenna’,yaxis=’delay’, field=’0’)
Note that any delays deviating by more than 30 nm should be treated with caution, and if needed, flagged.
Next step is to calibrate the bandpass, which corrects for the frequency dependent gain variations. Before determining the frequency-dependent gains, we first calibrate the phases in time, deriving a solution for each integration, to make sure we that we can derive accurate bandpass solution when integrating over a full scan on the bandpass calibrator 3C48:
# In CASA gaincal(vis='MG0414_d1_data.ms',caltable='bpphase.gcal', field='0',spw='0:250~300',refant='ea04',calmode='p',solint='int',minsnr=2.0,gaintype='G',gaintable=['antpos.cal','tecim.cal','rq.cal','delays.cal'])
We use these phase-solutions to perform the actual bandpass calibration. In this case, we have scans of our bandpass calibrator 3C48 taken at three different times during the observations. By interpolating the bandpass solutions in time, we try to minimize any possible time-varying bandpass effects and improve our spectral dynamic range.
# In CASA bandpass(vis='MG0414_d1_data.ms',caltable='bandpass.cal',field='0',spw='0',solint='inf',combine='',refant='ea04',solnorm=False,bandtype='B',gaintable=['antpos.cal','tecim.cal','rq.cal','delays.cal','bpphase.gcal'])
Note: To interpolate between bandpass scans, it is essential to specify “ combine=`’ ”. The reason is that the default is “ combine=’scan’ ”, which combined with the parameter “solint=`inf’ ”, would cross the scan boundaries to form one single solution for the bandpass. Only when combining “solint=`inf’ ” with “ combine=`’ ”, a bandpass solution is obtained for each individual scan on 3C48.
After the bandpass calibration, we perform a phase calibration of our data. Because we will apply the bandpass solutions on-the-fly, we can use the full bandwidth to determine the phase solutions. We will create two different calibration tables to correct for the time-varying phases. First, we calibrate the phases on each interval with the option solint=’int’.
# In CASA gaincal(vis='MG0414_d1_data.ms',caltable='intphase.gcal',field='0',spw='0',refant='ea04',calmode='p',solint='int',minsnr=2.0,gaintable=['antpos.cal','tecim.cal','rq.cal','delays.cal','bandpass.cal'])
The sole purpose of this generating phase solutions on each 10s interval is to be able to accurately correct for the time-varying amplitudes, as we will see below. However, this method is not ideal for interpolating the phase solutions in time. For that, we do a second phase calibration, this time averaging the signal across the phase-calibrator scans with solint=’inf’:
# In CASA gaincal(vis='MG0414_d1_data.ms',caltable='scanphase.gcal',field='0',spw='0',refant='ea04',calmode='p',solint='inf',combine='',minsnr=2.0,gaintable=['antpos.cal','tecim.cal','rq.cal','delays.cal','bandpass.cal'])
Note that an alternative approach to deriving phase solutions per scan is to use the task ‘smoothcal’ to average the phase-corrections obtained every 10s with solin=’int’ over the full duration of a single scan.