# Difference between revisions of "DataWeightsAndCombination"

## Revision as of 10:21, 22 July 2015

This page is currently under construction.

## Principles of Data Weighting

For optimal imaging performance it is critical that in a relative sense each visibility in the data have the correct weight -- that is, data with better sensitivity have more weight than data with less sensitivity. Formally, the
visibility weights should be equal to 1/σ_{ij}^{2} where σ_{ij} is the rms noise of a given visibility.

[math] \sigma_{ij}=\frac{2k}{\eta_{q}\eta_{c}A_{eff}} \sqrt{\frac{T_{sys,i} T_{sys,j}}{\Delta\nu_{ch} t_{ij}}}, [/math]

where:

*k* is Boltzmann's constant.

*A_{eff}* is the effective antenna area, which is equal to the aperture efficiency x the geometric area of the antenna. The aperture efficiency depends on the rms antenna surface accuracy.

*η_{q}* and *η_{c}* are the quantization and correlator efficiencies, respectively. These have values near 1 and will be ignored for the purposes of this casaguide, but see the ALMA Technical Handbook for more information.

*T_{sys,i}* is the system temperature for antenna i, and *T_{sys,j}* is the system temperature for antenna j.

*Δν_{ch}* is the channel width.

*t_{ij}* is the integration time per visibility.

Note, when this equation is extended to a whole dataset rather than a single visibility, T_{sys,i}T_{sys,j} is replaced by the average *T_{sys}^{2}* and t_{ij} is replaced by the total time on source. The factors [math]\sqrt{N(N-1)}[/math] and *√n_{p}* are included in the denominator, where N is the number of antennas and *n_{p}* is the number of polarizations (1 for single pol and 2 for dual pol data).

**In order to correctly combine and image data that have different T_{sys}, Δν_{ch}, t_{ij}, or antenna size it is essential to use visibility weights proportional to 1/σ_{ij}^{2}. The remainder of this guide deals with ensuring that the individual visibility weights are correct in your ALMA data.**
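As a rough illustration of the per-visibility noise equation above, the following Python sketch evaluates σ_{ij} and the corresponding 1/σ_{ij}^{2} weight. The function names, the default aperture efficiency of 0.7, and the input values are assumptions chosen for illustration only; they are not taken from CASA or from the ALMA Technical Handbook.

```python
import math

# Boltzmann's constant expressed in Jy m^2 / K
# (1.380649e-23 J/K with 1 Jy = 1e-26 W m^-2 Hz^-1).
BOLTZMANN_JY = 1380.649


def visibility_rms_jy(tsys_i, tsys_j, chan_width_hz, t_int_s,
                      dish_diameter_m, aperture_eff=0.7,
                      eta_q=1.0, eta_c=1.0):
    """Per-visibility rms noise in Jy from the equation above.

    aperture_eff=0.7 is a placeholder value, not an ALMA specification.
    """
    geometric_area = math.pi * (dish_diameter_m / 2.0) ** 2
    a_eff = aperture_eff * geometric_area
    prefactor = 2.0 * BOLTZMANN_JY / (eta_q * eta_c * a_eff)
    return prefactor * math.sqrt(tsys_i * tsys_j / (chan_width_hz * t_int_s))


def visibility_weight(sigma):
    """Weight proportional to 1/sigma^2."""
    return 1.0 / sigma ** 2


# Hypothetical inputs: identical Tsys, channel width and integration time,
# so only the dish size differs between the two visibilities.
sigma_12m = visibility_rms_jy(100.0, 100.0, 1.0e6, 6.0, dish_diameter_m=12.0)
sigma_7m = visibility_rms_jy(100.0, 100.0, 1.0e6, 6.0, dish_diameter_m=7.0)
ratio = visibility_weight(sigma_7m) / visibility_weight(sigma_12m)
print("7m/12m weight ratio: %.3f" % ratio)  # (7/12)^4, about 0.116
```

With everything else equal, the weight ratio depends only on the fourth power of the dish diameter ratio, which is the factor that reappears in the approximate correction discussed later in this guide.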

Additionally, when combining data from different antenna configurations, one gets optimal overall sensitivity to all spatial scales by matching the surface brightness sensitivity at each uv-distance. This can only be achieved by having the time on source per configuration in the right proportion. This topic is covered in ALMA Memo 598, which informs the relative amount of time that ALMA observes a project with the 7m-array versus the 12m-array, and with compact versus extended 12m-array configurations. However, since telescope time is expensive, one typically does not actually observe in the optimal proportion, and in that case one will not fully realize the expected "impact" of adding the less sensitive configuration data. One may then choose to "upweight" the less sensitive configuration explicitly by changing its data weights, above and beyond 1/σ_{ij}^{2}. There are no free lunches, however: such a change comes at the expense of overall image sensitivity, though it may very well be the optimal choice for your science case. Finding the optimal up-weighting is usually a matter of experimentation, and can easily be explored using the visweightscale parameter in concat. As a general rule of thumb, extra up-weighting by more than a factor of 2-3 is not recommended. Before adjusting this, however, it is always best to start with data that simply have the correct 1/σ_{ij}^{2} weights.

## Weights in CASA

A memo describing weights in CASA, in particular the significant changes that were made with CASA 4.2.2, can be found at http://casa.nrao.edu/Memos/CASA-data-weights.pdf

To summarize the situation specifically as applied to ALMA data reduction in the ALMA archive and delivered to PIs:

**CASE 1) CASA 4.2.1 and earlier**: Weights were only scaled by 1/[Tsys(i) * Tsys(j)] using calwt=True at the applycal stage for the Tsys table. Assuming that (i) there aren't any unflagged antennas with significantly low gain due to, for example, pointing errors or hardware problems, and (ii) all the antennas have similar efficiencies -- usually good assumptions for ALMA data -- then to a good approximation the data weights are internally consistent and can produce good imaging results. However, such data should not be combined with other data that have different Δν_{ch}, t_{ij}, or antenna size without further modification.

**CASE 2) CASA 4.2.2 and later**: Upon import, data weights are scaled by 2Δν_{ch}Δt_{ij} and also scaled by 1/[Tsys(i) * Tsys(j)] using calwt=True at the applycal stage for the Tsys table. Additionally:

- **(A) For data calibrated by the 4.2.2 CASA Pipeline**, the weights are further modified by [gain(i)^{2} * gain(j)^{2}] when the amplitude gain table is applied using calwt=True. Since the amplitude gains are directly proportional to the effective antenna area, scaling the weights by the amplitude gains takes into account antenna size differences, and also down-weights antennas with comparatively low gain. Thus, these weights are completely correct.
- **(B) For data manually calibrated in CASA 4.2.2**, unfortunately calwt=False was still used to apply the antenna gain table. These data therefore have weights that are not correct in a relative sense, by the factor [gain(i)^{2} * gain(j)^{2}], when compared to other data with a different antenna size.

**CASE 3) CASA 4.3 and later:** Data calibrated in either the pipeline or manually will have completely correct weights. An example of this situation is demonstrated in https://casaguides.nrao.edu/index.php/M100_Band3_Combine_4.3
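The bookkeeping in the cases above can be summarized with a small Python sketch. It only multiplies together the factors named in the text for each case; the function name, the case labels, and the input values are hypothetical, and this is not how CASA itself stores or computes weights.

```python
def documented_weight(case, tsys_i, tsys_j,
                      chan_width_hz=1.0, t_int_s=1.0,
                      gain_i=1.0, gain_j=1.0):
    """Weight factors applied in each case, as listed in the text above.

    case is one of "1", "2A", "2B", "3" (labels chosen for this sketch).
    """
    # All cases: Tsys table applied with calwt=True.
    weight = 1.0 / (tsys_i * tsys_j)
    if case == "1":
        return weight
    # CASA 4.2.2 and later: scaling by 2 * channel width * integration time on import.
    weight *= 2.0 * chan_width_hz * t_int_s
    if case == "2B":
        return weight
    if case in ("2A", "3"):
        # Amplitude gain table applied with calwt=True.
        return weight * gain_i ** 2 * gain_j ** 2
    raise ValueError("unknown case: %r" % (case,))


# Hypothetical numbers: the same visibility under Case 1 and Case 2B
# differs by exactly the import-time factor.
w_case1 = documented_weight("1", 100.0, 120.0)
w_case2b = documented_weight("2B", 100.0, 120.0,
                             chan_width_hz=1.0e6, t_int_s=6.05)
print(w_case2b / w_case1)
```

The point of the sketch is that Case 1 and Case 2B data each lack one or more factors that Case 2A and Case 3 data already include, which is why mixed cases cannot be combined without correction.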

## How Do I Know the Situation For My Data?

- Data taken in **Cycle 0** and **Cycle 1** were reduced in 4.2.1 or earlier versions and correspond to Case 1 above. ACA 7m-array data were first offered in Cycle 1. If you want to combine 12m-array and 7m-array data taken during Cycle 1, it is very likely you need to correct the visibility weights before imaging.

- The situation for **Cycle 2** is more confused (this includes actual Cycle 2 projects and carry-over projects or parts of projects from Cycle 1). Recall from Case 2 above that the key factor that determines your situation for Cycle 2 is whether the data were pipeline or manually calibrated.
  - Key dates:
    - Start of Cycle 2: June 1, 2014
    - CASA 4.2.2 release date: Sept. 4, 2014
    - Pipeline release date: Oct. 20, 2014
  - The earlier in Cycle 2 your data were taken, the more likely they were manually calibrated. Data with very narrow spws, high frequencies, etc. are also more likely to have been manually calibrated.

- How can I distinguish between pipeline and manual data reduction for Cycle 2 data?
  - In your data delivery, the README file should say, but there were some oversights on this front. A sure way to tell: look in the directory called **script** within your data delivery for a particular member_ouss. If you see a file named PPR*.xml, the data were calibrated by the pipeline. Otherwise the calibration was done manually.
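The check for a PPR*.xml file described above can be automated with a few lines of Python. This is a minimal sketch that assumes the delivery layout described in the text (a **script** directory inside each member_ouss directory); the function name and the example path are hypothetical.

```python
import glob
import os


def calibration_mode(member_ous_dir):
    """Return 'pipeline' if script/PPR*.xml exists, otherwise 'manual'."""
    script_dir = os.path.join(member_ous_dir, "script")
    if not os.path.isdir(script_dir):
        raise FileNotFoundError("no script directory in %s" % member_ous_dir)
    if glob.glob(os.path.join(script_dir, "PPR*.xml")):
        return "pipeline"
    return "manual"


# Example usage with a hypothetical delivery path:
# print(calibration_mode("path/to/delivery/member_ouss_directory"))
```

Running this once per member_ouss directory is useful when the 12m-array and 7m-array data of one project were calibrated in different ways.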

## What Are the Options for Adjusting the Weights for Older Reductions?

If the data weights are not correct in the data you want to combine, there are three options to correct the situation. These methods carry different levels of pain and complexity depending on your situation. For example, Option 1 is pretty easy for data manually calibrated in 4.2.2, but harder for older data and older versions of CASA. The situation can also be extra confusing if your data fall into multiple categories above. For example, it is not uncommon in Cycle 2 that the 12m-array data were pipeline calibrated but the 7m-array data were calibrated manually.

#### Option 1: Re-calibrate your data in CASA 4.2.2 or later

- ⇒ Option 1 is easiest to implement for Case 2B (Data observed in Cycle 2 manually calibrated in 4.2.2 or 4.3.1 with calwt=False for amplitude/flux gain table).

- All data imported in 4.2.2 (or later) will automatically have their weights adjusted by 2Δν_{ch}Δt_{ij}. The Tsys application should also already be correct in your scripts. To correct for antenna size and weight by the gains, you must set calwt=True in the applycal for the amplitude table.

- Where do I make the calwt change?
  - As described in your README file, you can obtain a fully calibrated measurement set by executing **scriptForPI.py**. For manually calibrated data, within the **script** directory you will also find scripts named uid*scriptForCalibration.py, one for each execution of that Scheduling Block. The scriptForPI.py actually executes these scripts, so this is where any change must be made.
  - Toward the bottom of the uid*scriptForCalibration.py scripts (typically Step 17 or 18) you will find a step called **# Application of the bandpass and gain cal tables**. Change calwt=F to calwt=T in all of the applycal calls.

**Caveat 1:** You must have the raw ALMA ASDM to run **scriptForPI.py** (see the README file for more information).

**Caveat 2:** Most (all except early Cycle 0) ALMA manual calibration scripts record the CASA version used to create the script. For example, if your manual calibration script contains the following, using any version other than 4.2.2 will give an error:

if re.search('^4.2.2', casadef.casa_version) == None: sys.exit('ERROR: PLEASE USE THE SAME VERSION OF CASA THAT YOU USED FOR GENERATING THE SCRIPT: 4.2.2')

- You must change the version number to match the version you want to use or the script will not run.

**Caveat 3:** Scripts from earlier than 4.2.1 are likely to contain commands that are incompatible with later versions of CASA. It may be difficult for a non-expert to update the uid*scriptForCalibration.py script(s) to current syntax.
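The two script edits described for Option 1 (switching calwt and updating the version check) can be sketched in Python as plain text substitutions. The function names are hypothetical, and the sketch assumes that calwt is given as a single T/F or True/False value and that the version string appears on the same lines as the version check. It changes every scalar calwt=F in the text it is given, so apply it only to the applycal step described above, and inspect the result before running it.

```python
import re


def enable_calwt(script_text):
    """Change calwt=F / calwt=False to calwt = T wherever it appears."""
    return re.sub(r"\bcalwt\s*=\s*(?:False|F)\b", "calwt = T", script_text)


def retarget_casa_version(script_text, old_version, new_version):
    """Update the version string, but only on the version-check lines."""
    out_lines = []
    for line in script_text.splitlines(True):
        if "casa_version" in line or "SAME VERSION OF CASA" in line:
            line = line.replace(old_version, new_version)
        out_lines.append(line)
    return "".join(out_lines)


# Example usage on a hypothetical script file name:
# with open("uid_example.ms.scriptForCalibration.py") as handle:
#     text = handle.read()
# text = retarget_casa_version(enable_calwt(text), "4.2.2", "4.3.1")
# with open("uid_example.ms.scriptForCalibration_calwt.py", "w") as handle:
#     handle.write(text)
```

Writing the result to a new file keeps the delivered script intact for comparison.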

#### Option 2: Make an approximate overall relative correction:

- ⇒ Option 2 is easy to implement for Cases 1 and 2B, but is not as accurate as Options 1 or 3. It is typically adequate for most purposes, and is MUCH better than doing nothing.

- All ALMA data reduced in CASA should already have 1/Tsys^{2} weighting, which accounts for many of the factors that make data from some baselines more sensitive than others within an individual dataset. If we assume that antennas with abnormally low gains (typically due to poor pointing or hardware failures) have already been flagged and all antennas meet the surface accuracy specifications, then the amplitude gains will be fairly constant from antenna to antenna, with the value dominated by the antenna size. Other parameters that can often differ between datasets include Δν_{ch} and t_{ij}.
- Thus, a reasonably good approximation for the weight scale factor to apply to data_b, relative to data_a, is:
  - Case 1: (antenna_size_b/antenna_size_a)^{4} x (Δν_{ch}_b/Δν_{ch}_a) x (t_{ij}_b/t_{ij}_a)
  - Case 2B: (antenna_size_b/antenna_size_a)^{4}
- Such an overall scale factor can be applied during the concat step using the visweightscale parameter. An example is given below.

**Caveat 1:** This option requires that the datasets to be combined have only one (unflagged) antenna size per dataset. If this is not the case, then Options 1 or 3 must be used.

**Caveat 2:** This option also requires that both datasets to be combined are in the same weight situation (i.e. mixed Cases cannot be handled using simple overall relative scaling).
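The Option 2 scale factors can be written as a short Python helper. This is a sketch under the assumptions stated above (one antenna size per dataset, both datasets in the same Case); the function and argument names are hypothetical. The sample values are the dish sizes and integration times from the M100 example later in this guide.

```python
def relative_weight_scale(dish_a_m, dish_b_m,
                          chan_width_a=1.0, chan_width_b=1.0,
                          t_int_a=1.0, t_int_b=1.0, case="1"):
    """Approximate weight scale for dataset b relative to dataset a."""
    scale = (dish_b_m / dish_a_m) ** 4
    if case == "1":
        # Case 1 weights also lack the channel width and integration time factors.
        scale *= (chan_width_b / chan_width_a) * (t_int_b / t_int_a)
    elif case != "2B":
        raise ValueError("Option 2 only applies to Case 1 or Case 2B")
    return scale


# 12m-array as dataset a (6.05 s integrations), 7m-array as dataset b (10.1 s),
# same channel width in both.
scale_7m = relative_weight_scale(12.0, 7.0, t_int_a=6.05, t_int_b=10.1)
print(round(scale_7m, 3))  # 0.193
```

The returned value is what would be passed for dataset b in the visweightscale list of concat, with 1 for dataset a.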

#### Option 3: Run the task statwt on your calibrated science target data

- ⇒ Option 3 is pretty easy to implement for both Cases 1 and 2B, and will account for (likely small) gain variations from antenna to antenna within the individual datasets, whereas Option 2 does not. However, as noted in Caveat 1, it may be difficult to get a good result for very complex line emission.

- This task attempts to assess the sensitivity per visibility and adjust the weights accordingly. It is very commonly used for JVLA data (including their pipeline).

**Caveat 1:** One must limit the calculation to line-free channels. For complex line projects this can be painful; however, the line-free channels are typically already known from the continuum subtraction and can be reused here. Note that it is best to run statwt before continuum subtraction.
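One way to reuse the line-free channels is to build the channel selection string once and pass it to both the continuum subtraction and statwt. The helper below is a minimal sketch: the function name, the channel ranges, and the measurement set name are hypothetical, and the statwt call is shown only as a comment because it must be run inside CASA. Check the statwt help in your CASA version for the exact parameters it accepts.

```python
def line_free_selection(line_free):
    """Build a CASA-style spw:channel string from {spw: [(first, last), ...]}."""
    parts = []
    for spw in sorted(line_free):
        ranges = ";".join("%d~%d" % (lo, hi) for lo, hi in line_free[spw])
        parts.append("%d:%s" % (spw, ranges))
    return ",".join(parts)


# Hypothetical line-free channel ranges for two spws.
fitspw = line_free_selection({0: [(10, 100), (300, 450)], 1: [(20, 400)]})
print(fitspw)  # 0:10~100;300~450,1:20~400

# Inside CASA, assuming a calibrated target measurement set named my_target.ms:
# statwt(vis='my_target.ms', fitspw=fitspw, datacolumn='data')
```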

## Example of Correcting Weights from CASA 4.2.1 Calibrated Data Using Option 2

Below we show what could have been done to correct the M100 SV data if it was reduced in CASA 4.2.1 (rather than 4.3 as demonstrated in https://casaguides.nrao.edu/index.php?title=M100_Band3_Combine_4.3).

In CASA 4.2.1 and earlier, the data weights are 1 upon import. Later in the standard calibration procedure, applycal scales the weights by 1/[Tsys(i) * Tsys(j)] if calwt=True is set when applying the Tsys table. As an example, we plot the weights of 7m and 12m data imported in CASA 4.2.1. Averaging must be turned off when plotting the weights.

```
# In CASA
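# Remove any previous plots
os.system('rm -rf 7m_WT.png 12m_WT.png')
# Weight versus uv-distance for channel 200 of spws 0-2, 12m-array
plotms(vis='m100_12m_CO.ms', yaxis='wt', xaxis='uvdist', spw='0~2:200',
       coloraxis='spw', plotfile='12m_WT.png')
# Same plot for the 7m-array
plotms(vis='m100_7m_CO.ms', yaxis='wt', xaxis='uvdist', spw='0~2:200',
       coloraxis='spw', plotfile='7m_WT.png')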
```

As you can see from these plots, the weights are quite similar at this stage because the data were taken under similar weather conditions and hence Tsys, and this is the only thing the weights have been scaled by so far.

Recall that the rms noise in a single channel for a single visibility is:

[math] \sigma_{ij}=\frac{2k}{A_{eff}} \sqrt{\frac{T_{sys,i} T_{sys,j}}{\Delta\nu_{ch} t_{ij}}} [/math]

Beyond the obvious difference in the antenna dish sizes, the listobs output for these two datasets shows that the same channel width was used but that the integration time per visibility is 10.1 sec for the 7m-array and 6.05 sec for the 12m-array. In the radiometer equation, σ_{ij} is inversely proportional to the dish area (that is, to the dish diameter squared) and to the square root of the integration time per visibility. Assuming WT ∝ 1/σ_{ij}^{2}, the weight therefore scales as the dish diameter to the fourth power times the integration time, so the **7m weight should be scaled by: (7./12.)^{4} x (10.1/6.05) = 0.193** to account for the difference in telescope size and integration time per visibility.

An easy way to perform this overall scaling is using the visweightscale parameter in concat.

```
# In CASA
# Concat and scale weights: 12m-array weights unchanged, 7m-array weights x 0.193
os.system('rm -rf M100_Intcombo_0.193.ms')
concat(vis=['m100_12m_CO.ms', 'm100_7m_CO.ms'],
       concatvis='M100_Intcombo_0.193.ms',
       visweightscale=[1, 0.193])
```

Now plot the concatenated weights to verify they are as expected.

```
# In CASA
os.system('rm -rf Intcombo_0.193_WT.png')
plotms(vis='M100_Intcombo_0.193.ms', yaxis='wt', xaxis='uvdist', spw='0~2:200',
       coloraxis='spw', plotfile='Intcombo_0.193_WT.png')
```

These combined data with correct relative weights are now ready for imaging.