Overview of replication methods
ABS household surveys employ complex sample designs and weighting which require special methods for estimating the variance of survey statistics. Variance estimators for a simple random sample are not appropriate for this survey microdata.
A class of techniques called 'replication methods' provides a general process for estimating variance for the types of complex sample designs and weighting procedures employed in ABS household surveys. The ABS uses a method called the Group Jackknife Replication Method.
A basic idea behind the replication approach is to split the sample into G replicate groups. One replicate group is then dropped from the file and a new set of weights is produced for the remaining sample. This is repeated for all G replicate groups to provide G sets of replicate weights. For each set of replicate weights, the statistic of interest is recalculated and the variance of the full sample statistic is estimated using the variability among the replicate statistics.
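The drop-one-group idea can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the function name `make_replicate_weights`, the group labels and the weights are hypothetical, and the new weights for the remaining sample are produced by a plain rescaling of G/(G-1). The reweighting actually applied to ABS microdata is not described here and may be more involved, so treat the rescaling as a stand-in.

```python
def make_replicate_weights(main_weights, groups, G):
    """Return G lists of replicate weights (simplified delete-one-group jackknife).

    main_weights : main weight of each sampled unit
    groups       : replicate group (1..G) assigned to each unit
    G            : number of replicate groups
    """
    scale = G / (G - 1)  # assumption: simple rescaling of the retained units
    replicates = []
    for g in range(1, G + 1):
        replicates.append([
            0.0 if grp == g else w * scale  # drop group g, upweight the rest
            for w, grp in zip(main_weights, groups)
        ])
    return replicates


# Hypothetical sample of six units split into G = 3 replicate groups.
main_weights = [100.0, 100.0, 100.0, 100.0, 100.0, 100.0]
groups = [1, 2, 3, 1, 2, 3]
rep = make_replicate_weights(main_weights, groups, G=3)
```

In this toy case each replicate weight set still sums to the same weighted total as the main weight, because the units that remain are scaled up to compensate for the dropped group.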
The statistics calculated from these replicates are called replicate estimates. Replicate weights provided on the microdata file enable the variance of survey statistics, such as means and medians, to be calculated relatively simply. Further technical explanation can be found in Section 4 of the Research Paper: Weighting and Standard Error Estimation for ABS Household Surveys (Methodology Advisory Committee).
How to use replicate weights
To calculate the standard error of any statistic derived from the survey data, the method is as follows:
1. Calculate the estimate of the statistic of interest using the main weight.
2. Repeat the calculation for each replicate weight, substituting the replicate weight for the main weight, to create G replicate estimates. For example, with 60 replicate weights you will have 60 replicate estimates.
3. Use the outputs from steps 1 and 2 as inputs to the formula below to calculate the estimate of the Standard Error (SE) for the statistic of interest.
\(SE(y) = \sqrt{\frac{G-1}{G} \sum_{g=1}^G (y_{(g)} - y)^2}\)
[Equation 1]
- G = Number of replicate groups
- g = the replicate group number
- \(y_{(g)}\) = Replicate estimate for group g, i.e. the estimate of y calculated using the replicate weight for g
- y = the weighted estimate of y from the sample
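The procedure and Equation 1 can be sketched in Python as follows. The function names (`weighted_mean`, `jackknife_se`, `replicate_se`) and the three-unit data set are illustrative assumptions, not part of any ABS product. The statistic here is a weighted mean, but any function of the values and a weight vector could be passed in its place.

```python
import math


def weighted_mean(values, weights):
    """Weighted mean of values using the supplied weights."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def jackknife_se(full_estimate, replicate_estimates):
    """Equation 1: SE = sqrt((G-1)/G * sum over g of (y_(g) - y)^2)."""
    G = len(replicate_estimates)
    sum_sq = sum((y_g - full_estimate) ** 2 for y_g in replicate_estimates)
    return math.sqrt((G - 1) / G * sum_sq)


def replicate_se(values, main_weight, replicate_weights, statistic=weighted_mean):
    """Compute the main estimate, then each replicate estimate, then the SE."""
    y = statistic(values, main_weight)
    y_reps = [statistic(values, w) for w in replicate_weights]
    return y, jackknife_se(y, y_reps)


# Hypothetical data: three units, one main weight and G = 3 replicate weights.
values = [400.0, 500.0, 600.0]
main_weight = [1.0, 1.0, 1.0]
replicate_weights = [
    [0.0, 1.5, 1.5],
    [1.5, 0.0, 1.5],
    [1.5, 1.5, 0.0],
]
estimate, se = replicate_se(values, main_weight, replicate_weights)
```

Passing the statistic as a function keeps the three steps separate from the choice of estimator, so the same loop over replicate weights could be reused for a total or a median.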
From the standard error you can then derive the following measures of sampling error: the relative standard error (RSE) and the margin of error (MoE) of the estimate. The multiplier of 1.96 in Equation 3 corresponds to a 95% confidence level.
\(\text{Relative Standard Error (RSE)} = \frac{\text{SE}}{\text{Estimate}}\)
[Equation 2]
\(\text{Margin of Error (MoE)} = 1.96 \times \text{SE}\)
[Equation 3]
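Equations 2 and 3 translate directly into two small helper functions. This is a minimal sketch and the names `relative_standard_error` and `margin_of_error` are assumptions made for illustration. The `as_percent` flag is also an assumption: it allows the RSE to be returned either as the ratio in Equation 2 or multiplied by 100 as a percentage.

```python
def relative_standard_error(se, estimate, as_percent=True):
    """Equation 2: RSE = SE / Estimate, optionally expressed as a percentage."""
    rse = se / estimate
    return rse * 100 if as_percent else rse


def margin_of_error(se, z=1.96):
    """Equation 3: MoE = 1.96 x SE (z = 1.96 is the 95% normal multiplier)."""
    return z * se


# Hypothetical inputs: an estimate of 200 with a standard error of 10.
rse_pct = relative_standard_error(10.0, 200.0)  # 5.0 (per cent)
moe = margin_of_error(10.0)                     # 19.6
```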
An example of calculating the SE for an estimate of the mean
Suppose you are calculating the mean value of earnings, y, in a sample. Using the main weight produces an estimate of $500.
You have 5 sets of Group Jackknife replicate weights and using these weights (instead of the main weight) you calculate 5 replicate estimates of $510, $490, $505, $503, $498 respectively.
To calculate the standard error of the estimate, substitute the following inputs into Equation 1:
- G = 5
- y = 500
- g = 1, \(y_{(1)}\) = 510
- g = 2, \(y_{(2)}\) = 490
- …
\(SE(y) = \sqrt{\frac{5-1}{5} \sum_{g=1}^5 (y_{(g)}- 500)^2}\)
\(SE(y) = \sqrt{\frac{4}{5}((510-500)^2+(490-500)^2+(505-500)^2+(503-500)^2+(498-500)^2)}\)
\(SE(y)= \sqrt{\frac{4}{5} \times 238}\)
\(SE(y)= 13.8\)
To calculate the RSE, divide the SE by the estimate of y ($500) and multiply by 100 to express it as a percentage:
\(RSE(y)=\frac{13.8}{500} \times 100\)
\(RSE(y)=2.8\%\)
To calculate the margin of error, multiply the SE by 1.96:
\(\text {Margin of Error} (y)=13.8 \times 1.96\)
\(\text {Margin of Error} (y)=27.05\)
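The arithmetic in this example can be checked with a short Python script. The variable names are illustrative. The script carries the unrounded SE through to the RSE and margin of error rather than the rounded 13.8, and the results agree with the figures above at the precision shown.

```python
import math

y = 500.0                                        # estimate using the main weight
replicates = [510.0, 490.0, 505.0, 503.0, 498.0]  # the 5 replicate estimates
G = len(replicates)

# Sum of squared differences between each replicate estimate and y
sum_sq = sum((y_g - y) ** 2 for y_g in replicates)

se = math.sqrt((G - 1) / G * sum_sq)  # Equation 1
rse_pct = se / y * 100                # Equation 2, as a percentage
moe = 1.96 * se                       # Equation 3

print(f"Sum of squares = {sum_sq:.0f}")  # 238
print(f"SE  = {se:.1f}")                 # 13.8
print(f"RSE = {rse_pct:.1f}%")           # 2.8%
print(f"MoE = {moe:.2f}")                # 27.05
```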