Adjusting Longitudinal Datasets: Analysis of the Simulated SLCD
The Statistical Longitudinal Census Dataset (SLCD) aims to link a 5% sample of 2006 Census records to their 2011 Census counterparts using Census fields other than names and addresses. In preparation, a Simulated SLCD was created by linking 2005 Census Dress Rehearsal (CDR) records to 2006 Census records.
Because some records inevitably contain missing or incorrect fields, and some fields legitimately change over time, a proportion of records will be linked incorrectly or will remain unlinked. The Simulated SLCD offered a chance to assess the impact of linking errors, and to explore adjustment methods that could be used to minimise their impact on analysis. As the linking was conducted during the Census processing period, names and addresses were available for use. This additional linking information was used to create a high quality Gold Standard (GS) linked dataset. The Bronze Standard (BS), which was created without names and addresses, better resembles the linking process planned for the SLCD, and was evaluated against the GS. The GS is set to be deleted in December 2009, now that this evaluation has been completed.
The unadjusted BS was compared against the GS, and adjustment techniques were then applied to the BS and evaluated. The BS was weighted using two methods: first, by benchmarking to the CDR; and second, by modelling the probability that a record linked on the GS is also linked on the BS, and using the inverse of these probabilities as weights. Weighting aimed primarily to correct for bias introduced by records that were unlinked on the BS. In addition, new adjustment techniques were developed to correct for both incorrect links and unlinked records. These methods were based on maximum likelihood techniques and, like the second weighting method, used the GS to estimate the probability of a record being linked or linked correctly.
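The second weighting method can be illustrated with a minimal sketch. The model actually used to estimate link probabilities is not specified here, so the sketch substitutes a simple cell-based estimate: within each stratum, the probability is the proportion of GS-linked records that are also linked on the BS. The function name, the field names `stratum` and `linked_on_bs`, and the data are all hypothetical and made up for illustration.

```python
from collections import defaultdict


def inverse_probability_weights(gs_records):
    """Return a weight per stratum equal to 1 / P(linked on BS | linked on GS).

    gs_records: list of dicts, one per GS-linked record, with the
    hypothetical keys 'stratum' and 'linked_on_bs' (bool).
    The probability is estimated as a simple within-stratum proportion,
    which is an assumption standing in for the unspecified model.
    """
    totals = defaultdict(int)   # GS-linked records per stratum
    linked = defaultdict(int)   # of those, records also linked on the BS
    for rec in gs_records:
        totals[rec["stratum"]] += 1
        if rec["linked_on_bs"]:
            linked[rec["stratum"]] += 1

    weights = {}
    for stratum, n_gs in totals.items():
        if linked[stratum] == 0:
            # No BS links in this stratum: the weight is undefined, so skip it.
            continue
        weights[stratum] = n_gs / linked[stratum]
    return weights


# Made-up example: the BS links 9 of 10 GS records in one stratum
# and 2 of 4 in another.
gs_records = (
    [{"stratum": "urban", "linked_on_bs": True}] * 9
    + [{"stratum": "urban", "linked_on_bs": False}] * 1
    + [{"stratum": "remote", "linked_on_bs": True}] * 2
    + [{"stratum": "remote", "linked_on_bs": False}] * 2
)
weights = inverse_probability_weights(gs_records)
```

Under these assumptions, strata where the BS links poorly receive larger weights, so the weighted BS-linked records sum back to the GS count in each stratum. Weights of this kind address unlinked records only; they do nothing about records that were linked incorrectly.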
Both the unadjusted BS and the adjustment approaches were evaluated by running analyses designed to answer example research questions of interest. Results suggest that the BS yields the same conclusions as the GS for most analyses. However, the BS was inadequate for analysis of people living in Indigenous communities, as it performed poorly in linking these individuals. Weighting tended not to improve the BS; however, moderate (and in some cases large) improvements were gained using the newly developed adjustment techniques.
Looking forward, a second Simulated SLCD is planned, linking the 2010 CDR with the 2011 Census, before the SLCD itself is conducted. The properties of the Simulated SLCD BS need to be carefully considered when drawing inferences about the SLCD: the linking methodology will differ in the SLCD, and linking files five years apart will present challenges that do not arise when linking files one year apart. An information paper describing the options for weighting the SLCD is currently being written.
For more information please contact Paul Campbell on (02) 6252 7101 or paul.campbell@abs.gov.au, or James Chipperfield on (02) 6252 7301 or james.chipperfield@abs.gov.au.