2080.0 - Microdata: Australian Census Longitudinal Dataset, 2006-2011 Quality Declaration
ARCHIVED ISSUE Released at 11:30 AM (CANBERRA TIME) 18/12/2013
This document was added or updated on 12/02/2016.

SAMPLE METHODOLOGY
Overseas visitors are excluded from the 2006 ACLD sample. Visitors within Australia to private and non-private dwellings on Census Night are included. For more information on the scope and coverage of the Census:

LINKING METHODOLOGY

Data from the 2006 ACLD sample and the 2011 Census were brought together using data linkage techniques. The method involved linking without the use of name and address, as this information is destroyed at the end of each Census processing cycle. Data linkage is typically undertaken using probabilistic and/or deterministic methods, both of which were used in forming the ACLD:
Variables on the 2006 and 2011 Census files used for linking include:
A number of linkage passes were conducted based on different combinations of variables to ensure each record had the highest possible chance of being linked. At the end of the linkage process, 800,759 (82%) of the 979,661 sample records from 2006 were linked to a 2011 Census record. There were two reasons why some records from the 2006 Census were not linked to a 2011 record:
For detailed information on the linking methodology and an assessment of its quality see Australian Census Longitudinal Dataset, Methodology and Quality Assessment (cat. no. 2080.5).

Variables relating to migrants from the Department of Social Services' Settlement Database have been included in the ACLD. These have been taken from an existing linkage between the Australian Census and Migrants Integrated Dataset. For information on the linking methodology of Settlement Database variables see Australian Census and Migrants Integrated Dataset Linking Methodology (cat. no. 3417.0.55.001).

WEIGHTING, BENCHMARKING AND ESTIMATION

Weighting

Weighting is the process of adjusting a sample to infer results for the relevant population. To do this, a 'weight' is allocated to each sample unit - in this case, persons. The weight can be considered an indication of how many people in the relevant population are represented by each person in the sample.

Weights were created for linked records in the ACLD to enable longitudinal population estimates to be produced. Cross-sectional population estimates for 2006 and 2011 are available from each Census.

The ACLD began as a random sample of 5% of the Australian population in 2006. As such, each person in the sample should represent about 20 people in the population. Between Censuses, however, the in-scope population changes as people die or move overseas. In addition, Census net undercount and data quality can affect the capacity to link equivalent records across waves.

The ACLD weights benchmarked the linked records to the population that was in scope of both the 2006 and 2011 Censuses. The weights were based on four components: the design weight, undercoverage adjustment, missed link adjustment and population benchmarking.

The original population benchmark was the 2011 Estimated Resident Population (ERP). The 2011 ERP was chosen over the 2006 ERP as the baseline population as it is more recent.
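The idea that a weight indicates how many people each sampled person represents can be illustrated with a short Python sketch. It is not the ABS weighting procedure: the records, the field name `moved_interstate`, the helper functions and the weight values are all invented for illustration. Only the 5% sampling fraction comes from the text above.

```python
# Illustrative sketch only: records, field names and weights are hypothetical.
SAMPLING_FRACTION = 0.05  # the ACLD began as a 5% sample in 2006


def design_weight(sampling_fraction):
    """People represented by each sampled person before any adjustment."""
    return 1.0 / sampling_fraction


def weighted_estimate(records, predicate):
    """Estimate a population group by summing the weights of matching records."""
    return sum(r["weight"] for r in records if predicate(r))


# Made-up linked records carrying made-up final weights.
records = [
    {"moved_interstate": False, "weight": 20.0},
    {"moved_interstate": True, "weight": 31.5},
    {"moved_interstate": False, "weight": 22.5},
    {"moved_interstate": True, "weight": 28.0},
]

base = design_weight(SAMPLING_FRACTION)
movers = weighted_estimate(records, lambda r: r["moved_interstate"])
total = weighted_estimate(records, lambda r: True)
print(f"design weight: {base:.1f}")
print(f"estimated interstate movers: {movers:.1f} of {total:.1f}")
```

In this sketch the design weight is simply the inverse of the sampling fraction. The final weights on the ACLD file also reflect the undercoverage, missed link and benchmarking components, which are not modelled here.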
The 2011 ERP was then adjusted so as to exclude people who were not in Australia in 2006 as depicted below. Weights were benchmarked to the following population groups:
On 12 February 2016 a new weight was applied to the ACLD file to better account for overseas departures and arrivals between 2006 and 2011. Users who analysed the ACLD prior to 12 February 2016 may notice changes to estimates produced with the revised weight. Estimates of population groups will differ: the total weighted population estimate is 19.5 million, compared with 18.6 million under the old weight. Proportions are expected to show only small differences when previous tables are compared.

The weights have a mean value of 24 and range between 17 and 103. Higher weights are associated with people of Aboriginal and Torres Strait Islander origin and people who moved interstate between 2006 and 2011.

Estimation

Estimates of population groups are obtained by summing the weights of persons with the characteristic(s) of interest.

SOURCES OF ERROR

All reasonable attempts have been made to ensure the accuracy of the results of the longitudinal dataset. Nevertheless, potential sources of error, including sampling, linking and Census quality error, should be kept in mind when interpreting the results.

Sampling Error

Sampling error occurs because only a small proportion of the total population is used to produce estimates that represent the whole population. For a given sample size, each sample will produce different results, which will usually not be equal to the population value. There are two common ways of reducing sampling error - increasing sample size and utilising an appropriate selection method (for example, multi-stage sampling would be appropriate for household surveys). Given the large sample size for the ACLD (1 in 20 persons), and simple random selection, sampling error is minimal.

Linking Accuracy

False links can occur during the linkage process because, even when a record pair matches on all or most linking fields, the two records may not actually belong to the same individual.
While the methodology is designed to ensure that the vast majority of links are true, and the links obtained have a high degree of accuracy, some false links may nevertheless be present in the ACLD dataset. The estimated false link rate in the ACLD is 5% to 10%. For further detail on the accuracy of the linkage, see Australian Census Longitudinal Dataset, Methodology and Quality Assessment (cat. no. 2080.5).

Managing Census Quality
Respondent error
Processing Error
Partial Response
Undercount
DATA CONSISTENCY

A small percentage of linked records have inconsistent data, such as a different country of birth at the two time points or an age inconsistency of more than one year (when the expected five year difference is accounted for). Inconsistencies may be due to:
In most analyses, inconsistent information has only a very small impact. Characteristics from either the 2006 or 2011 data can be used in tables, and some exploration of consistency over time will assist in drawing appropriate conclusions. No data editing was applied to the file beyond that which had already taken place during the relevant Census processing period.

A set of consistency flags has been included on the ACLD file so that inconsistent data may be observed, quantified or excluded from calculations. Consistency flags, located in the Longitudinal group of data items, have been created for Census variables that would not be expected to change over time or have unlikely transitions over time. These are as follows:
There are numerous ways to define consistency. The consistency flags have fine level categories to allow users flexibility in using their own definition of consistent or inconsistent. For example where one Census has 'not stated' for the year of arrival data item, a user can decide whether the record should be considered consistent or not. The same applies to where the response for one Census is 'not applicable'. The labels attached to each category suggesting consistency or inconsistency will assist the user in determining which records are consistent or inconsistent for their needs. See also Longitudinal Data Items in the Data Items chapter.

INCONSISTENT REPORTING ON THE LINKED ACLD FILE, By selected characteristics