3. LINKAGE RESULTS, 2006-2011 (ORIGINAL), 2006 PANEL
TABLE 2 - LINKAGE RATES, By selected characteristics
The linkage rates that were achieved for the ACLD were relatively consistent across most sub-populations and were in line with expected results. Compared with the national average of 82%, the sub-populations which achieved the highest linkage rates were persons:
The sub-populations which achieved the lowest linkage rates were persons:
Traditionally, the Census Post Enumeration Survey (PES) has shown that the Census has higher rates of undercount for people of Aboriginal and/or Torres Strait Islander origin, those aged between 20 and 29, and those in the Northern Territory. As expected, the lower ACLD linkage rates broadly aligned with the same groups that experience higher levels of undercount in the Census. One additional group that had lower linkage rates was persons aged 75 and over at the time of the 2006 Census who, due to age, had an increased risk of death over the ensuing five years.
Further information on Census undercount can be found in Census of Population and Housing - Details of Undercount, 2011 (cat. no. 2940.0). Further data cubes, demonstrating the linkage rates for various sub-populations, are available as an attachment to this Information paper.
3.1 LINKAGE ACCURACY
The following quality measures were calculated for the ACLD and indicate a good level of overall quality:
3.1.1 Linkage Rates, True and False Links
Not all record pairs assigned as links in a data linkage exercise are a match, that is, a record pair belonging to the same individual. While the methodology is designed to ensure that the vast majority of links are true, some are nevertheless false.
The linkage strategy used for the ACLD was designed both to achieve a high number of links and to ensure a high level of accuracy to enable longitudinal research. Accordingly, the strategy was restrictive and conservative, especially in the early passes.
The results of clerical review were analysed to determine the quality of the linkage process and to estimate the number of true links in the linked ACLD file. This process involved calculating the proportion of rejected record pairs at each linkage weight and determining the number of false links this would represent in the final output file.
Table 3 provides a summary of the results of clerical review, including an estimate of the number of false links accepted in each pass. Due to the nature of deterministic linking and the way in which linked records were retained, no false links were identified in passes 1 and 2. While it is assumed that all links assigned in these passes were true, as they contained consistent information across all key linking fields, in reality there may have been a small but unquantifiable number of false links.
TABLE 3 - LINKAGE RESULTS, By pass number
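The clerical review calculation described in section 3.1.1 can be illustrated with a minimal sketch. It assumes links are grouped into linkage weight bands, that a sample of pairs from each band was reviewed, and that the rejection rate in the sample applies to all accepted links in that band. The class name, function name and all figures are hypothetical and are not ACLD results.

```python
from dataclasses import dataclass


@dataclass
class WeightBand:
    """Hypothetical summary of one linkage weight band."""
    label: str
    links_accepted: int   # links at this weight in the output file
    pairs_reviewed: int   # record pairs sampled for clerical review
    pairs_rejected: int   # reviewed pairs judged not to be a match


def estimate_false_links(bands):
    """Return (estimated false links, estimated false link rate)."""
    total_links = 0
    total_false = 0.0
    for band in bands:
        if band.pairs_reviewed > 0:
            rejection_rate = band.pairs_rejected / band.pairs_reviewed
        else:
            # Assumption: bands with no review contribute no estimated
            # false links (as for deterministic passes assumed true).
            rejection_rate = 0.0
        total_false += rejection_rate * band.links_accepted
        total_links += band.links_accepted
    return total_false, total_false / total_links


if __name__ == "__main__":
    # Illustrative figures only.
    example = [
        WeightBand("high weight", 1000, 100, 0),
        WeightBand("medium weight", 500, 100, 10),
        WeightBand("low weight", 200, 50, 10),
    ]
    false_links, rate = estimate_false_links(example)
    print(f"Estimated false links: {false_links:.0f} ({rate:.1%} of links)")
```

Scaling each band's rejection rate by the number of accepted links in that band means that a high rejection rate in a small, low-weight band contributes less to the overall estimate than the same rate in a large band.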
The combined clerical review results indicate that the proportion of false links in the final ACLD file could be as low as 5%. By including a tolerance around these results and assuming a small false link rate for the deterministic passes, the false link rate for the ACLD is estimated to be about 5-10%. The passes that contained the highest proportion of false links were Pass 9 (21.4%), where family information was used to try to resolve unlinked records, and Pass 5 (19.8%), which used a broad geography (SA4) as the blocking field. While this is only an approximate estimate, based on reviewing a sample of over 2,500 record pairs, it does indicate a high level of overall quality.
The linkage rate of 82% with a false link rate of 5% was broadly consistent with, or better than, other ABS Census linkage projects which did not use name and address as linkage variables (see Assessing the Likely Quality of the Statistical Longitudinal Census Dataset (cat. no. 1351.0.55.026)). The conservative and restrictive nature of the blocking and linking strategy, together with the quality controls implemented during clerical review, helped to minimise the number of estimated false links throughout the linkage process.
About two-thirds (68%) of all links were achieved in the first pass of the project, which used a deterministic linking methodology to identify and filter matches. In Pass 1, a tight geographic and demographic restriction was implemented to maximise the number of high-quality links assigned and to limit the number of alternative comparisons required. Using this approach, links were only accepted if a single record pair was identified.
3.1.2 Consistency of Common Information on Record Pairs
In data linkage projects, geographic boundaries function as blocking variables that restrict the search for record pairs.
They are also used as linking variables and, when combined with other linking fields such as age, sex and date of birth, provide a high level of uniqueness and reduce the likelihood of linking to an incorrect record. Table 4 displays the number of records that had consistent information, grouped by the consistency of the record pairs across varying levels of geography.
TABLE 4 - CONSISTENCY OF LINKED RECORDS, By geography and selected linking fields
Just over 97% of all records that were matched in the ACLD linkage process agreed on small to medium levels of geographic area combined with other key linking fields, such as age, sex and date of birth. While the number of consistent fields can give a strong indication of likely linkage quality, other factors should be taken into account, for example, the expected number of people in a geographic area that are likely to share a characteristic by chance. A tolerance of plus or minus two years was used at certain parts of the linkage process to cater for persons who may have understated their age in 2006 and overstated it in 2011 or vice versa. By contrast, record pairs may have inconsistent information and yet be a true link. Inconsistent information may be recorded for the same person in different Censuses due to a range of factors, including:
3.1.2.1 Consistent Reporting of Indigenous Status
Consistency of Indigenous status is a special case, since the change in reporting over time is both a potential indicator of linkage quality and of analytical interest. The 2011 Census recorded an unexpected increase in persons who identified as being of Aboriginal and/or Torres Strait Islander origin. This was due, in part, to improvements in Census collection practices that resulted in a more complete enumeration of the Aboriginal and Torres Strait Islander population in 2011 than in 2006. In addition, a significant contributor to this increase was a change in the propensity of people to identify as being of Aboriginal and/or Torres Strait Islander origin in 2011 compared with 2006 (see Census of Population and Housing: Understanding the Increase in Aboriginal and Torres Strait Islander Counts, 2006-2011 (cat. no. 2077.0)).
While there was a group of people in the ACLD who were identified as non-Indigenous in 2006 and of Aboriginal and/or Torres Strait Islander origin in 2011, this group was relatively small and was counterbalanced by an almost equally sized group who reported the opposite. This pattern of change is different to that expected, given the increasing propensity of people to identify their Aboriginal and Torres Strait Islander origin observed at the aggregate level in the entire 2011 Census.
Throughout the linkage process, Indigenous status was used as a blocking and linking variable. While this would have made only a small contribution to the linkage weight, it may have increased the likelihood of assigning a link to a record pair that contained consistent information for Indigenous status. Record pairs that contained inconsistent information for Indigenous status still had a good chance of being linked, however, provided there was sufficient additional information available for linking.
Differences in the reporting of Indigenous status between 2006 and 2011 on the ACLD may be due to a range of reasons. These include:
Table 5 shows the reporting of Indigenous status for the linked records on the ACLD, across the 2006 and 2011 Censuses. Further data cubes, providing a more detailed breakdown by remoteness area, are provided as an attachment to this Information paper.
TABLE 5 - CONSISTENCY OF INDIGENOUS STATUS FOR LINKED RECORDS, 2006 and 2011
3.2 CHARACTERISTICS OF LINKED AND UNLINKED 2006 CENSUS SAMPLE
Table 6 shows the distribution of key populations across the 2006 Census, the 2006 Census sample and the ACLD.
TABLE 6 - SELECTED CHARACTERISTICS, By 2006 Census, 2006 Census sample and ACLD
The distribution of the ACLD file by sub-population was generally well aligned with both the 2006 Census sample and the entire 2006 Census. When looking at the relative difference between these proportions, however, some differences are more clearly observed. Compared with the entire 2006 Census, the linked ACLD contains relatively more records for people aged 0-9 years and, to a lesser extent, those aged 40-49 years, 50-59 years and 60-69 years. By contrast, the ACLD contains relatively fewer records for people aged 20-29 years and 80 years and over.
There are also relatively fewer people of Aboriginal and Torres Strait Islander origin in the ACLD than in the entire 2006 Census (1.8% compared with 2.3%). The corresponding weighted estimate, however, represents 3.0% of the total population, which is attributed to benchmarking the 2006 sample to the Aboriginal and Torres Strait Islander population in 2011 and therefore to the higher level of identification observed in the 2011 Census than in 2006 (see section 3.4).
In general, the distribution of weighted counts for the linked ACLD file is close to that of the entire 2006 Census, but it is not designed to produce counts corresponding to the population in 2006. Rather, the weighted population is that of people who were in scope of both the 2006 and 2011 Censuses (see section 3.4). Thus, for example, the lower proportion of older people in the linked file, even after weighting, reflects the impact on the 2006 sample of deaths that occurred between 2006 and 2011.
Further data cubes, demonstrating more detailed population distributions, are provided as an attachment to this Information paper.
3.3 REASONS FOR UNLINKED RECORDS
There are two main reasons why records from the 2006 Census sample were not linked to a 2011 Census record:
3.3.1 Missing and/or Inconsistent Information
In these cases, the true match was present in the pool of all record pairs but it was not identified because there was a high level of inconsistency between information on the 2006 Census sample and the 2011 Census record, or key linking fields were missing altogether. The reasons for the match being missed can be categorised into the following groups:
Accurate address coding was crucial in narrowing the search and differentiating between true and false links. It was a particular challenge for persons who had moved, since linkage was then dependent on the information supplied in 2011 about the person's address in 2006. Processing for the 2011 Census involved coding address five years ago to a fine level of geography, ideally Mesh block. This was not always possible, either because the address information supplied was insufficiently detailed or because, by 2011, Census respondents may not have accurately remembered their address on Census Night in 2006.
3.3.2 No 2011 Census Record
A person included in the 2006 Census sample may have had no equivalent 2011 Census record because they were no longer in scope for the Census due to migration from Australia, or death between 2006 and 2011, or they may simply have been missed in the Census.
According to mortality data compiled by the ABS from data supplied by the Registrars of Births, Deaths and Marriages, about 700,000 people died in Australia between 2006 and 2011. If 5% of these people were represented in the 2006 sample, then it could be expected that up to 35,000 people could not have been linked due to death between 2006 and 2011. Similarly, migration data shows that just over one million people left Australia as permanent emigrants over the same period, potentially resulting in up to 50,000 people from the 2006 Census sample being unlikely to have a corresponding 2011 Census record.
Due to the size and complexity of the Census, it is inevitable that some people are missed and some are counted more than once. It is for this reason that the Census Post Enumeration Survey (PES) is run shortly after each Census, to provide an independent measure of Census coverage. The PES determines how many people should have been counted in the Census, how many were missed (undercount), and how many were counted more than once (overcount).
It also provides information on the characteristics of those in the population who have been missed or overcounted.
When taking into account all of these factors, it is estimated that over half of the unlinked 2006 Census sample (100,000 out of the 180,000 unlinked records) would not have a corresponding record in the 2011 Census. This would indicate that the initial linkage rate of 82% could be representative of up to 91% of the population that actually had an opportunity to be linked.
3.4 WEIGHTING
Weighting is the process of adjusting a sample to infer results for the relevant population. To do this, a 'weight' is allocated to each sample unit - in this case, persons. The weight can be considered an indication of how many people in the relevant population are represented by each person in the sample. Weights were created for linked records in the ACLD to enable longitudinal population estimates to be produced. Cross-sectional population estimates for 2006 and 2011 are available from each Census.
The ACLD began as a random sample of 5% of the Australian population in 2006. As such, each person in the sample should represent about 20 people in the population. Between Censuses, however, the in-scope population changes as people die or move overseas. In addition, Census net undercount and data quality can affect the capacity to link equivalent records across waves.
The ACLD weighting process benchmarked the linked ACLD records to the population that was in scope of both the 2006 and 2011 Censuses. The weights were based on four components: the design weight, undercoverage adjustment, missed link adjustment and population benchmarking. The original population benchmark was the 2011 Estimated Resident Population (ERP). The 2011 ERP was chosen over the 2006 ERP as the baseline population as it is more recent. The ERP was then adjusted to exclude births and overseas arrivals that had occurred between 2006 and 2011.
Weights were benchmarked to the following population groups:
The weights have a mean value of 24 and range between 17 and 103. Higher weights are associated with people of Aboriginal and Torres Strait Islander origin and people who moved interstate between 2006 and 2011. For more information see the Appendix.
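The arithmetic quoted in sections 3.3.2 and 3.4 can be reproduced with a short sketch. The function names are hypothetical, the emigrant count is rounded to one million, and the sample size is not stated in the text: it is inferred here from the 180,000 unlinked records and the 82% linkage rate, which is an assumption of this example. The sketch covers only the design weight and does not attempt the undercoverage, missed link or benchmarking adjustments.

```python
def share_of(count, percent):
    """Number of people expected in a sample covering `percent` of `count`."""
    return count * percent // 100


def design_weight(sampling_percent):
    """People in the population represented by each sampled person."""
    return 100 / sampling_percent


def adjusted_linkage_rate(unlinked, linkage_percent, no_record):
    """Linkage rate among sample records that could have been linked.

    Assumption: the sample size is inferred from the unlinked count and
    the overall linkage rate rather than taken from a published figure.
    """
    sample_size = unlinked * 100 // (100 - linkage_percent)
    linked = sample_size - unlinked
    return linked / (sample_size - no_record)


if __name__ == "__main__":
    # A 5% sample: each person represents about 20 people.
    print(design_weight(5))
    # About 700,000 deaths: up to 35,000 in the sample.
    print(share_of(700_000, 5))
    # About one million permanent emigrants: up to 50,000 in the sample.
    print(share_of(1_000_000, 5))
    # 180,000 unlinked, of which 100,000 had no 2011 record: about 0.91.
    print(round(adjusted_linkage_rate(180_000, 82, 100_000), 2))
```

Removing the estimated 100,000 records with no 2011 counterpart from the denominator is what lifts the rate from 82% to about 91%.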