2085.0 - Microdata: Australian Census Longitudinal Dataset with Social Security and Related Information, experimental statistics, 2006-2011 Quality Declaration 
ARCHIVED ISSUE Released at 11:30 AM (CANBERRA TIME) 23/08/2017  First Issue

SCOPE AND COVERAGE
LINKING METHODOLOGY
WEIGHTING, BENCHMARKING AND ESTIMATION
SOURCES OF ERROR
DATA CONSISTENCY


SCOPE AND COVERAGE

The ACLD is a random 5% sample of persons enumerated in Australia on Census Night, 8 August 2006, which has been linked using statistical techniques to records from the 2011 Census, conducted on 9 August 2011. The Census covers all areas in Australia and includes persons living in both private and non-private dwellings, but excludes:

    • diplomatic personnel of overseas governments and their families; and
    • Australian residents overseas on Census Night.

Overseas visitors to Australia are excluded from the 2006 ACLD sample while persons within Australia who were away from their place of usual residence on Census Night are included.

For more information on the scope and coverage of the Census:

Additional migrant-related data

There are 18,859 records on the ACLD that have additional data attached resulting from the linkage of 2011 Census data to the Department of Social Services’ Settlement Database (SDB). These records correspond to people who had a permanent visa record on the Settlement Database with a date of arrival between January 1, 2000 and August 8, 2006 (that is, Census Night in 2006) and were able to be linked to the ACLD.

The SDB date of arrival on which the scope is based reflects an individual's latest arrival pertaining to their latest permanent visa. For an offshore applicant, the SDB arrival date is when the applicant arrives in Australia on that permanent visa. However, for a person who applies onshore for a permanent visa, the date of arrival listed on the SDB is the date of their last entry into Australia.

For further information about coverage issues, please see Research Paper: Assessing the Quality of Linking Migrant Settlement Records to 2011 Census Data (cat. no. 1351.0.55.043).

Additional social security related data

There are 214,713 records on the ACLD-SSRI that have additional data attached resulting from the linkage of the ACLD to a subset of the Department of Social Services’ Social Security and Related Information (SSRI) dataset. This represents those persons who received social security benefits or had suspended benefits as at September 2011 and where their social security record was able to be linked to the ACLD. Data from this period was selected to best align with the 2011 Census. For more detail on the specific benefits available at the time, including benefit eligibility criteria see A Guide to Australian Government Payments produced by the Department of Human Services for September 2011.

Records with ACLD, migrants and social security related data

There are 4,038 records that have both migrant-related data and SSRI data alongside the Census information.

The population these 4,038 records represent corresponds to people who:
    • were residents of Australia at the time of, and participated in, both the 2006 and 2011 Censuses;
    • were recent migrants to Australia who were granted a permanent visa between January 1, 2000 and August 8, 2006 (that is, Census Night in 2006);
    • were social security recipients in September 2011 (that is, they were receiving social security benefits or had suspended benefits as of the end of September); and
    • were able to have both their migrant-related record and their social security record linked to the ACLD.

The records with both migrants and SSRI data are a result of two separate and independent linkage exercises that were performed to create the 2016/2017 ACLD refreshes:
    • Linkage of 2011 Census records to DSS Settlement Database records; and
    • Linkage of ACLD 2011 Census records to DSS SSRI dataset records.

The ACLD contains migrant information for a subset of the records, that is, those 2011 Census records that were linked to a record in the DSS Settlement Database as a result of linking the 2011 Census to the DSS Settlement Database to produce Australian Census and Migrants Integrated Dataset, 2011 (cat. no. 3417.0.55.001). ACLD 2011 Census records, some of which had previously linked migrant information, were independently linked to records in the SSRI dataset.

While the greatest possible care was taken in both of these linkage exercises, it is not possible to determine the impact of missed links in any analysis of these records with both migration and social security information. Consequently any analysis based on these records should be performed with caution.


LINKING METHODOLOGY

Data from the 2006 ACLD sample and the 2011 Census were brought together using data linkage techniques. The method involved linking without the use of name or address, as this information was destroyed following completion of Census processing for both 2006 and 2011.

Data linkage is typically undertaken using probabilistic and/or deterministic methods, both of which were used in forming the ACLD:

Deterministic linkage involves assigning record pairs across two datasets that match exactly or closely (within specified tolerance levels) on common variables. This type of linkage is most applicable where the records from different sources consistently report sufficient information and can be an efficient process for conducting linkage.

Probabilistic linkage is a method that determines the likelihood that a pair of records are a match based on how well they agree on a set of variables and then uses statistically valid decision rules to designate which record-pairs are matches, possible matches and non-matches. When calculating the likelihood that a pair of records are a match, the discriminatory power of each variable being used for linking is taken into account. This approach also allows links to be assigned in spite of missing or inconsistent information, provided there is enough agreement on other variables to offset any disagreement.
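The scoring step of probabilistic linkage can be illustrated with a minimal sketch, assuming the widely used Fellegi-Sunter style of agreement weights. The variable names, the m- and u-probabilities and the two cut-offs below are illustrative assumptions only; they are not the parameters used to build the ACLD.

```python
import math

# Hypothetical m- and u-probabilities for each linking variable (assumed values):
#   m = P(values agree | the two records belong to the same person)
#   u = P(values agree | the two records belong to different people)
FIELD_PROBS = {
    "sex": (0.99, 0.50),
    "year_of_birth": (0.95, 0.01),
    "country_of_birth": (0.97, 0.20),
}


def pair_weight(rec_a, rec_b):
    """Sum log2 agreement or disagreement weights across the linking variables."""
    total = 0.0
    for field, (m, u) in FIELD_PROBS.items():
        a, b = rec_a.get(field), rec_b.get(field)
        if a is None or b is None:
            continue  # a missing value adds no evidence either way
        if a == b:
            total += math.log2(m / u)              # discriminating variables add more
        else:
            total += math.log2((1 - m) / (1 - u))  # disagreement lowers the score
    return total


def classify(weight, lower=0.0, upper=6.0):
    """Apply two assumed cut-offs to designate the record pair."""
    if weight >= upper:
        return "match"
    if weight >= lower:
        return "possible match"
    return "non-match"


rec_2006 = {"sex": "F", "year_of_birth": 1970, "country_of_birth": "Australia"}
same_person = dict(rec_2006)
missing_yob = {"sex": "F", "year_of_birth": None, "country_of_birth": "Australia"}
different_yob = {"sex": "F", "year_of_birth": 1985, "country_of_birth": "Australia"}

for candidate in (same_person, missing_yob, different_yob):
    print(classify(pair_weight(rec_2006, candidate)))
```

In this sketch, agreement on year of birth contributes far more than agreement on sex because chance agreement on sex is common. This is what taking the discriminatory power of each variable into account means in practice.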

A number of linkage passes were conducted based on different combinations of variables to ensure each record in the random sample taken from the 2006 Census had the highest possible chance of being linked to a record in the 2011 Census. At the end of the linkage process, 800,759 (82%) of the 979,661 records in the 2006 sample were linked to a 2011 Census record.

There were two reasons why some records from the 2006 Census were not linked to a 2011 record:
    • Records belonging to the same individual were present at both time points but these records failed to be linked because they contained missing or inconsistent information.
    • The person had no record in the 2011 Census.

For detailed information on the linking methodology and an assessment of its quality see Australian Census Longitudinal Dataset, Methodology and Quality Assessment (cat. no. 2080.5).

Linkage of migrants related data to the ACLD

The addition of variables relating to migrants, from the Department of Social Services' Settlement Database, to the ACLD is based on the existing linkage in the Australian Census and Migrants Integrated Dataset. For information on the linking methodology of Settlement Database variables see Australian Census and Migrants Integrated Dataset Linking Methodology (cat. no. 3417.0.55.001).

Linkage of social security related data to the ACLD

Variables on the ACLD 2011 Census records and September 2011 SSRI dataset used for linking include:
    • Non-identifying grouped numeric code
    • Age
    • Sex
    • Day of birth
    • Year of birth
    • Country of birth
    • Indigenous status
    • Marital status
    • Meshblock
    • Statistical Areas 1 and 2

A 2006 Census Data Enhancement quality study found that in the absence of name and address, inclusion of a non-identifying grouped numeric code based on name improved the accuracy and efficiency of the linking process while still preserving confidentiality. For further information, see Assessing the Likely Quality of the Statistical Longitudinal Census Dataset (cat. no. 1351.0.55.026).

A non-identifying grouped numeric code is an additional linkage item assigned to each record in the dataset. Each numeric code represents approximately 2000 people and therefore is not unique to an individual and cannot be reversed to identify individuals. The grouped numeric codes were created during the Census processing period, and are only accessible to those ABS staff conducting linkages to the ACLD.

In the linkage of the SSRI information to the ACLD the grouped numeric codes were used alongside personal characteristics such as age, sex and meshblock.

A number of linkage passes were conducted based on different combinations of variables to ensure each ACLD 2011 Census record had the highest possible chance of being accurately linked to a record in the DSS SSRI dataset. This resulted in 214,713 links between 2011 Census records on the ACLD and the September 2011 SSRI extract.

As the overlap of these two datasets is unknown, it is difficult to calculate an exact linkage rate for this exercise. However, similar linkage projects conducted by the ABS using many of the same linkage variables have resulted in linkage rates of around 80-85%. The ACLD-SSRI linkage also used a non-identifying grouped numeric code, with which we expect even higher quality results.

There were three reasons why some ACLD 2011 Census records were not linked to a September 2011 SSRI record:
    • The person neither received social security benefits nor had social security benefits suspended as of September 2011.
    • The person received social security benefits or had their benefits temporarily suspended as of September 2011, but was not present on the ACLD dataset.
    • Records belonging to the same individual were present in both datasets but these records failed to be linked because they contained missing or inconsistent information.


WEIGHTING, BENCHMARKING AND ESTIMATION

Weighting

Weighting is the process of adjusting a sample to infer results for the relevant population. To do this, a 'weight' is allocated to each sample unit - in this case, person. The weight can be considered an indication of how many people in the relevant population are represented by each person in the sample. Weights were created for linked records in the ACLD to enable longitudinal population estimates to be produced. Cross-sectional population estimates for 2006 and 2011 are available from each Census.

The ACLD began as a random sample of 5% of in-scope records from the 2006 Census. As such, each person in the sample should represent about 20 people in the population. Between Censuses, however, the in-scope population changes as people die or move overseas. In addition, Census net undercount and data quality can affect the capacity to link equivalent records across waves. The ACLD weights benchmarked the linked records to the population that was resident in Australia at the time of both the 2006 and 2011 Censuses. The weights were based on four components: the design weight, undercoverage adjustment, missed link adjustment and population benchmarking.

The original population benchmark was the 2011 Estimated Resident Population (ERP). The 2011 ERP was chosen over the 2006 ERP as the baseline population as it is more recent. The 2011 ERP was then adjusted so as to exclude people who were not in Australia in 2006 as depicted below.

[Diagram: the longitudinal population overlap between the two Censuses. The 2011 Estimated Resident Population was used as the starting point for estimating deaths, overseas departures, births and arrivals.]

Weights were benchmarked to the following population groups:
    • state/territory by age (ten year groups) by sex by mobility (interstate arrivals benchmarked separately).
    • Indigenous status by state/territory.
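Benchmarking can be sketched as a ratio adjustment in which the weights within each group are scaled to sum to that group's benchmark total. This is a simplified illustration that handles a single set of groups; calibrating to two sets of groups at once, as listed above, would need an iterative method. The records, group labels and benchmark totals are hypothetical and are not ABS figures.

```python
from collections import defaultdict


def benchmark(records, benchmarks):
    """Scale weights so that, within each group, they sum to the benchmark total."""
    current = defaultdict(float)
    for rec in records:
        current[rec["group"]] += rec["weight"]
    return [
        {**rec, "weight": rec["weight"] * benchmarks[rec["group"]] / current[rec["group"]]}
        for rec in records
    ]


# Hypothetical linked records, each starting from the design weight of 20.
linked = [
    {"id": 1, "group": ("NSW", "20-29", "F"), "weight": 20.0},
    {"id": 2, "group": ("NSW", "20-29", "F"), "weight": 20.0},
    {"id": 3, "group": ("VIC", "30-39", "M"), "weight": 20.0},
]
# Hypothetical benchmark population totals for each group.
targets = {("NSW", "20-29", "F"): 50.0, ("VIC", "30-39", "M"): 24.0}

calibrated = benchmark(linked, targets)
for rec in calibrated:
    print(rec["id"], rec["group"], rec["weight"])
```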

On 12 February 2016 a new weight was applied to the ACLD file to better account for overseas departures and arrivals between 2006 and 2011. Users who analysed the ACLD before 12 February 2016 may notice changes to estimates produced with the revised weight. Estimates of population groups will differ: the total weighted population estimate is 19.5 million under the revised weight, compared with 18.6 million under the old weight. Proportions are expected to show only small differences when compared with previous tables.

The weights have a mean value of 24 and range between 17 and 103. Higher weights are associated with people of Aboriginal and Torres Strait Islander origin and people who moved interstate between 2006 and 2011.

While the ACLD weights are believed to be of good quality for use in analysing the longitudinal Australian population and account for missed links between the 2006 and 2011 Censuses, they do not attempt to account for missed links between the ACLD and either the migrant-related data or the social security data.

Estimation

Estimates of population groups are obtained by summing the weights of persons with the characteristic(s) of interest.
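As a minimal sketch of this estimator, the function below sums the weights of records that satisfy a condition. The field names and weight values are hypothetical.

```python
def estimate(records, has_characteristic):
    """Weighted population estimate: sum the weights of the matching records."""
    return sum(rec["weight"] for rec in records if has_characteristic(rec))


# Hypothetical linked records with their longitudinal weights.
people = [
    {"weight": 22.0, "moved_interstate": True},
    {"weight": 19.5, "moved_interstate": False},
    {"weight": 30.5, "moved_interstate": True},
]

movers = estimate(people, lambda rec: rec["moved_interstate"])
total = estimate(people, lambda rec: True)
print(movers, total, movers / total)
```

A weighted proportion is the ratio of two such sums, which is why proportions tend to be less sensitive to a change in the weights than the population totals are.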


SOURCES OF ERROR

All reasonable efforts have been made to ensure the accuracy of the results of the longitudinal dataset. Nevertheless, potential sources of error, including sampling, linking and Census quality error, should be kept in mind when interpreting the results.

Sampling Error

Sampling error occurs because only a small proportion of the total population is used to produce estimates that represent the whole population. For a given sample size, each possible sample will produce different results, which will usually not be equal to the population value.

There are two common ways of reducing sampling error: increasing the sample size and using an appropriate selection method (for example, multi-stage sampling would be appropriate for household surveys). Given the large sample size for the ACLD (1 in 20 persons) and simple random selection, sampling error is minimal.
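The effect of sample size can be illustrated with the textbook standard error of a proportion under simple random sampling. This is a rough sketch only: it ignores the variation in ACLD weights and any linkage effects, and the subgroup sizes are hypothetical.

```python
import math


def se_proportion(p, n, sampling_fraction=0.05):
    """Approximate standard error of an estimated proportion under simple random sampling."""
    fpc = 1.0 - sampling_fraction  # finite population correction for a 5% sample
    return math.sqrt(fpc * p * (1.0 - p) / n)


# The same 10% proportion estimated from sample subgroups of different sizes.
for n in (500, 50_000, 800_000):
    se = se_proportion(0.10, n)
    print(f"n={n:>7,}  SE={se:.4%}  relative SE={se / 0.10:.1%}")
```

The standard error shrinks with the square root of the sample size, so sampling error remains a practical concern mainly when estimates are produced for small subpopulations.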

Linking Accuracy

False links can occur during the linkage process as even when a record pair matches on all or most linking fields, it may not actually belong to the same individual. While the methodology is designed to ensure that the vast majority of links are true, some are nevertheless false. The nature of the process used for the ACLD linkage means that while the links obtained are to a high degree of accuracy, some false links may be present within the ACLD dataset. There is an estimated 5–10% false link rate in the ACLD.
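To put the estimated false link rate in context, the arithmetic below applies the 5% and 10% bounds to the 800,759 linked records reported in the Linking Methodology section. It is a back-of-the-envelope illustration, not an ABS estimate of the number of false links.

```python
LINKED_RECORDS = 800_759  # linked 2006 sample records, from the Linking Methodology section


def expected_false_links(linked, rate):
    """Expected number of false links at a given false link rate."""
    return round(linked * rate)


low = expected_false_links(LINKED_RECORDS, 0.05)
high = expected_false_links(LINKED_RECORDS, 0.10)
print(f"Between {low:,} and {high:,} of {LINKED_RECORDS:,} links may be false")
```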

For further detail on the accuracy of the linkage, see Australian Census Longitudinal Dataset, Methodology and Quality Assessment (cat. no. 2080.5).

Managing Census Quality

The ABS aims to produce high quality data from the Census. To achieve this, extensive effort is put into Census form design, collection procedures, and processing procedures.

There are four principal sources of error in Census data: respondent error, processing error, partial response and undercount. Quality management of the Census program aims to reduce error as much as possible, and to provide a measure of the remaining error to data users, to allow them to use the data in an informed way.

Respondent error

For most households in Australia, the Census is self-enumerated. This means that householders are required to complete the Census form themselves, rather than having the help of a Census collector. The Census form may be completed by one household member on behalf of others. Error can be introduced if the respondent does not understand the question, or does not know the correct information about other household members. Self-enumeration carries the risk that wrong answers could be given, either intentionally or unintentionally.

Processing Error

Much of the data on the Census form is captured using automatic processes such as scanning and Intelligent Character Recognition. Quality assurance procedures are used during Census processing to ensure processing errors are kept at an acceptable level. Sample checking is undertaken during coding operations, and corrections are made where necessary.

Partial Response

When completing their Census form, some people do not answer all the questions which apply to them. While questions of a sensitive nature are generally excluded from the Census, all topics have a level of non-response. This can be measured and is generally low. In those instances where a householder fails to answer a question, a 'not stated' code is allocated during processing, with the exception of non-response to age, sex, marital status and place of usual residence. These variables are needed for population estimates, so they are imputed using other information on the Census form, as well as information from the previous Census.

Undercount

The goal of the Census is to obtain a complete measure of the number and characteristics of people in Australia on Census Night and their dwellings, but it is inevitable that a small number will be missed and some will be counted more than once. In Australia more people are missed from the Census than are counted more than once, thus the effect when both factors are taken into account is a net undercount.

For more detail see Managing Census Quality.


DATA CONSISTENCY

A small percentage of linked records have inconsistent data, such as a different country of birth at the two time points or an age inconsistency of more than one year (when the expected five year difference is accounted for). Inconsistencies may be due to:
    • false links - the record pair does not belong to the same individual.
    • reporting error - information for the same individual was reported differently in 2006 and 2011.
    • processing error - the value of a data item was inaccurately assigned during processing.
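The age check described above can be sketched as follows: the reported ages should differ by the five years between Censuses, with a tolerance of one year either way. The function names and the sample age pairs are hypothetical.

```python
CENSUS_GAP_YEARS = 5  # years between the 2006 and 2011 Censuses


def age_is_consistent(age_2006, age_2011, tolerance=1):
    """True when the ages differ by the expected five years, within the tolerance."""
    drift = (age_2011 - age_2006) - CENSUS_GAP_YEARS
    return abs(drift) <= tolerance


def inconsistency_rate(pairs):
    """Share of linked records whose ages fail the consistency check."""
    flagged = sum(not age_is_consistent(a, b) for a, b in pairs)
    return flagged / len(pairs)


# Hypothetical (age in 2006, age in 2011) pairs for four linked records.
pairs = [(30, 35), (41, 47), (18, 22), (60, 68)]
print(inconsistency_rate(pairs))
```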

In most analyses, inconsistent information is likely to have only a small impact. Characteristics from either the 2006 or 2011 data can be used in tables, and some exploration of consistency over time will assist in drawing appropriate conclusions.

No data editing was applied to the file beyond that which had already taken place during the relevant Census processing period. A set of consistency flags has been included on the ACLD file so that inconsistent data may be observed, quantified or excluded from calculations. Consistency flags, located in the Longitudinal group of data items, have been created for Census variables that would not be expected to change over time or have unlikely transitions over time. These are as follows:
    • Age
    • Birthplace of Person
    • Birthplace of Male Parent
    • Birthplace of Female Parent
    • Sex
    • Year of Arrival
    • Number of Children Ever Born
    • Registered Marital Status
    • Highest Year of School Completed
    • Level of Highest Non-School Qualification
    • Country of Birth of Spouse or Partner
    • Age of Spouse or Partner
    • Indigenous Status

There are numerous ways to define consistency. The consistency flags have fine level categories to allow users flexibility in using their own definition of consistent or inconsistent. For example, where one Census has 'not stated' for the year of arrival data item, a user can decide whether the record should be considered consistent or not. The same applies to where the response for one Census is 'not applicable'. The labels attached to each category suggesting consistency or inconsistency will assist the user in determining which records are consistent or inconsistent for their needs.
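The flexibility described above can be sketched by letting the analyst choose which flag categories count as consistent. The category labels and field name below are hypothetical placeholders; the actual categories are those documented for the Longitudinal data items.

```python
# Hypothetical category labels; the real flag categories on the file will differ.
STRICT = {"consistent"}
LENIENT = STRICT | {"not stated in one Census", "not applicable in one Census"}


def keep_consistent(records, flag_field, accepted):
    """Keep records whose consistency flag falls in the analyst's accepted set."""
    return [rec for rec in records if rec[flag_field] in accepted]


records = [
    {"id": 1, "year_of_arrival_flag": "consistent"},
    {"id": 2, "year_of_arrival_flag": "not stated in one Census"},
    {"id": 3, "year_of_arrival_flag": "inconsistent"},
]

strict_ids = [r["id"] for r in keep_consistent(records, "year_of_arrival_flag", STRICT)]
lenient_ids = [r["id"] for r in keep_consistent(records, "year_of_arrival_flag", LENIENT)]
print(strict_ids, lenient_ids)
```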

See also Longitudinal Data Items in the Data Items chapter.


INCONSISTENT REPORTING ON THE LINKED ACLD FILE, By selected characteristics

Proportion of linked records with inconsistent data between 2006 and 2011 (%)

Characteristic                                        %
Age (within 1 year)                                 2.4
Sex                                                 0.1
Birthplace of Person                                2.1
Birthplace of Female Parent                         4.0
Birthplace of Male Parent                           4.4
Year of Arrival                                    16.5
Indigenous Status (either newly identified          0.5
  or previously identified as Aboriginal
  and/or Torres Strait Islander)
Registered Marital Status                           0.7
Highest Year of School Completed                    6.3
Level of Highest Non-School Qualification          14.9
Country of Birth of Spouse or Partner               2.7
Age of Spouse or Partner                            7.9



The ACLD-SSRI file includes additional consistency flags with respect to the SSRI data. Due to small differences in data extraction dates for specific benefits, there are some instances of individuals having multiple records with inconsistent data in the source SSRI file. Where this occurred, one value for the inconsistent data item was selected at random from the SSRI file. These flags identify the records and data item where inconsistency occurred. For example, the AGEDUP field indicates that there were inconsistent age records for an individual SSRI recipient. In general, the effect of this inconsistency is minimal and should only significantly affect analysis results where small populations occur.
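The treatment of duplicate SSRI records can be sketched as below. Only the AGEDUP field name comes from the file; the record layout, the `person_id` key, the boolean coding of the flag and the choice to pick among distinct values (rather than among records) are assumptions made for illustration.

```python
import random
from collections import defaultdict


def resolve_duplicates(rows, field, flag_name, seed=0):
    """Pick one value per person at random and flag people with conflicting values."""
    rng = random.Random(seed)  # seeded so the illustration is repeatable
    by_person = defaultdict(list)
    for row in rows:
        by_person[row["person_id"]].append(row[field])

    resolved = {}
    for person_id, values in by_person.items():
        distinct = sorted(set(values))
        resolved[person_id] = {
            field: rng.choice(distinct),       # random pick among the reported values
            flag_name: len(distinct) > 1,      # True when the source records disagreed
        }
    return resolved


# Hypothetical SSRI source rows: person A has two records with different ages.
ssri_rows = [
    {"person_id": "A", "age": 66},
    {"person_id": "A", "age": 67},
    {"person_id": "B", "age": 40},
]

result = resolve_duplicates(ssri_rows, "age", "AGEDUP")
print(result)
```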

Where there are cases of inconsistency between ACLD and SSRI data, there are no consistency flags.

Where demographic information is available from both the ACLD and SSRI data, it is recommended to use the Census demographics as they will align best with most of the information on the dataset, including the weights. However, there are a few cases where using the SSRI data may be more appropriate. For example, a user looking at Age Pension results using the Census age may observe some records that appear to be on the Age Pension despite not having reached Age Pension age at the time of the Census. In situations like these, it may be more appropriate to use the SSRI age for individuals where it exists.
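One way to apply this recommendation is a small helper that defaults to the Census age and switches to the SSRI age only when the analyst asks for it and a value exists. The function and parameter names are hypothetical.

```python
def analysis_age(census_age, ssri_age=None, prefer_ssri=False):
    """Return the Census age by default, or the SSRI age when preferred and available."""
    if prefer_ssri and ssri_age is not None:
        return ssri_age
    return census_age


# General analysis: keep the Census age so results align with the weights.
print(analysis_age(census_age=64, ssri_age=65))

# Age Pension analysis: use the SSRI age where it exists, else fall back to Census.
print(analysis_age(census_age=64, ssri_age=65, prefer_ssri=True))
print(analysis_age(census_age=64, ssri_age=None, prefer_ssri=True))
```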