2080.5 - Information Paper: Australian Census Longitudinal Dataset, Methodology and Quality Assessment, 2011-2016
ARCHIVED ISSUE Released at 11:30 AM (CANBERRA TIME) 27/02/2018   

1. INTRODUCTION

The Australian Census Longitudinal Dataset (ACLD) uses data from the Census of Population and Housing to build a rich longitudinal picture of Australian society. The ACLD can uncover new insights into the dynamics and transitions that drive social and economic change over time, and how these vary for diverse population groups and geographies. Three waves of data have contributed to the ACLD so far, from the 2006, 2011 and 2016 Censuses.

In this first release of the 2011-2016 ACLD, a representative sample of over 1.2 million records from the 2011 Census (Wave 2) was linked to corresponding records from the 2016 Census (Wave 3) to form the 2011 Panel of the ACLD. The 2011 Panel includes new births and migrants since the 2006 Census, and is a rich resource for exploring how Australian society has changed between the 2011 and 2016 Censuses.

A second release of the 2011-2016 ACLD in mid-2018 will include additional variables on the 2011 Panel, as well as an updated 2006 Panel consisting of a linked sample between the 2006, 2011 and 2016 Censuses. The 2006 Panel was first released in December 2013 (as the Australian Census Longitudinal Dataset, 2006-2011 (cat. no. 2080.0)), bringing together a sample of almost one million records from the 2006 Census (Wave 1) with corresponding records from the 2011 Census (Wave 2). The addition of corresponding records from the 2016 Census (Wave 3) will expand our understanding of the dynamics and transitions that have been driving change in Australia since the 2006 Census.

As information from subsequent Censuses is added to the ACLD, its value as a resource for longitudinal studies of the Australian population will continue to increase.

This paper describes the background and rationale for the ACLD, the data linkage methodology used for producing the 2011 ACLD Panel and an assessment of its quality.


1.1 OVERVIEW

Development

In 2005, the ABS embarked on a project to enhance the value of Census data by bringing it together with other datasets, both ABS and non-ABS, to leverage more information from the combination of datasets than would be available from the individual datasets separately. The ACLD was proposed as an enduring longitudinal dataset constructed through the linking of records from successive Censuses.

As part of the development phase, a quality study was undertaken in which data from the 2005 Census Dress Rehearsal were linked to data from the 2006 Census. This quality study concluded that the linkage methodology was feasible and that the expected quality of the linked data file would be sufficient for longitudinal analysis. For more information, see Assessing the Likely Quality of the Statistical Longitudinal Census Dataset (cat. no. 1351.0.55.026).

2006 Panel

As a result of the positive assessment from this quality study, a 5% random sample (979,661 records) was selected from the 2006 Census to comprise the 2006 Panel of the ACLD. This sample was then brought together with data from the 2011 Census using data linkage techniques, resulting in a linked data file consisting of 800,758 records, released as the Australian Census Longitudinal Dataset, 2006-2011 (cat. no. 2080.0) in December 2013.

Since the first release, the 2006 Panel of the ACLD has been enhanced with administrative data, including information relating to migrants from the Department of Social Services’ (DSS) Settlement Database, drawing from the data linkages in the Microdata: Australian Census and Migrants Integrated Dataset, 2011 (cat. no. 3417.0.55.001). In August 2017, data from the Social Security and Related Information (SSRI) dataset was added to the ACLD to create Microdata: Australian Census Longitudinal Dataset with Social Security and Related Information, experimental statistics, 2006-2011 (cat. no. 2085.0). The linkage of the ACLD with the SSRI data brings together information about the characteristics and circumstances of people who have interacted with the social security system, and has the potential to increase knowledge about a wide range of socio-economic issues facing Australians and their families.

An updated 2006 Panel consisting of a linked sample between the 2006, 2011 and 2016 Censuses will be released in mid-2018.

2011 Panel

In preparation for adding 2016 Census data to the ACLD, a new panel of 2011 Census records was selected as a representative sample of the 2011 population. The 2011 Panel was designed to include:

  • most of the 2011 Census records that were linked in the 2006 Panel;
  • new records to account for missed links in the 2006 Panel; and
  • new records to represent new births and migrants since the 2006 Census.

For further information on the multi-panel sample design refer to Section 1.2.

The 2011 Panel size was increased slightly to 5.7%, to achieve a linked sample size of no greater than 5% of the population after allowing for missed links and people no longer being in scope due to death or overseas migration (note that the linked sample size for the 2006 Panel linked to the 2011 Census was only 4.2%). The 2011 Panel sample of over 1.2 million records (1,221,057) from the 2011 Census was linked to the 2016 Census, resulting in a linked sample size of 927,520 records (4.3%).
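
The relationship between the panel size, the link rate and the linked sample size can be checked with a short calculation. The sketch below uses only the figures quoted above. It assumes that the linked sample fraction is the panel fraction multiplied by the share of panel records that were linked, an assumption that reproduces the quoted 4.3%. The variable names are illustrative.

```python
# Figures quoted in the text above.
panel_2011_records = 1_221_057   # 2011 Panel sample selected from the 2011 Census
linked_2016_records = 927_520    # panel records linked to a 2016 Census record
panel_fraction = 0.057           # 2011 Panel as a share of the 2011 population

# Share of panel records for which a 2016 link was found.
link_rate = linked_2016_records / panel_2011_records

# Assumption: linked sample fraction = panel fraction x link rate.
linked_fraction = panel_fraction * link_rate

print(f"link rate: {link_rate:.1%}")                     # link rate: 76.0%
print(f"linked sample fraction: {linked_fraction:.1%}")  # linked sample fraction: 4.3%
```

On these figures, roughly three in four panel records were linked, which is why a panel larger than 5% of the population was selected.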

Linking the ACLD

Data from the 2011 ACLD Panel sample and the 2016 Census were brought together using data linkage techniques.

Data linkage is typically undertaken using a combination of deterministic and probabilistic methods:
  • Deterministic linkage involves assigning record pairs across two datasets that match exactly or closely on common variables. This type of linkage is most applicable where the records from different sources consistently report sufficient information and can be an efficient process for conducting linkage.
  • Probabilistic linkage is based on the level of overall agreement on a set of variables common to the two datasets. This approach allows links to be assigned in spite of missing or inconsistent information, providing there is enough agreement on other variables.
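
The difference between the two approaches can be illustrated with a minimal sketch. It is not the ABS implementation (see Section 2 for the methodology actually used). The records, field names, agreement probabilities and threshold below are all invented for illustration. The probabilistic part uses Fellegi-Sunter-style agreement weights, a common formulation that this paper does not itself specify.

```python
import math

# Hypothetical toy records; field names and values are invented.
panel = [
    {"id": "A1", "name_code": "K7", "dob": "1980-03-14", "sex": "F", "area": "2600"},
    {"id": "A2", "name_code": "Q2", "dob": "1975-11-02", "sex": "M", "area": "3000"},
]
census = [
    {"id": "B1", "name_code": "K7", "dob": "1980-03-14", "sex": "F", "area": "2600"},
    {"id": "B2", "name_code": "Q2", "dob": None, "sex": "M", "area": "3056"},
]
FIELDS = ("name_code", "dob", "sex", "area")


def deterministic_links(left, right, keys=FIELDS):
    """Link records that agree exactly on every key, with no missing values."""
    index = {}
    for rec in right:
        key = tuple(rec[k] for k in keys)
        if None not in key:
            index.setdefault(key, []).append(rec["id"])
    links = []
    for rec in left:
        candidates = index.get(tuple(rec[k] for k in keys), [])
        if len(candidates) == 1:  # accept only unambiguous matches
            links.append((rec["id"], candidates[0]))
    return links


# Assumed probabilities of agreement for true matches (m) and non-matches (u).
M_U = {
    "name_code": (0.95, 0.001),
    "dob": (0.97, 0.003),
    "sex": (0.99, 0.5),
    "area": (0.70, 0.01),
}


def match_weight(a, b):
    """Sum of log-likelihood-ratio weights over the fields present in both records."""
    total = 0.0
    for field, (m, u) in M_U.items():
        if a[field] is None or b[field] is None:
            continue  # a missing value contributes no evidence either way
        if a[field] == b[field]:
            total += math.log2(m / u)
        else:
            total += math.log2((1 - m) / (1 - u))
    return total


def probabilistic_links(left, right, threshold=5.0):
    """Link each left record to its best-scoring right record if above threshold."""
    links = []
    for a in left:
        best = max(right, key=lambda b: match_weight(a, b))
        if match_weight(a, best) >= threshold:
            links.append((a["id"], best["id"]))
    return links


print(deterministic_links(panel, census))
print(probabilistic_links(panel, census))
```

In this toy data the deterministic pass links only the first pair, because the second Census record has a missing date of birth and a different area. The probabilistic pass also links the second pair, since agreement on the name code and sex outweighs the disagreement on area. The sketch does not enforce one-to-one assignment, which a production linkage would need to handle.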

The 2011-2016 ACLD builds on the success of the ABS' data integration program from the past decade, benefitting from advances in linking methodology, technology and data availability to deliver a high quality integrated statistical resource.

To protect the privacy of Census respondents, we used an ABS encoded Census name for linking 2011 and 2016 Census records in the ACLD. Encoding was undertaken in 2011 for the purpose of protecting privacy by anonymising name and improving the future quality and efficiency of the linking process.

The codes are created by grouping people with a combination of letters from their first and last names using a secure one-way process, meaning that a code cannot be reversed to deduce the original name information. Each code represents approximately 2,000 people drawn from many different letter combinations, and therefore is not unique to an individual. Actual name information from the 2016 Census was not used to link to 2011 Census records.

The codes are only accessible to those ABS officers creating the linked dataset, and will never be released outside the ABS.
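
The general idea of a one-way, non-unique name code can be sketched as follows. This is a hypothetical illustration only: the paper does not state which letters are used, how they are combined, or which one-way function the ABS applies. The key, the number of letters and the number of codes below are all assumptions.

```python
import hashlib
import hmac

SECRET_KEY = b"hypothetical-secret-key"  # assumption: a key held only by the linking team
N_CODES = 10_000                         # assumption: illustrative number of distinct codes


def name_code(first_name: str, last_name: str, n_letters: int = 2) -> int:
    """Map a combination of letters from first and last name to a non-unique code.

    A keyed hash (HMAC-SHA256) is one-way, and reducing it modulo N_CODES means
    many different letter combinations share each code.
    """
    letters = (
        first_name.strip().upper()[:n_letters]
        + "|"
        + last_name.strip().upper()[:n_letters]
    )
    digest = hmac.new(SECRET_KEY, letters.encode("utf-8"), hashlib.sha256).digest()
    return int.from_bytes(digest[:8], "big") % N_CODES


# Different people with the same letter combination receive the same code.
print(name_code("Maria", "Nguyen") == name_code("Marcus", "Ngo"))  # True
```

Because a code is shared by many people, it is useful for linking only in combination with other variables such as date of birth, sex and geography.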

For many individuals the linkage process will have accurately matched their corresponding records between Censuses. In some cases, the link will represent different people who share a number of characteristics in common. Some inaccuracy in the linkage will not generally affect statistical conclusions drawn from the linked data, although care should be taken in the interpretation of results. For more information see Section 2 - Data Linking Methodology.


1.2 MULTI-PANEL SAMPLE DESIGN

Without sample maintenance, the ACLD would decline in its ability to accurately reflect the Australian population over time due to:
  • people newly in scope of the ACLD (i.e. children born and immigrants arrived in Australia since the previous Census) not being represented in the sample;
  • people selected in the ACLD sample no longer being in scope due to death or overseas migration; and
  • missing and/or incorrect links (linkage bias).

Linkage bias in longitudinal datasets is unique to those created via data integration, as traditional longitudinal studies employ strategies to ensure they collect information about the same individual over time. In a linked longitudinal dataset, data integration is necessary because there is no common identifier for a person's responses over time. Linkage bias occurs where certain populations are more difficult to link than others (e.g. Aboriginal and Torres Strait Islander people, young males): links are more likely to be missed for members of these groups and, where links are found, they have a higher chance of being inaccurate. If left untreated, the representation of population groups suffering from linkage bias would worsen as each new Census is linked to the ACLD.
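
The compounding effect of differential link rates can be shown with a small calculation. The link rates and starting share below are hypothetical and are not ACLD estimates. The sketch also ignores false links and people leaving scope, so it isolates the effect of missed links only.

```python
def linked_share(initial_share, rate_group, rate_other, waves):
    """Share of the linked sample belonging to a group after each wave of linking.

    initial_share: the group's share of the original sample
    rate_group, rate_other: per-wave link rates for the group and for everyone else
    """
    group, other = initial_share, 1.0 - initial_share
    shares = []
    for _ in range(waves):
        group *= rate_group   # group members still linked after this wave
        other *= rate_other   # everyone else still linked after this wave
        shares.append(group / (group + other))
    return shares


# Hypothetical: a group that is 10% of the sample and links at 60% per wave,
# while the rest of the sample links at 80% per wave.
for wave, share in enumerate(linked_share(0.10, 0.60, 0.80, waves=3), start=1):
    print(f"after wave {wave}: {share:.1%}")
```

Under these assumed rates the group's share of the linked sample falls with every wave, which is the deterioration that sample maintenance is intended to counter.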

The ACLD sample is maintained through application of the Multi-Panel framework, developed by Chipperfield, Brown & Watson (2017). This framework provides an approach for selecting records in the ACLD to create panels which maintain the longitudinal and cross-sectional representativeness of the dataset over time, while minimising the impact of accumulated linkage bias on longitudinal analysis.

The Multi-Panel approach designs multiple overlapping panels, with each panel representing a single Census population (2006, 2011, 2016, etc.), which is then linked to subsequent Censuses. The sample selection strategy for each panel is designed to maintain a linked sample size of 5%, maximise sample overlap between the panels, and introduce new records to the dataset in each panel to account for new births, migrants and missed links in previous panels. This allows flexibility for users, who can draw on the most appropriate panel for their research question.
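
A much simplified sketch of overlap-maximising panel selection is given below. It is not the Chipperfield, Brown & Watson (2017) design. It assumes that every previously linked record still in the population is carried over, and that the panel is then topped up by simple random sampling from the remaining records, which include new births, migrants and records with missed links. The function name and parameters are hypothetical.

```python
import random


def select_panel(population_ids, previous_linked_ids, panel_fraction, seed=0):
    """Select a panel that reuses previously linked records, then tops up at random."""
    rng = random.Random(seed)
    population = set(population_ids)
    target = round(panel_fraction * len(population))

    # Carry over previously linked records that are still in the population.
    carried = sorted(population & set(previous_linked_ids))
    if len(carried) > target:
        carried = rng.sample(carried, target)

    # Top up from all other records to reach the target panel size.
    remaining = sorted(population - set(carried))
    top_up = rng.sample(remaining, target - len(carried))
    return set(carried) | set(top_up)


# Hypothetical population of 1,000 records, 40 of which were linked previously.
population_ids = range(1000)
previous_linked_ids = range(0, 400, 10)
panel_ids = select_panel(population_ids, previous_linked_ids, panel_fraction=0.05)
print(len(panel_ids), set(previous_linked_ids) <= panel_ids)
```

The actual design retains most, not all, of the previously linked records, because selection probabilities must also keep the panel representative of its Census population.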

The sample overlap between the 2006 and 2011 ACLD Panels is illustrated below:



FIGURE 1 - SAMPLE OVERLAP BETWEEN THE 2006 AND 2011 ACLD PANELS

[Image: overlap between the 2006 and 2011 ACLD Panels. The data are given in the text below.]



Of the 800,758 2011 Census records linked in the original 2006 Panel, 730,756 (91%) are in the 2011 Panel sample. The exclusion of 70,002 2006 Panel records from the 2011 Panel is the result of the sample design used to ensure the representativeness of the 2011 Panel sample.


1.3 CENSUS DATA QUALITY

In June 2017, a Report on the Quality of 2016 Census Data was released by the Census Independent Assurance Panel reviewing the quality of the 2016 Census data. The Panel determined that the 2016 Census data is of a comparable quality to previous Censuses, is useful and useable, and will support the same variety of uses of Census data as was the case for previous Censuses.

The Report included a broad assessment of the key ACLD linking variables, name and date of birth. Although the quality of these variables was still high for the 2016 Census, there was a decrease in the quality of this information relative to the 2011 Census. The Report noted the following:
  • a substantial increase in the non-response rate for date of birth, from 10% in 2011 to 19% in 2016;
  • an increase in non-response for first name, from 49,000 persons in 2011 to 209,000 persons in 2016; and
  • an increase in non-response for surname, from 127,000 persons in 2011 to 274,000 persons in 2016.

For further information on the quality of particular Census variables, please see Understanding the Census and Census Data, Australia, 2016 (cat. no. 2900.0).


1.4 BENEFITS OF THE ACLD

Each five-yearly Census provides a rich set of information about Australian people and households at a point in time. The Census provides information on characteristics such as age, sex, Indigenous status, country of birth and year of arrival, together with topics such as family structure, education and qualifications, work, income and housing, and presence of a severe or profound disability. It is able to provide a rich picture of social and economic conditions at a particular point in time, and how these conditions are changing over time and across population groups.

The ACLD adds the ability to study transitions in social and economic conditions at the individual level, giving insight into the pathways that tend to lead to particular outcomes, and how these pathways vary for different population groups. It also enables the study of likely consequences of certain socio-economic circumstances for different population groups, as evidenced by the patterns in the longitudinal data. The ACLD aims to help in the development of evidence-based strategies to promote positive pathways and avoid negative ones, and to assist policy makers in assessing both the social and financial benefits of related intervention strategies.

Since the first release of the ACLD, policy makers and researchers have used evidence from the ACLD to:
  • better understand the factors associated with people changing their self-identification as Aboriginal or Torres Strait Islander;
  • investigate employment outcomes for workers leaving the motor vehicle industry; and
  • investigate changes in family relationships and fertility.