DATA INTEGRATION @ THE NMSU
Data integration, commonly known as DI or data linkage, aims to gain more information from a combination of datasets than is available from each dataset separately, without increasing the burden on providers through further survey collections. Linked datasets are particularly appealing because they are often very large, enabling cross tabulations that may not be possible with survey data because of limited sample sizes. Furthermore, where multiple years of data can be linked, cohort analysis can be undertaken to establish common pathways.
The ABS is well placed to integrate sensitive data from administrative sources because we are governed by the Census and Statistics Act 1905, which prevents the release of information that could be attributed to a specific individual. The public can therefore be assured that their data is in safe hands.
The NMSU is currently working on two data integration projects, both using extracts from the Settlement Database (SDB) held by the Department of Immigration and Border Protection (DIBP), formerly the Department of Immigration and Citizenship (DIAC).
Migrants Census Data Enhancement (CDE) Project
The 2011 Migrants Census Data Enhancement (CDE) Project used probabilistic linking to combine the 2011 Census of Population and Housing with the DIBP SDB. The integration of this data enhances the statistical and research value of both datasets by enabling the settlement outcomes of migrants who have arrived in Australia since 1 January 2000 to be analysed in the context of their entry conditions (i.e. their visa type, whether a primary or secondary applicant and onshore/offshore status).
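For readers unfamiliar with probabilistic linking, the general idea can be illustrated with a minimal sketch in the style of the Fellegi-Sunter approach. This is not the method or configuration used by the ABS for this project: the field names (birth_year, sex, country_of_birth), the m- and u-probabilities and the threshold are all hypothetical values chosen only for illustration. Each pair of records receives a weight that rises when fields agree and falls when they disagree, and a link is accepted only if the best candidate's weight clears a threshold.

```python
import math

# Hypothetical linking fields and probabilities (illustrative values only).
# m = P(field agrees | the two records belong to the same person)
# u = P(field agrees | the two records belong to different people)
FIELD_PARAMS = {
    "birth_year": (0.95, 0.02),
    "sex": (0.98, 0.50),
    "country_of_birth": (0.90, 0.05),
}


def match_weight(rec_a, rec_b, params=FIELD_PARAMS):
    """Sum the log2 agreement or disagreement weight over every field."""
    total = 0.0
    for field, (m, u) in params.items():
        if rec_a[field] == rec_b[field]:
            total += math.log2(m / u)              # agreement adds evidence
        else:
            total += math.log2((1 - m) / (1 - u))  # disagreement subtracts it
    return total


def best_links(file_a, file_b, threshold):
    """For each record in file_a, keep its highest-weight candidate in file_b
    when that weight is at or above the threshold."""
    if not file_b:
        return []
    links = []
    for a in file_a:
        weight, b_id = max((match_weight(a, b), b["id"]) for b in file_b)
        if weight >= threshold:
            links.append((a["id"], b_id, weight))
    return links


if __name__ == "__main__":
    # Made-up records standing in for the two files being linked.
    file_a = [{"id": "A1", "birth_year": 1980, "sex": "F", "country_of_birth": "India"}]
    file_b = [
        {"id": "B1", "birth_year": 1980, "sex": "F", "country_of_birth": "India"},
        {"id": "B2", "birth_year": 1975, "sex": "M", "country_of_birth": "China"},
    ]
    print(best_links(file_a, file_b, threshold=5.0))
```

In practice, linkage of this kind also involves steps not shown here, such as standardising the linking fields, restricting comparisons to plausible candidate pairs, and assessing link quality after the links are made.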
The NMSU has completed the linking of the SDB and Census files and is now focussing on the output side of the project. The 'Research Paper: Assessing the Quality of Linking Migrant Settlement Records to 2011 Census Data' (cat. no. 1351.0.55.043) was released on the ABS website on 19 August 2013, and an electronic publication, 'Understanding Migrant Outcomes - Enhancing the Value of Census Data, Australia, 2011' (cat. no. 3417.0), was released on 19 September 2013. It is proposed that an associated series of State-level data will be published by the NMSU towards the end of 2013.
The ABS is also linking a 5% sample of the 2011 Census data to the 2006 Census data using probabilistic linking to create a 5% Statistical Longitudinal Census Dataset (SLCD). The NMSU will then be able to enhance the SLCD with information from the Migrants CDE Project linked dataset. At this stage we anticipate that output from this longitudinal linkage will be available in early 2014; we will keep you updated on our progress in future newsletters.
Migrant Personal Income Tax (PIT) Data Integration (DI) project - Feasibility phase
The Migrant PIT DI project seeks to establish whether an extract of the Department of Immigration and Border Protection (DIBP) Settlement Database (SDB) can be integrated with Personal Income Tax (PIT) data from the Australian Taxation Office (ATO). The linking process for the feasibility phase has been completed and the linked file is now being analysed. A research paper is scheduled for release via the ABS website later this year.
The linked dataset may provide insight into the economic outcomes of permanent migrants who arrived in Australia from 1 January 2000. The linked dataset is unique in that it contains many disaggregated income variables not collected elsewhere for these recent migrants, including own unincorporated business income, investment income, and superannuation and annuity income. For more information see the project listing on the Public Register of Data Integration Projects.