2940.0 - Census of Population and Housing: Details of Overcount and Undercount, Australia, 2016 Quality Declaration
ARCHIVED ISSUE Released at 11:30 AM (CANBERRA TIME) 27/06/2017
LINKING AND MATCHING
STANDARDISATION
3 In preparation for ADL, PES data were repaired and standardised through a four-stage process to convert them into a format that could be directly compared with similarly standardised Census data:
ADDRESS CODING
4 Address information is essential for matching PES respondents to their Census form. PES addresses were divided into two categories:
5 The AddressCoder@ABS is a web-based application used to geocode each type of address. From this, geographic information was assigned, such as a Census Field Area (CFA), a Mesh Block (MB), or a Statistical Area 1 (SA1) boundary, which were all used during automated data linking of persons.
6 Geocoding via the address coder was relatively resilient to errors in the address text (e.g. character substitution or form-scanning errors) as it needed only to identify the locality and not a specific address or dwelling.
7 Addresses that could not be coded automatically via the AddressCoder@ABS application were sent to a processing team for manual coding. This manual process utilised various methods, including mapping software, to thoroughly scrutinise addresses and achieve the most accurate geographic coding possible.
ADDRESS TEXT MATCHING
8 Address text matching was introduced in the 2016 PES and provided an opportunity to identify potential dwelling links based on exact address information. It was used to match a PES address to a specific entry on the ABS Address Register.
9 This exercise was particularly useful for dwelling types that were in scope for linking (e.g. unoccupied dwellings) but could not be linked via automated data linking, which is person-based. The proposed dwelling link was then fed through to the clerical matching process for confirmation. It should be noted, however, that address text matching was susceptible to errors or missing entries in the Address Register.
AUTOMATED DATA LINKING (ADL)
Linking
10 ADL refers to the use of probabilistic linking methods to determine possible links between Census and PES data in an automated fashion, before any clerical matching process begins. ADL was introduced as the primary linking method in the 2011 PES, which used the Freely Extensible Biomedical Record Linkage (FEBRL) software, and was used again in 2016.
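As a general illustration of what probabilistic linking involves, the sketch below scores a pair of records using agreement weights in the style commonly associated with the Fellegi-Sunter approach. It is not the ABS or FEBRL implementation: the field names, the m and u probabilities, and the helper functions `field_weight` and `link_weight` are all hypothetical and chosen only to show the idea.

```python
import math

# Hypothetical (m, u) probabilities per field; illustrative values only.
# m = P(field agrees | the two records are the same person)
# u = P(field agrees | the two records are different people)
FIELD_PROBS = {
    "surname": (0.95, 0.01),
    "first_name": (0.90, 0.02),
    "date_of_birth": (0.97, 0.003),
    "sex": (0.99, 0.5),
}


def field_weight(agrees, m, u):
    """Log2 likelihood ratio contributed by one field comparison."""
    if agrees:
        return math.log2(m / u)
    return math.log2((1 - m) / (1 - u))


def link_weight(pes_record, census_record):
    """Sum the field weights for a candidate PES-Census record pair.

    Fields missing on either record are skipped, so they add no
    evidence for or against the link (a simplifying assumption).
    """
    total = 0.0
    for field, (m, u) in FIELD_PROBS.items():
        a = pes_record.get(field)
        b = census_record.get(field)
        if a is None or b is None:
            continue
        total += field_weight(a == b, m, u)
    return total
```

Under these assumed probabilities, a pair that agrees on every field receives a large positive total and a pair that disagrees on every field receives a large negative total. Agreement on a highly discriminating field such as date of birth contributes far more than agreement on sex, which two unrelated people share about half the time.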
11 The automated linking process used a range of personal and address characteristics to evaluate the chance that a PES record and a Census record were for the same person. The method generated large numbers of candidate links and then used a process of elimination to filter down to genuine matches.
12 Seven different linking runs were used in ADL to compare PES and Census records, each focused on a slightly different combination of name, address and demographic variables. At the beginning of each run, a list of PES and Census records was obtained by selecting a subset of the PES and Census datasets which agreed on a small number of variables (e.g. the same SA1, date of birth, and surname). This process, called 'blocking', reduced the number of Census and PES records to compare within a run, in order to increase the likelihood of proposing good quality links.
13 The 2016 PES used a set of blocking variables that were comparable to those used in 2011, allowing for updated geographical classifications. The seven linking runs used various combinations of the following:
14 Potential links were generated by assigning weights to reflect the level of agreement for combinations of linking variables within each block. Large positive weights indicated probable matches, while large negative weights indicated probable non-matches.
Consolidation of ADL links
15 A series of processes was undertaken following the ADL runs to clean and consolidate the proposed links.
16 The Collect, Analyse, Reduce, De-duplicate and Systematise (CARDS) process identified and rated the most plausible links from each ADL run for all PES respondents. The process then combined the links from all ADL runs and removed any duplicates, with links from earlier runs taking precedence.
17 The final step of the automated linking process was to group person links together into dwelling units when they were co-located in the same PES-Census dwelling pair, through a process called Dwelling Link Rating (DLR). This had several benefits including:
18 The proposed dwelling links were then categorised into the following:
19 All PES dwellings with either Silver or Tin links were sent for clerical review. A small percentage of Platinum links were also clerically reviewed for quality assurance purposes.
MATCH AND SEARCH SYSTEM (MSS)
Processing in the MSS
20 While ADL is a critical component of PES linking and matching, it cannot entirely replace the traditional clerical decision-making process. Clerical judgment will always be required to resolve the more complex or ambiguous cases and provides a means of quality assuring the automated processes. The MSS is used for this purpose.
21 The MSS allows processing staff to manually search, view, and compare PES and Census data. There are two phases of processing in the MSS:
22 To evaluate ADL links, the processor first needed to confirm whether the ADL-proposed dwelling link was correct. Once the dwelling link was confirmed, the Census person records for that dwelling were compared with the PES person records, using information such as Name, Sex, Date of birth, Age, Registered marital status, Indigenous status and Country of birth. The extent to which each of these variables was the same, in both the PES and the Census, determined a match or a non-match status for the pair.
23 Where the ADL-proposed dwelling link was rejected, or if no dwelling link was proposed by ADL, processing staff undertook an intensive search. This search focused on the nominated (and surrounding) CFAs for all search addresses provided by respondents during the PES interview, in order to locate possible Census forms where that person was included. If a dwelling match was found, processing staff proceeded to rate the candidate person matches within that dwelling as described above.
24 Some redevelopment of the MSS was necessary in 2016 to ensure the system aligned with the changes made to the 2016 Census enumeration model. During this redevelopment work, some system enhancements were made to further strengthen and streamline the clerical matching processes and outcomes; however, the system is considered to be largely comparable with the 2011 version.
MSS Quality Assurance and Adjudication Processes
25 Quality assurance (QA) procedures were used to ensure the accuracy of MSS outcomes. For example, all records sent to the MSS were processed twice. Each QA workload was processed by a different processor from the original, and there were no identifiers to mark a workload as either original or QA.
26 Where the original and the QA match status corresponded, the original match status was accepted. Where there was a discrepancy between the original match status and the QA match status (at either the dwelling or person level), the records were flagged for adjudication by a senior processor (adjudicator) who reviewed all information and determined which match status was correct. Where both the original and QA records were deemed to be incorrect, the adjudicator reprocessed the record.
27 The adjudication process was also useful in identifying potential issues or areas where processing staff were having difficulty. This allowed ongoing feedback to be provided to the MSS staff and contributed to the overall quality of PES processing.
28 A 5% sub-sample of Platinum ADL linked dwellings and persons was also processed fully through the MSS as a quality measure of the ADL. Reprocessing these records confirmed the robustness and high quality of the ADL links. Interrogation of any high quality ADL links that were rejected by the processing staff was also undertaken as a further quality assurance measure.
DISCRETE COMMUNITY PROCESSING
29 As in 2011, ADL was not utilised for processing the Discrete Communities sample in 2016. The low-quality geocoded data for these areas, as well as the ability for PES respondents to provide alternative names, would have complicated the ADL process. Instead, the sample underwent full clerical searching and matching in the MSS. System enhancements made to the MSS enabled more thorough and efficient processing to be completed.
30 The process involved searching the entire community for a person match, rather than just searching within a single dwelling. Person matching in Discrete Communities used the same rules for determining a match as in the General Population, but allowed for the use of up to two alternative names for each person when matching on name.
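The community-wide search with alternative names can be pictured with the short sketch below. It is only an illustration under stated assumptions: actual Discrete Community matching was a clerical process in the MSS, whereas this code uses exact comparison of normalised names, and the record layout (`name`, `alt_names`, `id`) and the functions `names_match` and `search_community` are hypothetical.

```python
def normalise(name):
    """Lower-case a name and collapse repeated whitespace."""
    return " ".join(name.lower().split())


def candidate_names(person, max_alternatives=2):
    """Return the primary name plus up to two alternative names."""
    names = [person["name"]] + list(person.get("alt_names", []))[:max_alternatives]
    return {normalise(n) for n in names if n}


def names_match(pes_person, census_person):
    """True if any name on the PES record equals any name on the Census record."""
    return bool(candidate_names(pes_person) & candidate_names(census_person))


def search_community(pes_person, community_census_records):
    """Search every Census person record in the community, not one dwelling."""
    return [rec for rec in community_census_records if names_match(pes_person, rec)]


# Hypothetical example records.
pes_person = {"name": "Alex Jones", "alt_names": ["Sandy Jones"]}
community = [
    {"id": 1, "name": "sandy  jones"},
    {"id": 2, "name": "Chris Taylor"},
]
candidates = search_community(pes_person, community)
```

In this example the PES person is found through the alternative name "Sandy Jones" even though the primary name differs. A name match here only nominates a candidate; the remaining variables (sex, date of birth and so on) would still need to be compared before a match status was decided.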
CONFIDENCE OF MATCH DECISIONS
31 Outcomes from linking and matching processes underwent a high level of scrutiny and quality assurance in 2016, to ensure the PES did not miss links for PES persons who were actually counted in the Census, and did not link a PES person to a Census record in error.
32 Final match rates for the General Population for persons with at least one link to the 2016 Census were lower than the 2011 equivalents (91.7% in 2016 compared with 92.7% in 2011). This change was driven by a reduction in Census response rates. However, more high quality links that did not require clerical review were found by ADL in 2016 than in 2011 (65.1% compared with 59.8%). This is likely to be the result of improved capture of text fields, such as names, from increased online Census uptake.
Matching Outcomes, 2011-2016
(b) Matches for the Discrete Community sample were made via the MSS only.