3302.0.55.004 - Information Paper: Death registrations to Census linkage project - Methodology and Quality Assessment, 2011-2012  
ARCHIVED ISSUE Released at 11:30 AM (CANBERRA TIME) 18/09/2013  First Issue

BLOCKING


Once data files have been standardised, record pairs (consisting of one record from each file) can be compared to see whether they are likely to be a match, i.e. belong to the same person. However, if the files are even moderately large, comparing every record on File A with every record on File B is computationally infeasible. Blocking reduces the number of comparisons by only comparing record pairs where matches are likely to be found – namely, records which agree on a set of blocking variables. Blocking variables are selected based on their reliability and discriminatory power. For instance, sex is typically well reported, but it is minimally informative because it only divides datasets into two blocks. It is therefore used in conjunction with other variables.
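The effect of blocking on the number of comparisons can be illustrated with a minimal Python sketch. This is not the ABS implementation: the function name `candidate_pairs`, the field names (`sex`, `mesh_block`) and the toy records are assumptions made purely for illustration.

```python
from collections import defaultdict


def candidate_pairs(file_a, file_b, blocking_vars):
    """Return (a, b) record pairs that agree on every blocking variable."""
    # Group File B records by their blocking key.
    blocks = defaultdict(list)
    for rec in file_b:
        key = tuple(rec.get(v) for v in blocking_vars)
        if None not in key:  # a record missing a blocking value joins no block
            blocks[key].append(rec)

    pairs = []
    for rec in file_a:
        key = tuple(rec.get(v) for v in blocking_vars)
        if None in key:
            continue
        for other in blocks.get(key, []):
            pairs.append((rec, other))
    return pairs


# Hypothetical toy records, not real data.
deaths = [
    {"id": "D1", "sex": "F", "mesh_block": "MB001"},
    {"id": "D2", "sex": "M", "mesh_block": "MB002"},
]
census = [
    {"id": "C1", "sex": "F", "mesh_block": "MB001"},
    {"id": "C2", "sex": "F", "mesh_block": "MB002"},
    {"id": "C3", "sex": "M", "mesh_block": "MB002"},
    {"id": "C4", "sex": "M", "mesh_block": "MB003"},
]

all_pairs = len(deaths) * len(census)                              # no blocking
by_sex = candidate_pairs(deaths, census, ["sex"])                  # two blocks only
by_sex_mb = candidate_pairs(deaths, census, ["sex", "mesh_block"])  # finer blocks

print(all_pairs, len(by_sex), len(by_sex_mb))
```

In this toy case, blocking on sex alone halves the comparisons, while adding a second, more discriminating variable reduces them further. This mirrors why a well reported but coarse variable is combined with others.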

The process of blocking reduces the computational intensity of data linking. However, comparing only records that agree on a particular set of blocking variables means a record will not be compared with its match if it contains missing, invalid or legitimately different information on a blocking variable. To mitigate this, the linking process is repeated a number of times, using a range of different blocking strategies. For example, on the first pass, a block by a low level of geography (Mesh Block) was used to capture the majority of Death registrations that had matching addresses with their corresponding Census records. This means, however, that Death registrations for people who had moved since being enumerated in the Census were not compared with their Census records. Records which failed to link in the first pass proceeded to the next pass, in which a different set of blocking variables was used. For the second pass, by blocking on date of birth rather than geography, the Death registrations of people who had moved or who had missing or invalid address information were able to be compared.
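The multi-pass idea described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the method used in the project: the function `multi_pass_link`, the field names, the toy records and the exact-name rule `same_name` are all hypothetical, and a real linkage would use a more careful comparison of each candidate pair.

```python
from collections import defaultdict


def multi_pass_link(deaths, census, passes, is_match):
    """Link records over several passes, each with its own blocking variables.

    Records left unlinked after one pass proceed to the next pass.
    """
    links = {}    # death id -> (census id, pass number)
    used = set()  # census ids that have already been linked
    unlinked = list(deaths)

    for pass_no, blocking_vars in enumerate(passes, start=1):
        # Index the still-available census records by blocking key.
        index = defaultdict(list)
        for c in census:
            key = tuple(c.get(v) for v in blocking_vars)
            if c["id"] not in used and None not in key:
                index[key].append(c)

        remaining = []
        for d in unlinked:
            key = tuple(d.get(v) for v in blocking_vars)
            candidates = [] if None in key else index.get(key, [])
            match = next(
                (c for c in candidates
                 if c["id"] not in used and is_match(d, c)),
                None,
            )
            if match is None:
                remaining.append(d)  # try again in the next pass
            else:
                links[d["id"]] = (match["id"], pass_no)
                used.add(match["id"])
        unlinked = remaining

    return links, [d["id"] for d in unlinked]


def same_name(a, b):
    # Placeholder decision rule (an assumption): exact agreement on name.
    return a["name"] == b["name"]


# Hypothetical toy records, not real data.
deaths = [
    {"id": "D1", "name": "ann lee", "dob": "1930-01-02", "mesh_block": "MB001"},
    {"id": "D2", "name": "bob ray", "dob": "1941-05-06", "mesh_block": "MB009"},  # moved
    {"id": "D3", "name": "cy doe", "dob": "1925-07-08", "mesh_block": None},      # missing
    {"id": "D4", "name": "di fox", "dob": "1950-03-03", "mesh_block": "MB004"},   # no match
]
census = [
    {"id": "C1", "name": "ann lee", "dob": "1930-01-02", "mesh_block": "MB001"},
    {"id": "C2", "name": "bob ray", "dob": "1941-05-06", "mesh_block": "MB002"},
    {"id": "C3", "name": "cy doe", "dob": "1925-07-08", "mesh_block": "MB003"},
]

# Pass order follows the example in the text: geography first, then date of birth.
links, still_unlinked = multi_pass_link(
    deaths, census, passes=[["mesh_block"], ["dob"]], is_match=same_name
)
print(links, still_unlinked)
```

Here the record with an unchanged address links on the geography pass, while the records with a changed or missing Mesh Block are only compared with their matches once the blocking variable switches to date of birth.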

Table 2.3 presents the blocking variables used for each pass. The strategy employed was similar to the approach used in the 2006 cycle, with some minor adjustments being made to the first four passes of the linkage run. Refer to Linking Census Records to Death Registrations (Cat. No. 1351.0.55.030) for the 2006 blocking and linking strategy.




Table 2.3 - BLOCKING VARIABLES, By pass number

Pass      Blocking variable

Pass 1    Mesh Block
Pass 2    Sex, Initial 4
Pass 3    Day, month and year of birth
Pass 4    Sex, Postcode
Pass 5    Indigenous status



A more significant change to the 2011 blocking and linking strategy was the inclusion of a fifth pass, which involved linking any remaining unlinked Aboriginal and Torres Strait Islander deaths to the Census. In this pass, a modified Indigenous status variable was used for blocking, which enabled Aboriginal and Torres Strait Islander people on both datasets to be compared. This run was computationally feasible as it excluded all non-Indigenous Census records.
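As a rough illustration of why excluding non-Indigenous Census records keeps this pass feasible, the sketch below restricts both files to flagged records before forming pairs. The boolean field `indigenous` stands in for the modified Indigenous status variable, whose actual derivation is not described here. The function name and the toy records are likewise assumptions.

```python
def fifth_pass_pairs(unlinked_deaths, census, flag="indigenous"):
    """Pair remaining flagged deaths with flagged Census records only."""
    death_subset = [d for d in unlinked_deaths if d.get(flag)]
    census_subset = [c for c in census if c.get(flag)]
    return [(d, c) for d in death_subset for c in census_subset]


# Hypothetical toy records, not real data.
unlinked_deaths = [
    {"id": "D1", "indigenous": True},
    {"id": "D2", "indigenous": False},
]
census = [
    {"id": "C1", "indigenous": True},
    {"id": "C2", "indigenous": False},
    {"id": "C3", "indigenous": False},
    {"id": "C4", "indigenous": True},
]

pairs = fifth_pass_pairs(unlinked_deaths, census)
full_comparison = len(unlinked_deaths) * len(census)
print(len(pairs), "of", full_comparison, "possible pairs compared")
```

Because only one block is retained, the number of comparisons depends on the size of the flagged subsets rather than on the size of the full Census file.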


