Research into a Discrete Calibration Approach to Help Analysts Account for Linkage Error
We are undertaking research into a new method, known as discrete calibration, which modifies, for analysis purposes, the link weights produced by a probabilistic data linkage method such as that of Fellegi and Sunter (1969). The goal is to enable analysts of linked datasets to better capture linkage uncertainty in their statistical inferences and to reduce statistical bias. In the future, analysts of linked data could obtain model inferences simply by carrying out a weighted regression using the modified weights produced by discrete calibration. At this stage, this is purely a research project to determine the statistical feasibility of the method and to evaluate it against other methods in the literature.
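To illustrate what a weighted regression on linked data could look like, the following is a minimal sketch only, not the method under research. It assumes that each candidate link between a record on one file (carrying x) and a record on the other file (carrying y) already has a non-negative weight. The function name `weighted_linear_fit`, the toy data and the weights are all hypothetical.

```python
def weighted_linear_fit(x, y, w):
    """Weighted least squares fit of y = intercept + slope * x.

    x, y: analysis values for each candidate linked pair.
    w:    non-negative link weight for each pair (hypothetical values
          standing in for calibrated link weights).
    """
    total = sum(w)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    # Weighted means of x and y.
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / total
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / total
    # Weighted cross-product and sum of squares about the means.
    s_xy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    s_xx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    if s_xx == 0:
        raise ValueError("x has no weighted variation")
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Toy data (made up): three records with x each have two candidate links.
# The first candidate in each pair is treated as the true link; the second is a false link.
x = [1.0, 1.0, 2.0, 2.0, 3.0, 3.0]
y = [3.0, 9.0, 5.0, 0.0, 7.0, 2.0]
w = [0.9, 0.1, 0.9, 0.1, 0.9, 0.1]  # assumed weights favouring the true links

intercept, slope = weighted_linear_fit(x, y, w)
print(intercept, slope)
```

In this sketch, the more weight that sits on the true links, the closer the fitted line is to the one that would be obtained from a perfectly linked file.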
The discrete calibration approach will not replace existing data linkage methods; rather, it will aid in the analysis or quality assessment of linked data.
The discrete calibration approach obtains suitably modified link weights by calibrating the joint distribution of analytical variables on the linked dataset to their marginal distributions on the individual datasets. By doing so, we can quantify the linkage uncertainty that is inherent in the linkage process, which can be incorporated in analyses and inferences. Our research has evaluated how the performance of discrete calibration relates to initial linkage accuracy, and the improvements that these calibrated weights provide over the raw match scores (obtained from Fellegi-Sunter probabilistic linking) for quality assuring links and for the analysis of linked data. For further details, the paper titled "A Discrete Calibration Approach to Improve Data Linkage", which was presented at the March 2019 meeting of the Methodology Advisory Committee, is available on request.
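The idea of calibrating a weighted joint distribution to known marginal distributions can be sketched with a simple example. The code below is not the discrete calibration algorithm itself, which is described in the paper rather than here. It uses raking (iterative proportional fitting) as a stand-in technique, and assumes two fully overlapping files, one categorical analysis variable on each file, and initial weights such as match probabilities. The function name `rake_link_weights` and all data are hypothetical.

```python
def rake_link_weights(pairs, weights, target_a, target_b, iterations=50):
    """Rake candidate-link weights so weighted margins match target counts.

    pairs:    list of (x_category, y_category), one per candidate link,
              where x comes from file A and y from file B.
    weights:  initial non-negative weights (assumed match probabilities).
    target_a: dict mapping each x category to its count on file A.
    target_b: dict mapping each y category to its count on file B.
    """
    w = list(weights)
    for _ in range(iterations):
        # Alternate between matching the file A margin and the file B margin.
        for position, targets in ((0, target_a), (1, target_b)):
            totals = {}
            for pair, wi in zip(pairs, w):
                key = pair[position]
                totals[key] = totals.get(key, 0.0) + wi
            # Scale each weight so its category total equals the target count.
            w = [
                wi * targets[pair[position]] / totals[pair[position]]
                if totals[pair[position]] > 0 else wi
                for pair, wi in zip(pairs, w)
            ]
    return w


# Toy data (made up): four candidate links across two categories on each file.
pairs = [("a", "u"), ("a", "v"), ("b", "u"), ("b", "v")]
initial = [0.8, 0.2, 0.3, 0.7]
calibrated = rake_link_weights(
    pairs, initial, target_a={"a": 1, "b": 1}, target_b={"u": 1, "v": 1}
)
print(calibrated)
```

After raking, the weighted counts of each category agree with the counts on the individual files, while the relative pattern of the initial weights is retained as far as the constraints allow.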
Future research work will involve:
- generalising the method to partially overlapping datasets;
- if possible, obtaining a mathematical proof that the discrete calibration method leads to a reduction in the bias and variance of model estimates compared to standardised Fellegi-Sunter match scores or unweighted single best links;
- carrying out a design-based simulation to evaluate the approach against other approaches; and
- providing empirical evidence for the size of the impact on estimates from a range of different models applied to test data.
For more information, please contact Daniel Elazar at Methodology@abs.gov.au.
The ABS Privacy Policy outlines how the ABS will handle any personal information that you provide to us.