AN EMPIRICAL BAYESIAN APPROACH TO ENTITY-BASED DATA LINKAGE

With these areas for improvement in mind, the ABS has been investigating alternative, state-of-the-art approaches to entity-based linkage. One method that is well suited to several important ABS requirements is the empirical Bayesian approach described in Steorts (2015), known as ebLink. Unlike many alternative methods, ebLink directly models the entities in the domain (e.g. Australian residents) and the links from records to entities. This makes it a good fit for the population spine, since it can provide a statistical measure of the association between spine entities and dataset records. It also offers the usual benefits of a Bayesian framework, namely: an account of uncertainty through the posterior distribution, the ability to incorporate prior information, and support for complex hierarchical models.

To assess the feasibility of ebLink, the ABS is collaborating with the University of Melbourne through the APR.Intern programme. The poor scalability of ebLink was quickly identified as an obstacle, but it has been partly mitigated by a re-parametrisation of the model that incorporates blocking ideas and enables inference to be distributed across a compute cluster. As part of the collaboration, a prototype is being implemented in Apache Spark (a distributed computing framework). Early experiments indicate that ebLink slightly outperforms the ABS’s established methods in terms of linkage accuracy, while also providing a full posterior distribution over the linkage structure. However, computational efficiency and scalability remain a challenge for future work.

References

Steorts, Rebecca C. (2015). “Entity Resolution with Empirically-Motivated Priors”. Bayesian Analysis, 10 (4), pp. 849-875.

ABS (2016). “Personal Income Tax and Migrants Integrated Dataset (PITMID) 2011-12 Quality Assessment”. ABS Research Paper, cat. no. 1351.0.055.060.
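
Illustration: the section above notes that ebLink yields a posterior distribution over the linkage structure, which can serve as a statistical measure of the association between spine entities and dataset records. The sketch below shows one way such a measure could be summarised from posterior draws. It is not the ABS prototype or the ebLink implementation. The function name link_probabilities, the record and entity identifiers, and the toy draws are all hypothetical, and the sketch assumes that entity labels are stable across draws (as they would be for a fixed spine).

```python
from collections import Counter


def link_probabilities(samples):
    """Estimate P(record links to entity) as a frequency over posterior draws.

    samples: list of dicts, one per posterior draw, each mapping a
    record id to the entity id it is linked to in that draw.
    (Hypothetical input format, assumed for illustration only.)
    """
    n_draws = len(samples)
    counts = {}
    for draw in samples:
        for record, entity in draw.items():
            # Count how often each record is assigned to each entity.
            counts.setdefault(record, Counter())[entity] += 1
    # Convert counts to relative frequencies across all draws.
    return {
        record: {entity: c / n_draws for entity, c in ctr.items()}
        for record, ctr in counts.items()
    }


# Toy posterior draws (invented data, not real linkage output).
samples = [
    {"rec_a": "person_1", "rec_b": "person_1"},
    {"rec_a": "person_1", "rec_b": "person_2"},
    {"rec_a": "person_1", "rec_b": "person_1"},
    {"rec_a": "person_1", "rec_b": "person_1"},
]
probs = link_probabilities(samples)
print(probs["rec_b"])
```

In this toy case, rec_a is linked to person_1 in every draw, while rec_b is linked to person_1 in three of the four draws, so its estimated link probability is lower. A summary of this kind is what distinguishes a posterior-based approach from a single hard linkage decision.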