1504.0 - Methodological News, Dec 2017  
ARCHIVED ISSUE Released at 11:30 AM (CANBERRA TIME) 14/12/2017   

GRAPH-BASED FRAMEWORK FOR DATA LINKING

The ABS has developed a prototype graph-based approach for linking heterogeneous multisource data in statistical production. This work is motivated by the need for a more agile analytical practice to support evidence-based decision making, and by the need to utilise diverse new sources of human- and machine-generated data associated with digital presence, connectivity and interaction in the global network. Such sources include commercial transactions, remote imagery, sensor measurements, geospatial positioning, web content, and online user activity. They are often referred to collectively as big data.

Graph-based data linking involves a paradigm shift in the conceptualisation of complex data from tables, records and fields to graphs (network structures) of entities and relationships. Entity-relationship graphs are similar to the sociograms used in social network analysis to visually depict interpersonal relations. However, entity-relationship graphs can depict any entity or relationship type of analytical utility, and the entities can be multiply connected by different relationships. Entity-relationship graphs are also intrinsically dynamic, as their topological properties change over time in response to evolving patterns of interaction in complex systems. The ABS prototype approach uses the Semantic Web (Web 3.0) model, defined in a set of standards published by the World Wide Web Consortium (W3C), as the operational framework for graph-based data linking.
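As a rough illustration of the shift from tables to graphs, the sketch below stores an entity-relationship graph as subject-relationship-object triples, the general shape used by the Semantic Web model. It is not the ABS prototype: the entity names, relationship types and helper functions are all hypothetical, and plain Python tuples stand in for a real triple store.

```python
from collections import defaultdict

# Hypothetical entities and relationship types, invented for illustration.
# Each triple reads: (subject entity, relationship type, object entity).
triples = [
    ("person:alice", "employedBy", "business:acme"),
    ("person:alice", "directorOf", "business:acme"),
    ("person:alice", "residesAt", "address:12-high-st"),
    ("business:acme", "locatedAt", "address:12-high-st"),
]


def relationships_between(triples, subject, obj):
    """Return every relationship type that links subject to obj."""
    return sorted(p for s, p, o in triples if s == subject and o == obj)


def neighbours(triples, entity):
    """Map each relationship type to the set of entities it reaches from entity."""
    reached = defaultdict(set)
    for s, p, o in triples:
        if s == entity:
            reached[p].add(o)
    return dict(reached)


# The same pair of entities is connected by two different relationships.
print(relationships_between(triples, "person:alice", "business:acme"))
```

Because the same two entities can appear in any number of triples, multiple connection by different relationships needs no schema change, which is harder to express in a fixed table of records and fields.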

A feature of the graph-based data linking approach developed by the ABS is the introduction of temporal segments demarcated by events that change the measurable state of individuals and groups in systems of statistical interest. Data entities that correspond to observations of these individuals and groups over time are then explicitly resolved to canonical (or base) entities, for which the temporal segments and events form distinct trajectories through a series of life states. Similar data entities are associated by a set of logically precise equivalence relations that extend the implicit notion of identity in traditional data linking approaches.
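The two ideas in this paragraph can be sketched in a few lines, under strong simplifying assumptions. The `segments` function splits a trajectory into temporal segments bounded by state-changing events, and the `Resolver` class groups data entities into equivalence classes, each standing for one canonical entity. All names and the sample identifiers are hypothetical, and a single union-find equivalence is a simplification of the set of equivalence relations the article describes.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Event:
    when: date       # date on which the measurable state changes
    new_state: str   # state that holds after the event


def segments(events, end):
    """Split a trajectory into (start, stop, state) segments bounded by events."""
    ordered = sorted(events, key=lambda e: e.when)
    result = []
    for current, following in zip(ordered, ordered[1:] + [None]):
        stop = following.when if following is not None else end
        result.append((current.when, stop, current.new_state))
    return result


class Resolver:
    """Union-find over data entities; each class stands for one canonical entity."""

    def __init__(self):
        self.parent = {}

    def find(self, entity):
        """Return the representative of the class containing entity."""
        self.parent.setdefault(entity, entity)
        while self.parent[entity] != entity:
            # Path halving keeps later lookups short.
            self.parent[entity] = self.parent[self.parent[entity]]
            entity = self.parent[entity]
        return entity

    def equate(self, a, b):
        """Record that data entities a and b observe the same individual."""
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a


resolver = Resolver()
resolver.equate("census:1", "tax:9")
resolver.equate("tax:9", "health:4")
print(resolver.find("census:1") == resolver.find("health:4"))
```

Here the three hypothetical source records resolve to one canonical entity, whose life states would then be described by the segments built from its events.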

The ABS graph-based approach also embeds strong computable semantics in the description of entity and relationship types. This creates a substrate for the execution of new reasoning methods that utilise the inherent connectedness of the entity-relationship graph to make collective linking decisions. These either draw on the logical properties of entity and relationship types through a process of deductive reasoning based on First-Order Logic (FOL), or they generalise the patterns of association for existing entities through an inductive method such as statistical learning on the graph structure. Deduction and induction are synergistic, and may be interleaved in an iterative process. The ABS is currently evaluating automated FOL reasoning based on description logics and Horn-clause logic, as well as a range of graph-based inductive reasoning schemes involving relational machine learning and kernel functions.
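To make the deductive side concrete, here is a minimal sketch of naive forward chaining over three Horn-clause rules. It illustrates the general technique only and is not the reasoner the ABS is evaluating. The `sameAs` and `hasTaxId` relationship names, the `INVERSE_FUNCTIONAL` set and the sample facts are assumptions made for the example.

```python
# Hypothetical: a relationship whose value identifies at most one individual.
INVERSE_FUNCTIONAL = {"hasTaxId"}


def forward_chain(facts):
    """Apply three Horn rules repeatedly until no new fact can be derived.

    Rule 1: p(x, v) and p(y, v), p inverse-functional, x != y -> sameAs(x, y)
    Rule 2: sameAs(x, y) and sameAs(y, z), x != z             -> sameAs(x, z)
    Rule 3: sameAs(x, y)                                      -> sameAs(y, x)
    """
    known = set(facts)
    while True:
        derived = set()
        for s1, p1, o1 in known:
            for s2, p2, o2 in known:
                # Rule 1: a shared identifying value implies the same individual.
                if p1 == p2 and p1 in INVERSE_FUNCTIONAL and o1 == o2 and s1 != s2:
                    derived.add((s1, "sameAs", s2))
                # Rule 2: transitivity of sameAs.
                if p1 == "sameAs" and p2 == "sameAs" and o1 == s2 and s1 != o2:
                    derived.add((s1, "sameAs", o2))
            # Rule 3: symmetry of sameAs.
            if p1 == "sameAs":
                derived.add((o1, "sameAs", s1))
        if derived <= known:
            return known  # fixpoint reached
        known |= derived


facts = {
    ("census:1", "hasTaxId", "T42"),
    ("tax:9", "hasTaxId", "T42"),
    ("tax:9", "sameAs", "health:4"),
}
closure = forward_chain(facts)
print(("census:1", "sameAs", "health:4") in closure)
```

The link between the census and health records is never stated directly; it follows from the logical properties declared for the relationship types. The pairwise loop is quadratic in the number of facts, so this form suits small examples only. An inductive method could propose further candidate `sameAs` facts, which would then be fed back into the same loop.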

Further Information

For more information, please contact Ric Clarke (Ric.Clarke@abs.gov.au).

The ABS Privacy Policy outlines how the ABS will handle any personal information that you provide to us.