1308.7 - Inform NT, Jun 2009  
ARCHIVED ISSUE Released at 11:30 AM (CANBERRA TIME) 30/06/2009   

FEATURE ARTICLE


DIAGNOSING DATA QUALITY: HOW HEALTHY ARE YOUR DATASETS?

Between 'Closing the Gap', the Northern Territory Emergency Response (NTER), and the Council of Australian Governments' (COAG) new financial framework, the Northern Territory has never been so reliant on accurate and meaningful statistics. However, data are a meaningless stream of numbers without information about who, when, where, what and how they were collected, otherwise known as 'metadata'.

Several mechanisms have been put in place to help the NT manage and source metadata on existing collections, not least of which is the Data Sources Inventory. The NT's Data Sources Inventory is an initiative of NT Treasury that aims to create a metadata repository for NT datasets.

To help users and producers of statistics assess the metadata of their own and others' datasets, the ABS recently published a Data Quality Framework (DQF) (cat. no. 1520.0) based on the Statistics Canada Quality Assurance Framework (2002) and the European Statistics Code of Practice (2005). The framework is designed to assist all users and producers of data to evaluate the quality of statistical collections and products through seven dimensions of quality:

  • Institutional Environment;
  • Accuracy;
  • Relevance;
  • Timeliness;
  • Interpretability;
  • Coherence; and
  • Accessibility.

These dimensions provide the standard framework for assessing and reporting on the quality of statistical information. The framework is a tool that improves a user's ability to:

  • decide whether a dataset or statistical product is fit for purpose (which in turn helps to identify data gaps);
  • assess the data quality of seemingly similar collections; and
  • interpret data.

It can also assist those developing statistical collections to produce high quality outputs.

Following is a summary of each dimension, divided into tips for both users and producers of statistics.
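
Before turning to the individual dimensions, the sketch below shows one way such an assessment might be recorded in practice. It is purely illustrative and not part of the DQF: the dataset name, note text and Python structure are all hypothetical, and only the seven dimension names come from the framework.

    # Hypothetical sketch: recording a DQF-style quality assessment for one dataset.
    # Only the seven dimension names come from the framework; everything else is invented.
    from dataclasses import dataclass, field

    DQF_DIMENSIONS = [
        "Institutional Environment", "Accuracy", "Relevance", "Timeliness",
        "Interpretability", "Coherence", "Accessibility",
    ]

    @dataclass
    class QualityStatement:
        dataset: str
        notes: dict = field(default_factory=dict)   # dimension name -> assessment text

        def missing_dimensions(self):
            # Dimensions not yet assessed for this dataset.
            return [d for d in DQF_DIMENSIONS if d not in self.notes]

    # Usage with an invented dataset name and note.
    statement = QualityStatement("Example NT dataset (hypothetical)")
    statement.notes["Timeliness"] = "Available about six months after the reference period."
    print(statement.missing_dimensions())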

Dimension 1: Institutional Environment

Institutional Environment refers to the institutional and organisational factors that may influence the effectiveness and credibility of the agency producing the statistics.

Producers of statistics
To manage Institutional Environment, an organisation that collects data should build a culture that focuses on quality, with an emphasis on objectivity and professionalism. Adequate resources and skills should be made available for the purpose intended. Cooperation of respondents can be encouraged by providing an appropriate legal mandate and guarantees.

Users of statistics
To assess the Institutional Environment dimension of published data, a user of statistics should consider: impartiality and objectivity; professional independence; mandate for data collection; adequacy of resources; quality commitment; and statistical confidentiality.

Dimension 2: Accuracy

Accuracy refers to the degree to which the data correctly describe the phenomenon they were designed to measure. Accuracy should be assessed in terms of the major sources of errors that potentially cause inaccuracy.

Producers of statistics
To manage Accuracy, an organisation that collects data should give explicit consideration to the trade-offs between accuracy, cost and timeliness during the design stage. The coverage of the target population that can be achieved by the data collection strategy should be assessed. Proper testing of the instruments for data collection helps to reduce response errors. Adequate measures should be in place for encouraging response, following up non-response, and dealing with missing data (e.g. through imputation or adjustments made to the estimates). All stages of collection and processing should be subject to proper consideration of the need for quality assurance processes, including appropriate internal and external consistency checking of data, with corresponding correction strategies.
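
As a purely hypothetical illustration of two of the practices mentioned above (monitoring response and imputing missing values), the sketch below uses invented record fields and figures; it does not represent an ABS method.

    # Hypothetical sketch: a unit response rate and a simple mean imputation
    # for a missing numeric item. Record fields and values are invented.

    records = [
        {"id": 1, "responded": True,  "income": 52000},
        {"id": 2, "responded": True,  "income": None},    # item non-response
        {"id": 3, "responded": False, "income": None},    # unit non-response
        {"id": 4, "responded": True,  "income": 61000},
    ]

    # Unit response rate: responding units over all in-scope units.
    response_rate = sum(r["responded"] for r in records) / len(records)
    print(f"Unit response rate: {response_rate:.0%}")

    # Simple mean imputation for the missing item among respondents.
    reported = [r["income"] for r in records if r["responded"] and r["income"] is not None]
    mean_income = sum(reported) / len(reported)
    for r in records:
        if r["responded"] and r["income"] is None:
            r["income"] = mean_income   # imputed values would normally be flagged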

Users of statistics
To assess the Accuracy dimension of published data, a user of statistics should consider: coverage error; sample error; non-response error; response error; revisions to data; and other sources of errors, including those caused by incorrect processing of data, alterations made to the data to ensure the confidentiality of the respondents, rounding errors, and other quality assurance processes.

Dimension 3: Relevance

Relevance refers to how well the statistical product or release meets the users' needs.

Producers of statistics
To manage Relevance, an organisation that collects data should stay abreast of the information needs of its users. Mechanisms for doing this include various consultative and intelligence-gathering processes, and regular stakeholder reviews.

Users of statistics
To assess the Relevance dimension of published data, a user of statistics should consider: scope and coverage; reference period; geographic detail; main outputs/data items, and whether the data measure what they were intended to measure; classifications and statistical standards; type of estimates available; and other cautions, such as any other relevant issues to be aware of when using the data.

Dimension 4: Timeliness

Timeliness refers to the delay between the reference period (to which the data pertain) and the date at which the data become available; and the delay between the advertised date and the date at which the data become available (i.e., the actual release date). These aspects are important considerations in assessing quality, as lengthy delays between the reference period and data availability, or between advertised and actual release dates, can have implications for the currency or reliability of the data.
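
Both delays can be expressed as simple date arithmetic. The sketch below is illustrative only; the dates are invented and chosen merely to show the calculation.

    # Hypothetical sketch: the two timeliness lags described above, computed with
    # Python's standard datetime module. All dates are invented.
    from datetime import date

    reference_period_end = date(2008, 6, 30)   # end of the period the data describe
    advertised_release   = date(2009, 6, 26)   # release date announced in advance
    actual_release       = date(2009, 6, 30)   # date the data actually became available

    availability_lag = actual_release - reference_period_end
    release_slippage = actual_release - advertised_release

    print(f"Lag from reference period to availability: {availability_lag.days} days")
    print(f"Slippage against the advertised release date: {release_slippage.days} days")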

Producers of statistics
To manage Timeliness, an organisation that collects data should give consideration to its capability to produce the statistics within the given time frame. This capability includes staffing resources, system requirements, and the level of accuracy required of the data. The release of preliminary data followed by revised and final figures is often used as a strategy for making less accurate data available sooner for decision making, with the subsequent release of more complete data occurring at a later stage.

Users of statistics
To assess the Timeliness dimension of published data, a user of statistics should consider: timing, such as time lags between the reference period and when the data actually become available; and the frequency of the survey or other collection.

Dimension 5: Interpretability

Interpretability refers to the availability of information to help provide insight into the data.

Producers of statistics
To manage Interpretability, an organisation that collects data should give consideration to the provision of sufficient information about the statistical measures and processes of data collection. Users need to know what has been measured, how it was measured and how well it was measured. The description of the methodology allows the user to assess whether the methods used were scientific or objective, and the degree of confidence they could have in the results.

Users of statistics
To assess the Interpretability dimension of published data, a user of statistics should consider: presentation of the information; and the availability of information regarding the data, such as concepts, sources and methods.

Dimension 6: Coherence

Coherence refers to the internal consistency of a statistical collection, product or release, as well as its comparability with other sources of information, within a broad analytical framework and over time. The use of standard concepts, classifications and target populations promotes coherence, as does the use of common methodology across surveys.

Producers of statistics
To manage Coherence, an organisation that collects data should give consideration to using standard frameworks, concepts, variables and classifications, to ensure the target of measurement is consistent over time and across different collections. The use of common methodologies and systems for data collection and processing also contributes to coherence. Where data are available from different sources, consideration should be given to their confrontation and possible integration.
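
As a hypothetical illustration of confronting data from different sources, the sketch below compares the same data item from two invented sources and flags differences beyond an arbitrary tolerance; the source names, figures and 2% tolerance are not drawn from any real collection.

    # Hypothetical sketch: confronting the same data item from two sources and
    # flagging differences larger than a chosen relative tolerance.
    # Source names, figures and the 2% tolerance are invented for illustration.

    source_a = {"2006": 192900, "2007": 197600, "2008": 201500}   # e.g. a survey estimate
    source_b = {"2006": 193000, "2007": 205000, "2008": 201900}   # e.g. an administrative count
    TOLERANCE = 0.02

    for year in sorted(source_a.keys() & source_b.keys()):
        a, b = source_a[year], source_b[year]
        rel_diff = abs(a - b) / max(a, b)
        flag = "CHECK" if rel_diff > TOLERANCE else "ok"
        print(f"{year}: {a} vs {b} (relative difference {rel_diff:.1%}) {flag}")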

Users of statistics
To assess the Coherence dimension of published data, a user of statistics should consider: changes to data items; the capacity to make meaningful comparisons across data items; comparison with previous releases; and comparison with other products available.

Dimension 7: Accessibility

Accessibility refers to the ease of access to data by users, including the ease with which the existence of information can be ascertained, as well as the suitability of the form or medium through which information can be accessed. The cost of the information may also represent an aspect of accessibility for some users.

Producers of statistics
To manage Accessibility, an organisation that collects data should give consideration to helping users know about the existence of the data or statistical product, locate it, and import it into their own working environment. Output catalogues, delivery systems, distribution channels and media, and strategies for engagement with users are all important considerations relating to this quality dimension.

Users of statistics
To assess the Accessibility dimension of published data, a user of statistics should consider: the extent to which the data are publicly available; and the specific products available, the formats of these products, their cost, and the available data items which they contain.

A more detailed discussion of the ABS Data Quality Framework, May 2009 (cat. no. 1520.0) can be found on the ABS website.