16. Data Quality

Australian System of Government Finance Statistics: Concepts, Sources and Methods
Reference period
2015

Part A - Introduction

16.1.

This chapter discusses the overall quality of the GFS statistics included in each release of data for a given period.

16.2.

In any statistical undertaking, there is usually some trade-off between accuracy, reliability and timeliness. The trade-off involves balancing users’ requirements for timely statistics against the time and cost (to the ABS and to data suppliers) of collecting and compiling statistics of a given degree of accuracy and reliability. Generally, any improvement in timeliness comes at the expense of accuracy and reliability.

16.3.

In this discussion, accuracy is defined as the closeness of an estimate to the ‘true’ value. Reliability is defined as the stability of an estimate as measured by the size and frequency of revisions made to the estimate over time. These two attributes should always be considered together, as it is possible to have a statistic that is reliable (because it is revised infrequently) but always inaccurate. In general, timeliness refers to the amount of time between the end of the period to which the statistics refer and the date of first release of the statistics to users.

Part B - The ABS data quality framework

16.4.

The ABS has developed a Data Quality Framework (DQF) which is used to evaluate the quality of ABS statistical collections and products (e.g. survey data, statistical tables), including administrative data. The ABS DQF is used in the collection, processing and dissemination of GFS. The framework comprises seven dimensions of quality, reflecting a broad and inclusive approach to quality definition and assessment. The seven dimensions of quality are:

  • Institutional environment;
  • Relevance;
  • Timeliness;
  • Accuracy;
  • Coherence;
  • Interpretability; and
  • Accessibility.

Institutional environment

16.5.

The first dimension of quality in the ABS DQF is the institutional environment. This dimension refers to the institutional and organisational factors which may have a significant influence on the effectiveness and credibility of the agency producing the statistics. Consideration of the institutional environment associated with a statistical product is important as it enables an assessment of the surrounding context, which may influence the validity, reliability or appropriateness of the product. The dimension of institutional environment can be evaluated by considering six key aspects:

  • Impartiality and objectivity: whether the production and dissemination of data are undertaken in an objective, professional and transparent manner.
  • Professional independence: the extent to which the agency producing statistics is independent from other policy, regulatory or administrative departments and bodies, as well as from private sector operators and potential conflicts of interest.
  • Mandate for data collection: the extent to which administrative organisations, businesses and households, and the public at large may be compelled by law to allow access to, or to provide data to, the agency producing statistics.
  • Adequacy of resources: the extent to which the resources available to the agency are sufficient to meet its needs in terms of the production or collection of data.
  • Quality commitment: the extent to which processes, staff and facilities are in place for ensuring the data produced are commensurate with their quality objectives.
  • Statistical confidentiality: the extent to which the privacy of data providers (households, enterprises, administrations and other respondents), and the confidentiality of the information they provide, are guaranteed (if relevant).

Relevance

16.6.

The second dimension of quality in the ABS DQF is relevance. This dimension refers to how well the statistical product or release meets the needs of users in terms of the concept(s) measured, and the population(s) represented. Consideration of the relevance associated with a statistical product is important as it enables an assessment of whether the product addresses the issues most important to policy makers, researchers and to the broader Australian community. The dimension of relevance can be evaluated by considering the following key aspects:

  • Scope and coverage: the purpose or aim for collecting the information, including identification of the target population, discussion of whom the data represent, who is excluded and whether there are any impacts or biases caused by exclusion of particular people, areas or groups.
  • Reference period: this refers to the period for which the data were collected (e.g. the October-December quarter of the 2014-15 financial year), as well as whether there were any exceptions to the collection period (e.g. delays in receipt of data, changes to field collection processes due to natural disasters).
  • Geographic detail: information about the level of geographical detail available for the data (e.g. postcode area, Statistical Local Area) and the actual geographic regions for which data are available.
  • Main outputs / data items: whether the data measure the concepts they are meant to measure for their intended uses.
  • Classifications and statistical standards: the extent to which the classifications and standards used reflect the target concepts to be measured or the population of interest.
  • Type of estimates available: this refers to the nature of the statistics produced, which could be index numbers, trend estimates, seasonally adjusted data, or original unadjusted data.
  • Other cautions: information about any other relevant issue or caution that should be exercised in the use of the data.

Timeliness

16.7.

Timeliness is the third dimension of quality in the ABS DQF. Timeliness refers to the delay between the reference period (to which the data pertain) and the date at which the data become available; and the delay between the advertised date and the date at which the data become available (i.e. the actual release date). These aspects are important considerations in assessing quality, as lengthy delays between the reference period and data availability, or between advertised and actual release dates, can have implications for the currency or reliability of the data. The dimension of timeliness can be evaluated by considering two key aspects:

  • Timing: this refers to the time lag between the reference period and when the data actually become available (including the time lag between the advertised date for release and the actual date of release). For example, the reference period may be the 2004-05 financial year, but data may not become available for analysis until the middle of 2006.
  • Frequency of survey: this refers to whether the survey or data collection was conducted on a one-off basis, or whether it is expected to be ongoing. If it is expected to be ongoing, frequency also includes information about the proposed frequency of repeated collections and when data will be released for subsequent reference periods.

Accuracy

16.8.

The fourth dimension of quality in the ABS DQF is accuracy. Accuracy refers to the degree to which the data correctly describe the phenomenon they were designed to measure. This is an important component of quality as it relates to how well the data portray reality, which has clear implications for how useful and meaningful the data will be for interpretation or further analysis. In particular, when using administrative data, it is important to remember that statistical outputs for analysis are generally not the primary reason for the collection of the data.

16.9.

Accuracy should be assessed in terms of the major sources of errors that potentially cause inaccuracy. Any factors which could impact on the validity of the information for users should be described in quality statements. The dimension of accuracy can be evaluated by considering a number of key aspects:

  • Coverage error: this occurs when a unit in the sample is incorrectly excluded or included, or is duplicated in the sample (e.g., a field interviewer omits to interview a set of households or people in a household). Coverage of the statistical measures could be assessed by comparing the population included for the data collection to the target population.
  • Sample error: where sampling is used, the impact of sample error can be assessed using information about the total sample size and the size of the sample in key output levels (e.g. number of sample units in a particular geographical area), the sampling error of the key measures, and the extent to which there are changes or deficiencies in the sample which could impact on accuracy.
  • Non-response error: this refers to incomplete information provided by a respondent (e.g., when some data are missing, or the respondent has not answered all questions or provided all required information). Assessment should be based on non-response rates, or percentages of estimates imputed, and any statistical corrections or adjustment made to the estimates to address the bias from missing data.
  • Response error: this refers to a type of error caused by respondents intentionally or accidentally providing inaccurate responses, or incomplete responses, during the provision of data. This occurs not only in statistical surveys, but also in administrative data collection where forms, or concepts on forms, are not well understood by respondents. Respondent errors are usually gauged by comparison with alternative sources of data and follow-up procedures.
  • Other sources of errors: Any other serious accuracy problems with the statistics should be considered. These may include errors caused by incorrect processing of data (e.g. erroneous data entry or recognition), alterations made to the data to ensure the confidentiality of the respondents (e.g. by adding "noise" to the data), rounding errors involved during collection, processing or dissemination, and other quality assurance processes.
  • Revisions to data: the extent to which the data are subject to revision or correction, in light of new information or following rectification of errors in processing or estimation, and the time frame in which revisions are produced.

Coherence

16.10.

The fifth dimension of quality in the ABS DQF is coherence. Coherence refers to the internal consistency of a statistical collection, product or release, as well as its comparability with other sources of information, within a broad analytical framework and over time. The use of standard concepts, classifications and target populations promotes coherence, as does the use of common methodology across surveys. Coherence is an important component of quality as it provides an indication of whether the dataset can be usefully compared with other sources to enable data compilation and comparison. It is important to note that coherence does not necessarily imply full numerical consistency, but rather consistency in methods and collection standards. Quality statements of statistical measures must include a discussion of any factors which would affect the comparability of the data over time. The coherence of a statistical collection, product or release can be evaluated by considering a number of key aspects:

  • Changes to data items: to what extent a long time series of particular data items might be available, or whether significant changes have occurred to the way that data are collected.
  • Comparison across data items: this refers to the capacity to be able to make meaningful comparisons across multiple data items within the same collection. The ability to make comparisons may be affected if there have been significant changes in collection, processing or estimation methodology which might have occurred across multiple items within a collection.
  • Comparison with previous releases: the extent to which there have been significant changes in collection, processing or estimation methodology in this release compared with previous releases, or any 'real world' events which have impacted on the data since the previous release.
  • Comparison with other products available: this refers to whether there are any other data sources with which a particular series has been compared, and whether these two sources tell the same story. This aspect may also include identification of any other key data sources with which the data cannot be compared, and the reasons for this, such as differences in scope or definitions.

Interpretability

16.11.

Interpretability is the sixth dimension of quality in the ABS DQF. Interpretability refers to the availability of information to help provide insight into the data. Information available which could assist interpretation may include the variables used, the availability of metadata, including concepts, classifications, and measures of accuracy. Interpretability is an important component of quality as it enables the information to be understood and utilised appropriately. The interpretability of a statistical collection, product or release can be evaluated by considering two key aspects:

  • Presentation of the information: the form of presentation and the use of analytical summaries to help draw out the key message of the data.
  • Availability of information regarding the data: the availability of key material to support correct interpretation, such as concepts, sources and methods; manuals and user guides; and measures of accuracy of data.

Accessibility

16.12.

Accessibility is the seventh and final dimension of quality in the ABS DQF. Accessibility refers to the ease of access to data by users, including the ease with which the existence of information can be ascertained, as well as the suitability of the form or medium through which information can be accessed. The cost of the information may also represent an aspect of accessibility for some users. Accessibility is a key component of quality as it relates directly to the capacity of users to identify the availability of relevant information, and then to access it in a convenient and suitable manner. The accessibility of a statistical collection, product or release can be evaluated by considering two key aspects:

  • Accessibility to the public: the extent to which the data are publicly available, or the level of access restrictions. Additionally, special data services may include the availability of special or non-standard groupings of data items or outputs, if required.
  • Data products available: this refers to the specific products available (e.g., publications, spreadsheets), the formats of these products, their cost, and the available data items which they contain.

Part C - Quality of GFS data

16.13.

As stated in Part B above, the ABS Data Quality Framework (DQF) is used to assess the quality of ABS statistical collections and products, including administrative data, by evaluating these against the seven dimensions of quality listed in paragraphs 16.4 to 16.12 of this manual. The quality of GFS is influenced by the nature of the source data available during the different phases of the GFS statistical cycle. The factors affecting the quality of data at each stage of the statistical cycle are discussed in the following paragraphs.

GFS output products

16.14.

GFS output products are published on a quarterly and annual basis. These publications are freely available on the ABS website (www.abs.gov.au). More detailed or customised data requests, not available from the published data, may also be available from the ABS on request.

16.15.

The publication Government Finance Statistics, Australia (ABS Cat. no. 5512.0) contains detailed explanatory notes and a glossary that provides information on the data sources, terminology, classifications and other technical aspects associated with GFS statistics. Additionally, detailed information on the concepts, sources and methods used in compiling GFS can be found in this manual, the AGFS15.

Accuracy

16.16.

The main influences on the accuracy of GFS data are non–sampling errors. Non–sampling errors arise from inaccuracies in collecting, recording and processing the data. The most significant of these errors are the misreporting of data and processing errors. Every effort is made to minimise error by working closely with data providers, training processing staff and using efficient data processing procedures.

16.17.

Undercoverage can arise where the economic activity of some units is relatively insignificant. These few units are either omitted, or some of their activities are not covered by the collection methodology.

Interstate comparisons

16.18.

The GFS statistics are compiled using standard definitions, classifications and treatment of government financial transactions to facilitate comparisons between levels of government, and between states within a level of government. However, the statistics also reflect real differences between the administrative and accounting arrangements of the various governments and these differences need to be taken into account when making interstate comparisons. For example, only a state level of government exists in the ACT and a number of functions performed by it are undertaken by local government authorities in other jurisdictions.

16.19.

Interstate comparisons of data may also be significantly affected by differences in the mix of operations undertaken by state / territory governments and local governments. For example:

  • Water and sewerage undertakings may be operated by state / territory government, local government or a combination of both; or
  • Government transport undertakings are operated exclusively by state / territory authorities in all states except Queensland, where bus transport is operated by the local government sector.

16.20.

Each ABS GFS publication details a DQF statement called a Quality Declaration, detailing an assessment of the GFS output across the seven dimensions of the DQF.

Quarterly data

16.21.

The quarterly estimates are the most timely output of GFS data. However, because revisions can be made to quarterly GFS data as a result of new and updated information available from jurisdictions, because a degree of sampling is used in compilation, and because the time frame for quality assurance is shorter, the accuracy and reliability of these statistics can be affected.

Data sources

16.22.

The quarterly statistics are based on information provided in, or underlying, the published accounting statements and reports of governments and their authorities. For the general government sector of the Commonwealth Government and all state / territory governments, the primary quarterly data sources are the public accounts and budget management systems of state / territory treasuries and the Commonwealth Department of Finance. For the public non–financial corporation sector, GFS data are collected via a survey of the largest corporations in those jurisdictions where the relevant treasury does not provide the data as part of its accounting reporting. For local government, the main data source is a quarterly GFS survey of local governments in all jurisdictions. There are no local government bodies in the ACT.

Revisions

16.23.

Quarterly GFS data are sourced from Commonwealth and state / territory accounts that are not finalised and are subject to revision. For this reason, the sum of the four quarters of a financial year will not necessarily equal the final annual data.

Final data

16.24.

Final data are the complete audited data for any jurisdiction for any given year. These data generally satisfy the level of detail required. However, some dissections required for national accounting purposes are not normally available in financial statements and audited accounts and these have to be estimated. For example, State-level estimates of Commonwealth Government final consumption expenditure, personal benefit payments and gross fixed capital formation are derived for publication in Australian National Accounts: State Accounts (ABS Cat. no. 5220.0).

Data sources

16.25.

The annual statistics are based on information provided in, or underlying, the published accounting statements and reports of governments and their authorities. For the Commonwealth and state / territory governments the primary data sources are:

  • Public accounts and budget management systems of state / territory treasuries and the Commonwealth Department of Finance;
  • Annual reports of departments and authorities;
  • Budget papers; and
  • Reports of the Auditors-General.

16.26.

For local government, the main data sources are annual statements of accounts completed by local authorities. There are no local government bodies in the ACT.

Revisions

16.27.

Annual GFS data are revised on an annual basis. For this reason, differences can occur between equivalent aggregates published in earlier issues of this publication.

Data collection timetables

16.28.

Timetables for the collection and processing of GFS quarterly and annual data are necessarily very tight because users (who also include the providers of data) require the data as input to their own time-constrained programs. Quarterly production target dates are set mainly to meet the quarterly national accounts timetable, which requires the supply of quarterly GFS data six weeks after the end of the reference period. These deadlines affect the accuracy and reliability of GFS through their impact on the:

  • Quality of data supplied by data providers;
  • Amount of data analysis that can be done;
  • Quality of data classification;
  • Checking and editing of input and output data;
  • Amount of estimation and imputation required;
  • Number of revisions processed; and
  • Verification of output.

16.29.

While some of these processes can be carried out concurrently, only a limited amount of time can be allocated in total to all the tasks involved in order to meet fixed deadlines, so trade-offs between accuracy and timeliness have to be made.

16.30.

Timeliness of GFS output differs across the different streams of data. Quarterly estimates are the most timely. Final data are usually released within nine to twelve months of the end of the reference period.

Data coverage

16.31.

Not all in-scope enterprises are individually covered in GFS because the cost of collecting data from small units outweighs gains in accuracy and reliability. The way in which individual units are covered in GFS dictates the level of data estimation, which affects the quality of GFS. Most units are ‘directly’ covered while other units are ‘indirectly’ covered. A directly covered unit is one for which data from the unit’s accounts are included in GFS. An indirectly covered unit is one for which economic flows and stocks are deduced from data recorded by the directly covered units with which the indirectly covered unit undertakes transactions.
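The distinction between direct and indirect coverage can be illustrated with a simplified sketch. Here an indirectly covered unit's receipts are deduced from payments recorded in a directly covered counterpart's accounts; all unit names and dollar amounts are hypothetical, not actual GFS data.

```python
# Sketch of deducing an indirectly covered unit's flows from the records
# of a directly covered counterpart (e.g. a public hospital funded by a
# health department). All names and amounts ($m) are hypothetical.

# Payments recorded in the directly covered unit's accounts.
health_dept_payments = [
    {"counterparty": "Hospital A", "amount": 320.0},
    {"counterparty": "Hospital B", "amount": 275.0},
    {"counterparty": "Hospital A", "amount": 45.0},
]

def deduce_receipts(payments, unit):
    """Deduce an indirectly covered unit's receipts by summing the
    counterpart transactions recorded against it."""
    return sum(p["amount"] for p in payments if p["counterparty"] == unit)

hospital_a_receipts = deduce_receipts(health_dept_payments, "Hospital A")  # 320 + 45
```

The indirectly covered unit never reports directly; its flows are reconstructed entirely from the payer's records, which is why the method only suits units whose transactions are concentrated with a small number of directly covered counterparts.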

16.32.

Indirect coverage of units is employed where the data of individual units are not readily available, are not available in sufficient time or are of insufficient statistical significance to warrant the cost of direct coverage. The most common example of units which are indirectly covered are public hospitals. Most of the data for the public hospitals in each state and territory can be deduced from data in the records of the relevant jurisdiction’s health department.

16.33.

While the detrimental impact of the indirect (partial) coverage of in-scope units on the accuracy and reliability of GFS has not been quantified, the amount of information missed by use of the procedure is considered to be small.

16.34.

A small number of in-scope units are deliberately excluded from coverage because the cost of their inclusion outweighs the marginal increase in the accuracy of GFS. No statistical expansion is made to account for this under-coverage.

Estimation errors

16.35.

The quarterly data are compiled using a mix of full enumeration of larger units and some sampling of smaller units. Stratified random sampling of local government units is used to produce quarterly estimates for the local government sector. As well, some dissections of quarterly data for other levels of government are estimated using previously recorded ratios. Overall, the use of sampling in Australia’s GFS is relatively minor.
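The mix of full enumeration of larger units and sampling of smaller units described above can be sketched as a stratified expansion estimator. The strata, sample sizes and figures below are hypothetical illustrations, not actual GFS parameters:

```python
# Illustrative sketch of stratified estimation for the local government
# sector: larger units are fully enumerated (weight 1), smaller units
# are sampled and expanded by the inverse sampling fraction N/n.
# All strata and dollar figures ($m) are hypothetical.

def stratified_estimate(strata):
    """Sum each stratum's sample total, expanded by N/n."""
    total = 0.0
    for population_size, sample_values in strata:
        weight = population_size / len(sample_values)  # N / n
        total += weight * sum(sample_values)
    return total

# (number of councils in stratum, sampled expenditures)
strata = [
    # Large councils: all 10 enumerated, so weight = 1.
    (10, [120.0, 95.0, 110.0, 130.0, 105.0, 98.0, 115.0, 125.0, 102.0, 108.0]),
    (40, [18.0, 22.0, 15.0, 20.0]),  # medium: 4 of 40 sampled, weight 10
    (150, [2.5, 3.0, 1.8]),          # small: 3 of 150 sampled, weight 50
]

estimate = stratified_estimate(strata)  # approximately 2223.0
```

The fully enumerated stratum contributes no sampling error; only the expanded smaller strata do, which is consistent with the observation that the use of sampling in Australia's GFS is relatively minor.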

16.36.

Estimation errors for individual levels of government arising from the adjustments made for undercoverage built into the quarterly collection cannot be quantified readily. The estimation techniques involve assuming that the relationships between the collected and uncollected data that existed in the last annual benchmark census remain the same in the current quarter. The estimates made represent only a very small proportion of the value recorded for the data items concerned.
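The ratio-based adjustment described above can be sketched as follows. The benchmark ratio and dollar figures are hypothetical; the sketch only illustrates the stated assumption that the benchmark relationship between collected and uncollected data carries forward to the current quarter:

```python
# Sketch of ratio estimation for uncollected data. The technique assumes
# the ratio of uncollected to collected values observed in the last
# annual benchmark census still holds in the current quarter.
# All figures ($m) are hypothetical.

def ratio_estimate(collected_current, collected_benchmark, uncollected_benchmark):
    """Scale the current collected value by the benchmark ratio of
    uncollected to collected data."""
    benchmark_ratio = uncollected_benchmark / collected_benchmark
    return collected_current * benchmark_ratio

# Benchmark census: collected units reported $5,000m, uncollected units $250m.
# Current quarter: collected units report $5,200m.
estimated_uncollected = ratio_estimate(5200.0, 5000.0, 250.0)  # approximately 260.0
```

Because the estimated component is a very small share of the published aggregate (5 per cent in this hypothetical), even a sizeable error in the assumed ratio has only a marginal effect on the total.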

Data processing errors

16.37.

The ABS GFS processing system has been designed to incorporate a series of data checks and edits with the purpose of minimising or eliminating data processing errors. However, data processing errors can go undetected, either because there is insufficient time to undertake all the checks and edits, or because no check or edit covers a particular error. Such occurrences affect the accuracy and reliability of GFS output. Undetected errors arising from incomplete editing are part of the trade-off between accuracy and timeliness. The errors in question are usually small and are usually detected when more complete editing can be undertaken. Errors that are not detected by input editing may be detected in output editing, which is an essential complement to the input editing process.
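Input edits of the kind described above can be illustrated with a minimal sketch. The field names, rules and tolerance are hypothetical illustrations, not the ABS's actual edit specifications:

```python
# Minimal sketch of input edit checks on a reported record.
# Field names, rules and the tolerance are hypothetical.

def edit_check(record, tolerance=0.5):
    """Return a list of edit failures for one reported record ($m)."""
    failures = []
    # Arithmetic edit: reported components should sum to the reported total.
    components = record["employee_expenses"] + record["other_expenses"]
    if abs(components - record["total_expenses"]) > tolerance:
        failures.append("components do not sum to total expenses")
    # Range edit: negative expense values are implausible.
    for field in ("employee_expenses", "other_expenses", "total_expenses"):
        if record[field] < 0:
            failures.append("negative value reported for " + field)
    return failures

# A record whose components fall $20m short of the reported total.
record = {"employee_expenses": 400.0, "other_expenses": 180.0, "total_expenses": 600.0}
failures = edit_check(record)  # the arithmetic edit fires
```

An edit system of this kind can only flag errors its rules anticipate, which is why, as noted above, errors without a covering check or edit can pass through undetected.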

16.38.

Errors may occur when a data provider either provides an incorrect figure or has to provide an estimate for data that are not readily available from accounting records. Errors can also occur because analysts may misclassify transactions in such a way that the errors are not detected in the editing process.

16.39.

It is impossible to quantify the effect of undetected data processing errors. However, the effect of such errors that go undetected for a time but are eventually detected is reflected in revisions, which are quantifiable.

Consolidation

16.40.

Inaccuracies and imbalances may arise during the process of consolidating data. Inaccuracies can arise because accounting records do not enable identification of intra-sector flows and stocks, or because errors and omissions are made in the allocation of source and destination codes. Such errors will usually give rise to imbalances that are detected in the consolidation process. Every effort is made to resolve imbalances that are material. When imbalances cannot be resolved in time for publication, the data are forced into balance, either by adopting a convention (e.g. the record of the ‘higher’ level of government prevails) or by making a judgement as to which of the two values should be accepted. Forced balancing does not necessarily give the ‘right’ answer. However, because the data to which forced balancing is applied should not be material, errors arising from this source should not be significant.
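The forced-balancing convention described above, under which the record of the ‘higher’ level of government prevails, can be sketched as follows. The levels, ranking and amounts are hypothetical illustrations only:

```python
# Sketch of forced balancing of a mismatched intra-sector flow during
# consolidation: when the two parties report different values, the
# convention illustrated here keeps the higher level of government's
# figure. Levels and amounts ($m) are hypothetical.

LEVEL_RANK = {"Commonwealth": 0, "State": 1, "Local": 2}  # lower rank = higher level

def force_balance(record_a, record_b):
    """Resolve a mismatched intra-sector flow by adopting the value
    reported by the higher level of government."""
    if record_a["amount"] != record_b["amount"]:
        higher = min(record_a, record_b, key=lambda r: LEVEL_RANK[r["level"]])
        return higher["amount"]
    return record_a["amount"]

# A Commonwealth grant to a state: payer and payee report different values.
payer = {"level": "Commonwealth", "amount": 150.0}  # grant paid
payee = {"level": "State", "amount": 148.0}         # grant received
balanced = force_balance(payer, payee)  # the Commonwealth figure prevails
```

The convention guarantees the consolidated accounts balance, but, as the paragraph above notes, it does not guarantee that the adopted value is the ‘right’ one.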

Data revisions

16.41.

Revisions are amendments made to previously released data. They can occur for a number of reasons. As previously discussed, a major reason for revisions in GFS is the replacement of data over the processing cycle. Revisions are also required because errors are detected in data after their initial release. Conceptual and methodological changes also give rise to revisions.

16.42.

Revisions are made to the quarterly GFS data as required, as a result of new and updated information available from jurisdictions. Annual GFS data are revised on an annual basis. For this reason, differences can occur between equivalent aggregates published in earlier issues of the publication.

16.43.

Revisions to GFS data are not applied immediately, but are applied at specified times that coincide with the release of publications. This means that, at any point in time, the data may include estimates that will not be updated until revisions are applied. However, restriction of the application of revisions to particular times is preferable to having a data set that is continually subject to change.

16.44.

The times of the application of revisions to GFS data are currently dictated by the revisions policy for the Australian System of National Accounts. The policy allows revisions to be applied in the releases for various quarters as required by National Accounts Branch.

Verification of data

16.45.

Prior to the publication of GFS data, the data are sent by the ABS to each respective Commonwealth, state and territory jurisdiction for verification. This process serves as a form of output editing by the suppliers of GFS data. The verification process also allows the ABS to assess coherence and to identify differences between the source data compiled under accounting frameworks and the GFS data compiled by the ABS.

Quality Assurance

16.46.

The ABS has in place, and continues to maintain, a system of quality assurance to assess and manage risks associated with data quality. This system is designed to ensure that GFS data published by the ABS are fit for purpose. For more information on how data quality is defined or applied to GFS data outputs, see the ABS website (www.abs.gov.au) or the Data Quality Declaration in each of the ABS GFS publications.