SIGNIFICANCE EDITING IN THE ABS
We have established a 'centre of excellence' for significance editing within the methodological program. A major goal of the centre is to develop and foster a more unified approach to editing within the ABS. The centre is headed by Keith Farwell in our Tasmanian office.
Recently, a paper (available on request) was presented to the Economic Statistics Strategy Co-ordination Committee outlining the directions we would like methodological research and practical application to take. Two of its aims were to present a framework within which significance editing practices and procedures could be applied, and to set out our views on the future direction of significance editing within the ABS.
One of the main features of the proposed framework is its use of the terms "input significance editing" and "output statistical editing".
What is significance editing?
The term "significance editing" describes a general editing approach that incorporates survey weights and estimation methodology into edits and maintains a link between individual responses and output estimates. However, it is often necessary to distinguish whether significance editing is being performed at the input stage or at the output stage of the collection cycle. When the distinction is needed, we use 'input significance editing' for significance editing applied at the input stage and 'output statistical editing' for significance editing applied at the output stage. Although each requires its own measures, both fit within the general framework. For input significance editing, a score is produced for each response that links editing effort to the likely impact of that effort on estimates. For output statistical editing, a score links units and their weighted contributions to specific estimates or 'output cells'. In either case, responses can be ranked by score to produce a prioritised list of units, which directs resources to the areas where editing effort is expected to have the greatest impact.
Scores can be calculated at the item level and the provider level. For example, a provider can have several item scores and one provider score. The provider score is a summary score based on the provider's item scores. It is expected that both kinds of scores will be useful.
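The article does not specify how item scores are summarised into a provider score. As a minimal sketch, assuming the item scores have already been computed, the hypothetical function below offers two plausible rules: taking the largest item score, or averaging the item scores. Both the function name and the choice of rules are illustrative assumptions.

```python
from typing import Mapping


def provider_score(item_scores: Mapping[str, float], method: str = "max") -> float:
    """Summarise one provider's item scores into a single provider score.

    Hypothetical aggregation rules (not specified in the source):
      "max"  flags a provider if any single item is important;
      "mean" favours providers that matter across many items.
    """
    if not item_scores:
        raise ValueError("provider has no item scores")
    values = list(item_scores.values())
    if method == "max":
        return max(values)
    if method == "mean":
        return sum(values) / len(values)
    raise ValueError(f"unknown method: {method!r}")


# Hypothetical provider with scores for two items
scores = {"turnover": 0.25, "wages": 0.125}
print(provider_score(scores))          # 0.25
print(provider_score(scores, "mean"))  # 0.1875
```

Using the maximum ensures that a provider is never ranked low when one of its items is critical. Averaging tends to favour providers that are moderately important across many items.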
Input Significance Editing
A basic standardised input significance score (which can be thought of as a measure of editing benefit) for an estimate of level is calculated by taking the absolute difference between the reported value and an imputed value for the unit, multiplying this difference by the unit's weight, and standardising the result by dividing by the expected estimate.
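The score can be written as w_i * |y_i - y_hat_i| / Y_hat, where y_i is the reported value, y_hat_i the imputed value, w_i the unit's weight and Y_hat the expected estimate of the level. The following is a minimal sketch of that calculation. The function name and the example figures are hypothetical.

```python
def input_significance_score(reported: float, imputed: float,
                             weight: float, expected_estimate: float) -> float:
    """Standardised input significance score for a level estimate.

    weight * |reported - imputed| approximates how much the weighted
    estimate would change if the imputed value were correct and the
    reported value wrong. Dividing by the expected estimate expresses
    that change as a share of the estimate, so scores are comparable
    across items of different magnitude.
    """
    if expected_estimate == 0:
        raise ValueError("expected estimate must be non-zero")
    return weight * abs(reported - imputed) / abs(expected_estimate)


# Hypothetical unit: reports 1500, imputed value 1000, weight 20,
# expected estimate of the total 400000.
print(input_significance_score(1500, 1000, 20, 400_000))  # 0.025
```

A score of 0.025 means that editing this response could shift the estimate by about 2.5 per cent.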
The input significance score can be calculated independently of the response rate, so editing can begin as soon as responses are received. A unit's previous return is often used as the imputed value, survey design weights are usually used to approximate estimation weights, and previous estimates are usually used to approximate the current expected estimates. Units with a score above a specified cut-off value can be selected and placed in a prioritised list for editing attention. Even if no cut-off is used, units can still be ranked and prioritised for attention.
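The sketch below puts those approximations together. It scores each response on arrival, using the previous return as the imputed value, the design weight in place of the estimation weight, and the previous estimate in place of the expected estimate. It then returns a ranked list, optionally filtered by a cut-off. The `Response` record, the `prioritise` function and all figures are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Response:
    unit_id: str
    reported: float
    previous: float       # unit's previous return, used as the imputed value
    design_weight: float  # approximates the estimation weight


def prioritise(responses: list[Response], previous_estimate: float,
               cutoff: Optional[float] = None) -> list[tuple[str, float]]:
    """Return (unit_id, score) pairs, highest score first.

    If cutoff is given, only units scoring above it are kept;
    otherwise every unit is ranked.
    """
    scored = [
        (r.unit_id,
         r.design_weight * abs(r.reported - r.previous) / abs(previous_estimate))
        for r in responses
    ]
    if cutoff is not None:
        scored = [(unit, score) for unit, score in scored if score > cutoff]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Responses received so far (hypothetical)
batch = [
    Response("A", 1500, 1000, 20),
    Response("B", 210, 200, 50),
    Response("C", 90, 300, 10),
]
print([u for u, _ in prioritise(batch, 400_000)])                # ['A', 'C', 'B']
print([u for u, _ in prioritise(batch, 400_000, cutoff=0.005)])  # ['A', 'C']
```

Because each score depends only on the unit itself and a few stored values, the list can be recomputed each time a new batch of responses arrives.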
Output Statistical Editing
For output statistical editing, scores are based on a combination of unit contributions to estimates, unit contributions to the standard errors of estimates, and unit contributions to movements in the estimates. In output editing we focus on the actual weighted contributions of units, rather than on predicted changes in estimates that editing might bring about (as in input significance editing). Output editing combines detecting outliers, detecting remaining significant reporting errors, and analysing trends in estimates (such as movements for continuing surveys). The output editing scores prioritise the responses that contribute most to the dual objectives of controlling estimate quality and understanding trends in the estimates.
Three separate initial output scores are created for a specific item based on contributions to the estimate, the movement, and the standard error. These scores are then combined into a single item score which can be interpreted as an average score representing a provider's overall importance to the item. Item scores can be further combined to generate provider scores.
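The article does not give formulas for the three component scores or for how they are averaged. The following is a minimal sketch under these assumptions:
- Each component is a unit's share of the total absolute contribution, so every component sums to 1 across units.
- Variance contributions assume a single stratum with simple random sampling and equal weights. For unequal weights they are only an approximation.
- The item score is the simple arithmetic mean of the three components.

All function names and figures are hypothetical.

```python
from statistics import fmean


def _shares(values: list[float]) -> list[float]:
    """Each value's share of the total absolute value (all 0 if the total is 0)."""
    total = sum(abs(v) for v in values)
    return [abs(v) / total if total else 0.0 for v in values]


def output_item_scores(current: list[float], previous: list[float],
                       weights: list[float]) -> list[float]:
    """One item score per unit, averaging three assumed component scores."""
    contrib = [w * y for w, y in zip(weights, current)]
    # 1. Contribution to the level estimate sum(w * y).
    estimate = _shares(contrib)
    # 2. Contribution to the movement sum(w * (y - x)) over matched units.
    movement = _shares([w * (y - x) for w, y, x in zip(weights, current, previous)])
    # 3. Contribution to the variance (and hence the standard error):
    #    squared deviation of w * y from its mean. This is proportional to each
    #    unit's share of the SRSWOR variance estimate when weights are equal.
    mean = fmean(contrib)
    variance = _shares([(c - mean) ** 2 for c in contrib])
    # Item score: simple average of the three component scores.
    return [(e + m + v) / 3 for e, m, v in zip(estimate, movement, variance)]


# Three hypothetical matched units, each with weight 10
scores = output_item_scores([100, 200, 700], [100, 180, 500], [10, 10, 10])
print([round(s, 3) for s in scores])  # [0.121, 0.126, 0.753]
```

The third unit dominates the estimate, the movement and the variance, so it heads the list. Feeding these item scores into a provider-level summary would give provider scores.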
Both input and output significance scores can be calculated for selected items (e.g. turnover, wages) and for each provider. They have the advantage of needing only a minimal amount of auxiliary information: data on the current unit (including its historical data, if available) and a small store of common information (such as the current and previous estimates and design weights). The scores are based on simple statistical techniques and can be constructed from the output of generalised tools. They have a similar form regardless of the complexity of the estimation system and are consistent with the type of estimates being produced.
Conclusions
A number of ABS studies have already demonstrated that significance editing is a cost-effective way to manage editing resources and output quality. However, studying its effect on each survey before implementation may prove costly. The framework outlined above allows existing methods to be mapped against it and a basic significance editing system to be developed. We believe this framework could be used for most surveys.
For more information, please contact Paul Sutcliffe on (02) 6252 6759, or Keith Farwell on (03) 6222 5889.
E-mail: p.sutcliffe@abs.gov.au, keith.farwell@abs.gov.au.