1504.0 - Methodological News, Sep 2010  
ARCHIVED ISSUE Released at 11:30 AM (CANBERRA TIME) 29/09/2010   

Research into Editing of Categorical Data in Business Surveys

There is increasing interest at the Australian Bureau of Statistics (ABS) in collecting characteristics data along with quantitative data in business surveys.

For example, the ABS runs the Business Characteristics Survey, which collects mostly characteristics data. Characteristics data is categorical in nature and is usually collected using tick-boxes. Whether a business accepts orders via the Internet is an example of a categorical or tick-box data item, while the value of total earnings is an example of a quantitative data item. A tick-box response is stored as 1 if the box was ticked and 0 if it was not.

Efficient micro-editing of tick-box data cannot be achieved with significance editing. For quantitative data, the large range in the size of errors across responding businesses means that a small proportion of businesses tends to be responsible for most of the error in the resulting statistics, so it is desirable to concentrate most editing effort on these significant responses. This situation does not arise for tick-box values because they can only be 0 or 1. An error in one tick-box response is roughly as significant as an error in any other, so it is not possible to obtain large gains in accuracy by editing only a small proportion of tick-box responses.
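The contrast can be illustrated with a small Python sketch. It is not ABS code: the function name `share_of_error_fixed` and all error values are invented for illustration, and survey weights, which a real significance score would normally take into account, are ignored.

```python
def share_of_error_fixed(errors, fraction):
    """Share of total absolute error removed by editing the largest errors first.

    errors   -- per-business error in a reported value (hypothetical figures)
    fraction -- proportion of businesses that can be edited
    """
    ranked = sorted((abs(e) for e in errors), reverse=True)
    n_edit = round(len(ranked) * fraction)
    total = sum(ranked)
    return sum(ranked[:n_edit]) / total if total else 0.0


# Invented quantitative errors: a few businesses dominate the total.
quantitative_errors = [5000, 1200, 40, 30, 20, 10, 5, 5, 3, 2]

# Invented tick-box errors: every wrong response is off by exactly 1.
tick_box_errors = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1]

quantitative_share = share_of_error_fixed(quantitative_errors, 0.2)
tick_box_share = share_of_error_fixed(tick_box_errors, 0.2)

print(f"quantitative: {quantitative_share:.1%} of error fixed by editing 20% of units")
print(f"tick-box:     {tick_box_share:.1%} of error fixed by editing 20% of units")
```

With these made-up figures, editing the top fifth of businesses removes nearly all of the quantitative error but only a fifth of the tick-box error, which is the point made above.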

Also, due to the nature of tick-box questions, there tend to be too many tick-box edit failures to correct manually. Those that cannot be corrected manually must either be corrected automatically (using automatic editing techniques) or left as is. Automatic editing does not involve human intervention. The technique uses algorithms to find the smallest number of data values (among those involved in failed edits) which, when corrected, allow the whole record to pass the edits. The data items requiring correction are then replaced by imputed values. A tool for editing tick-box data must therefore be able to select a minimum set of failed data items requiring correction and create the imputed values required. The ABS currently does not have such a tool, and current practice is to leave many failed tick-box responses uncorrected.
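The idea of finding a minimum set of values to change can be sketched in Python as follows. This is a brute-force illustration under stated assumptions, not one of the tools discussed in this article: the item names, the two edit rules, and the helper functions `failed_edits` and `minimal_correction` are all hypothetical, and the "imputed" value is simply the flipped tick-box, whereas a production system would use a proper imputation method and a far more efficient search.

```python
from itertools import combinations


def failed_edits(record, edits):
    """Return the names of the edit rules that the record fails."""
    return [name for name, rule in edits.items() if not rule(record)]


def minimal_correction(record, edits):
    """Find a smallest set of tick-box items whose change lets the record pass.

    Tries subsets in order of increasing size, so the first passing
    subset uses the fewest changes. Each tick-box is 0 or 1, so
    changing an item means flipping it.
    """
    fields = list(record)
    for size in range(len(fields) + 1):
        for subset in combinations(fields, size):
            candidate = dict(record)
            for field in subset:
                candidate[field] = 1 - candidate[field]
            if not failed_edits(candidate, edits):
                return subset, candidate
    return None  # the edits cannot all be satisfied


# Hypothetical edit rules: both online activities require Internet access.
edits = {
    "orders_need_internet": lambda r: not r["accepts_orders_online"] or r["has_internet"],
    "website_needs_internet": lambda r: not r["has_website"] or r["has_internet"],
}

# Hypothetical response that fails both edits.
record = {"has_internet": 0, "accepts_orders_online": 1, "has_website": 1}

changed, corrected = minimal_correction(record, edits)
print("failed edits:", failed_edits(record, edits))
print("items changed:", changed)
print("corrected record:", corrected)
```

Here both edits fail, yet changing the single item `has_internet` is enough to satisfy them, so the search prefers that one change over changing the two items that depend on it.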

To address the problem of the cost of editing categorical data, the Statistical Services Branch is conducting research into the editing of categorical data (with a focus on business surveys). We recently completed a review of current methods and tools available for editing categorical data. The review concluded that the methods inspired by Fellegi and Holt's paper, "A Systematic Approach to Automatic Edit and Imputation", published in the Journal of the American Statistical Association in 1976, are the best solutions currently available. The review report listed tools used by overseas agencies for editing categorical data that were considered suitable for further assessment to determine whether any could be useful for the ABS. We are now commencing that assessment. In particular, we are interested in some systems built by Statistics Netherlands, Statistics Canada, and the U.S. Bureau of the Census. We envisage that this preliminary assessment will provide guidance for further work in this field planned for the next year. For example, even if a suitable tick-box editing tool is found, we will still need to work out how to use it alongside significance editing, since many business surveys have a mixture of categorical and quantitative data requiring micro-editing.

For further information contact either Keith Farwell on (03) 6222 5889 or keith.farwell@abs.gov.au, or Kin Chung on (08) 9360 5286 or kin.chung@abs.gov.au.