1504.0 - Methodological News, Sep 2012  
ARCHIVED ISSUE Released at 11:30 AM (CANBERRA TIME) 26/09/2012   

Simulation Cost Model for the Monthly Population Survey

Accurately modelling interviewer enumeration costs for household survey data collection has been a significant challenge for many years. Traditionally a linear cost model has been used, consisting of costs proportional to the total number of clusters selected (a cluster being a group of dwellings in the same general location) and costs proportional to the total number of dwellings selected. The main issue with this method is that the costs of travel to clusters (time and kilometre allowances, which are major components of total costs) are essentially proportional to the number of visits made to a cluster. The number of visits is a complex function of the total number of dwellings or clusters selected, so the linear model cannot represent these costs accurately across a wide range of cluster sizes.
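
As a rough illustration of the contrast described above, the two cost structures can be written as follows. The symbols are assumptions introduced here, not notation from the original work: n_c and n_d are the numbers of clusters and dwellings selected, c_c and c_d are the corresponding unit costs, and V is the total number of visits made to clusters.

```latex
% Linear cost model: a cost per cluster plus a cost per dwelling.
C_{\text{linear}} = c_{c}\, n_{c} + c_{d}\, n_{d}

% Travel costs are instead driven by the number of cluster visits V,
% which is not a linear function of n_c and n_d.
C_{\text{travel}} \propto V(n_{c}, n_{d})
```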

In spite of this issue, the linear cost model has been sufficient for estimating costs when determining optimal cluster sizes and sampling fractions, provided no significant changes were occurring to the overall design. In the current redesign of the Monthly Population Survey (MPS), however, there were several significant changes, including the upcoming introduction of web-based enumeration. In addition, the linear cost model did not transfer easily to Special Social Surveys without recalculating all of its parameters. An alternative, more flexible cost modelling method was therefore required.

It was decided to try a simulation cost model. In this approach, all of the activity an interviewer requires to complete a workload (e.g. travelling from home to the first dwelling, approaching a dwelling, interviewing) is simulated for a given design, and the total travel distance and time are counted. To inform the choice of sample design parameters, a range of variance-equivalent designs is compared to determine which one is the cheapest. To do this, the following parameters had to be estimated:

  • The probability of making contact with a household (conditional on the number of approaches that had been made);
  • The probability of getting an interview at a contact (conditional on the number of previous contacts that had been made);
  • Factors to convert straight-line distance between clusters to kilometres travelled;
  • Time taken to travel between clusters (consisting of a constant time plus a time proportional to the straight-line distance);
  • Time and distance for travel between dwellings within a cluster (for a given area type);
  • Time taken for an interview;
  • The amount of time an interviewer had available for work in one block.
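
The parameters listed above could feed a simulation along the following lines. This is a minimal sketch under simplifying assumptions, not the model actually built for the MPS: it treats a single cluster, sends the interviewer from home to the cluster and back on every visit, and reserves time for an interview before each approach. The names `SimParams`, `simulate_cluster` and `max_approaches` are hypothetical, and any parameter values supplied to it are illustrative.

```python
import random
from dataclasses import dataclass


@dataclass
class SimParams:
    """Hypothetical parameter container; all values are illustrative."""
    p_contact: list           # P(contact) by approach number (last value reused)
    p_interview: list         # P(interview) by contact number (last value reused)
    distance_factor: float    # road km per straight-line km
    travel_const_min: float   # constant part of travel time, minutes
    travel_min_per_km: float  # minutes per straight-line km
    within_min: float         # minutes to move between dwellings in a cluster
    within_km: float          # km travelled between dwellings in a cluster
    interview_min: float      # minutes per completed interview
    block_min: float          # minutes available in one block of work
    max_approaches: int = 5   # assumed cap on approaches to one dwelling


def _prob(table, count):
    """Look up a conditional probability, reusing the last entry if needed."""
    return table[min(count, len(table)) - 1]


def simulate_cluster(n_dwellings, straight_km, p, rng):
    """Simulate visits to one cluster until every dwelling is resolved.

    Returns a tuple (visits, km, minutes).
    """
    one_way_km = p.distance_factor * straight_km
    one_way_min = p.travel_const_min + p.travel_min_per_km * straight_km
    per_dwelling_min = p.within_min + p.interview_min
    if 2 * one_way_min + per_dwelling_min > p.block_min:
        raise ValueError("block too short to approach any dwelling")

    approaches = [0] * n_dwellings
    contacts = [0] * n_dwellings
    unresolved = list(range(n_dwellings))
    visits, km, minutes = 0, 0.0, 0.0

    while unresolved:
        visits += 1
        km += 2 * one_way_km        # travel out and back
        used = 2 * one_way_min
        still_open = []
        for i, d in enumerate(unresolved):
            # Stop when there is no room left for an approach plus interview.
            if used + per_dwelling_min > p.block_min:
                still_open.extend(unresolved[i:])
                break
            used += p.within_min
            km += p.within_km
            approaches[d] += 1
            done = False
            if rng.random() < _prob(p.p_contact, approaches[d]):
                contacts[d] += 1
                if rng.random() < _prob(p.p_interview, contacts[d]):
                    used += p.interview_min
                    done = True
            # Keep the dwelling open unless interviewed or the cap is reached.
            if not done and approaches[d] < p.max_approaches:
                still_open.append(d)
        unresolved = still_open
        minutes += used
    return visits, km, minutes
```

Calling `simulate_cluster` many times with `random.Random(seed)` and averaging the results over the clusters in a design would give an estimate of total visits, distance and time for that design, which is the quantity a comparison of variance-equivalent designs needs.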

To develop the model, paradata on MPS data collection was used. This paradata consists of records of the call attempts interviewers made to respondents and of the interviewers' travel. It provides a reasonably rich data source that should allow straightforward estimation of all the parameters listed above (apart from the last one). However, the data is collected primarily to administer interviewer pay, so its quality requirements differ in some respects from those of constructing a simulation model. Considerable work was therefore required to amend the data to provide the consistency needed for the modelling.

The chosen method of validating the simulation model was to build the model using the listed parameters, then simulate the historical selections from which the parameters had been estimated, to see whether the average number of visits to each cluster was the same in the simulation as it was in reality. This was a legitimate test, as the number of block visits was not used in creating any of the parameters.
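
The comparison described above might be expressed as in the sketch below. The function name `compare_mean_visits` and the 5% tolerance are assumptions made for illustration only; the source does not state how close the simulated and observed averages were required to be.

```python
from statistics import mean


def compare_mean_visits(observed, simulated, tolerance=0.05):
    """Compare average visits per cluster in historical data and simulation.

    `observed` and `simulated` are sequences of visit counts, one per cluster.
    `tolerance` is an assumed relative threshold, not a value from the source.
    """
    obs_mean = mean(observed)
    sim_mean = mean(simulated)
    relative_diff = (sim_mean - obs_mean) / obs_mean
    return {
        "observed_mean": obs_mean,
        "simulated_mean": sim_mean,
        "relative_diff": relative_diff,
        "within_tolerance": abs(relative_diff) <= tolerance,
    }
```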

When the model was finally applied to the new design of the MPS, the optimal cluster sizes came out quite similar to those in past redesigns. It was also found that if a 20% take-up of web-based enumeration was assumed, there was no discernible change in optimal cluster size.

It is hoped that the model will be expanded to inform decisions about the sample designs for a wider range of household surveys, as well as the costing of data collection operations under new collection scenarios.


Further Information
For more information, please contact Peter Byron (02 6252 6804, p.m.byron@abs.gov.au)