1504.0 - Methodological News, Mar 2001  
ARCHIVED ISSUE Released at 11:30 AM (CANBERRA TIME) 16/07/2001   

SYNTHETIC ESTIMATES FOR INPUT-OUTPUT

The Economic Activity Survey (EAS) collects a number of different categories of expenditure data (amongst other economic data) from businesses, one of which is the remainder category of other operating expenses. This is further disaggregated into approximately 25 subcategories in the EAS selected expense supplement form, which is sent out to a subsample of businesses from the main EAS sample. This second phase survey is also known as the Input-Output (I-O) Survey, as the data collected from it are used to compile I-O tables for the National Accounts.

In 1997-98 there were insufficient resources to collect I-O Survey data from all businesses, and the sample was restricted to around 1000 businesses that either had 200 or more employees or that were complex in structure and relatively large. The problem which faces us is this: how do we produce estimates for the remaining medium-sized and small businesses?

Until now, the methodology has been to calculate the proportions of other operating expenses falling into each of the I-O subcategories from the most recent year for which sample data are available (i.e. 1996-97 in this case), and to apply them to the other operating expenses in the current year (i.e. 1997-98 in this example). An obvious shortcoming of this approach is that it does not allow for any changes in the pattern of proportions which may have occurred since the sample data used to calculate the proportions were collected.
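
The fixed-proportions method described above can be illustrated with a minimal sketch. The subcategory names and dollar figures here are hypothetical and chosen only to make the arithmetic visible; they are not taken from the EAS or the I-O Survey.

```python
def subcategory_shares(prev_year_amounts):
    """Share of other operating expenses in each subcategory, from the earlier year's sample."""
    total = sum(prev_year_amounts.values())
    if total <= 0:
        raise ValueError("previous-year total must be positive")
    return {name: amount / total for name, amount in prev_year_amounts.items()}


def apply_shares(shares, current_total):
    """Split the current year's other operating expenses using the earlier shares."""
    return {name: share * current_total for name, share in shares.items()}


# Hypothetical figures ($m) for three made-up subcategories.
shares_1996_97 = subcategory_shares({"freight": 50.0, "advertising": 30.0, "other": 20.0})
estimates_1997_98 = apply_shares(shares_1996_97, current_total=200.0)
```

Because the shares are frozen at their 1996-97 values, any shift in the expense pattern between the two years is carried straight into the 1997-98 estimates, which is the shortcoming noted above.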

We are currently working on a more sophisticated approach in which we model a logit function of the proportions, using data items collected in the main EAS sample as auxiliary (predictor) variables. In other words, we are imposing a multinomial logistic model and assuming that our I-O subcategories are conditionally independent given the EAS data items. Note, however, that inferences based on our synthetic estimates do depend on this model. An important requirement of the methodology we choose is that we be able to produce standard errors on the resulting estimates as an indicator of the sampling variability involved.
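
As a rough illustration of how a multinomial logistic model turns auxiliary variables into subcategory proportions, the sketch below assumes a baseline-category form, in which the log-ratio of each subcategory's share to a baseline subcategory's share is linear in the auxiliary variables. The article does not specify the exact form of the model, so this form, the function name, the subcategory names and the coefficient values are all assumptions made for illustration; fitting the coefficients is not shown.

```python
import math


def predicted_proportions(x, coefficients):
    """Multinomial logit shares for one business (illustrative form only).

    x: list of auxiliary values (without the intercept).
    coefficients: dict mapping subcategory -> [intercept, slope_1, ...];
    the baseline subcategory carries all-zero coefficients.
    """
    scores = {
        name: beta[0] + sum(b * v for b, v in zip(beta[1:], x))
        for name, beta in coefficients.items()
    }
    top = max(scores.values())  # subtract the maximum for numerical stability
    weights = {name: math.exp(s - top) for name, s in scores.items()}
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}


# Hypothetical coefficients; "other" is the baseline subcategory.
coefs = {
    "freight": [0.2, 0.5],
    "advertising": [-0.1, 0.3],
    "other": [0.0, 0.0],
}
shares = predicted_proportions([1.0], coefs)
```

Whatever the auxiliary values, the predicted proportions are positive and sum to one, which is why a logit form suits the disaggregation of a total into subcategories.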

We have two main options about which data to use to produce synthetic estimates for 1997-98. We could use the data from the medium-sized and small businesses in the I-O sample in 1996-97 and then apply that relationship to the same sized businesses in 1997-98, thereby assuming that the relationship from 1996-97 continues to hold in 1997-98. Alternatively, we could assume that the relationship between the I-O subcategories and the EAS auxiliary variables is the same for the large businesses from 1997-98 as it is for the medium-sized and small businesses in the same year. We are currently pursuing the latter approach, partly because of the relatively small sample in 1996-97 and partly because it will be easier to calculate standard errors. Our resulting estimates will be composite, made up of the sum of purely design-based estimates from the I-O sample of large businesses and synthetic estimates from the EAS sample of smaller businesses.
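
The final estimate described above, a design-based total for the large businesses plus a synthetic total for the smaller ones, can be sketched for a single I-O subcategory as follows. The function names, weights and amounts are hypothetical, and the simple weighted sums are an assumption standing in for whatever estimator is actually used.

```python
def design_based_total(large_units):
    """Weighted sum of reported subcategory amounts from the I-O sample.

    large_units: list of (weight, reported_amount) pairs.
    """
    return sum(w * y for w, y in large_units)


def synthetic_total(small_units):
    """Weighted sum of modelled subcategory amounts from the EAS sample.

    small_units: list of (weight, other_operating_expenses, predicted_share) triples.
    """
    return sum(w * e * p for w, e, p in small_units)


def combined_total(large_units, small_units):
    """Design-based part for large businesses plus synthetic part for the rest."""
    return design_based_total(large_units) + synthetic_total(small_units)


# Hypothetical sample records for one subcategory.
large = [(1.0, 100.0), (2.0, 50.0)]
small = [(10.0, 20.0, 0.25), (5.0, 40.0, 0.5)]
estimate = combined_total(large, small)
```

Only the second component relies on the model: the smaller businesses report total other operating expenses in EAS, and the predicted share allocates part of that total to the subcategory.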

Our initial evaluation of this methodology using 1996-97 data shows mixed results. For some industries, our synthetic estimates accord well with design-based estimates; for other industries, less so. However, it is difficult to draw any firm conclusions, as the design-based estimates themselves are based on a relatively small sample and are subject to large standard errors.

Our timetable is to complete the synthetic estimates for 1997-98 by mid-March, and repeat the process for 1998-99 by early April. This will be followed by the calculation of standard errors.

In the future, we intend to return our attention to the method of obtaining synthetic estimates using data from a previous year. This will be more attractive in the future, as from 1998-99 on, the Input-Output Survey has followed a 'rolling industry' approach, whereby a sizeable sample across all business sizes is taken for a subset of industries in any particular year. In the next year, a different subset of industries is sampled in the same way, and so on. We may also look at whether we can improve our purely design-based estimates even where we have a sample across all sizes of businesses by using auxiliary information from EAS. Our estimation in this case would follow a two-phase logistic generalised regression technique.

For more information, please contact Paul Schubert on (02) 6252 5140.

E-mail: paul.schubert@abs.gov.au