6527.0 - Household Expenditure Survey, Australia: User Guide, 1998-99

ARCHIVED ISSUE Released at 11:30 AM (CANBERRA TIME) 15/11/2001

APPENDIX 1. SAMPLING VARIABILITY

INTRODUCTION

The HES estimates are based on a sample of possible observations. Hence, they are subject to sampling variability and estimates may differ from the figures that would have been produced if information had been collected for all households.

A measure of sampling variability, and of the extent to which an estimate may vary from the true figure, is the standard error (SE). The standard error measures the likely difference between an estimate based on a sample and the figure that would have been obtained had all households in the population been surveyed.

There are two major factors which influence a standard error.

Sample size - The larger the sample size, the more accurate the estimate and the smaller the standard error. Thus we expect more accurate estimates at the Australia level than at state level due to the larger sample size involved.
Variability of item values between households - If the reported values for all households are similar, then the likely difference between the estimate based on a sample and the true figure is small and this is reflected by a small standard error. For example, the standard error for weekly expenditure on bread is very low relative to the estimated expenditure, because most households have reported expenditures of a similar value. Estimates of average expenditure on bread produced from the HES are therefore considered to be very reliable. Standard errors for the purchase of motor cycles are, however, quite high relative to average expenditure, reflecting the fact that despite the longer recall period, households reported highly variable values for expenditure on motor cycles (many reported no expenditure, while a small number reported high amounts). HES estimates of motor cycle expenditure therefore have a higher relative standard error and are less reliable.

There are about 2 chances in 3 that a sample estimate will differ by less than one standard error from the figure that would have been obtained if all households had been surveyed, and about 19 chances in 20 that the difference will be less than two standard errors.

The relative standard error (RSE) is the standard error expressed as a percentage of the estimate. Only estimates with relative standard errors of 25% or less are considered sufficiently reliable for most purposes. However, estimates with higher relative standard errors are included in some HES publications, because they are the best estimates available. In HES publications, estimates with an RSE of 25% to 50% are preceded by an asterisk (e.g. *3.4) and those with an RSE of more than 50% are preceded by a double asterisk (e.g. **6.1) to indicate that they should be used with caution.
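The flagging convention above can be sketched in a few lines of Python. The function name `flag_estimate` is hypothetical, and the treatment of an RSE of exactly 25% as unflagged is an assumption (the text says both "25% or less" is reliable and "25% to 50%" is flagged); this is an illustration, not the ABS's own procedure.

```python
def flag_estimate(estimate: float, rse_percent: float) -> str:
    """Format an estimate with the reliability markers used in HES publications.

    Assumption: an RSE of exactly 25% is treated as reliable (no marker).
    """
    text = f"{estimate:.1f}"
    if rse_percent > 50:
        return "**" + text  # RSE of more than 50%
    if rse_percent > 25:
        return "*" + text   # RSE of 25% to 50%
    return text             # sufficiently reliable for most purposes


# The RSE values below are illustrative, not taken from a HES table.
print(flag_estimate(3.4, 30.0))
print(flag_estimate(6.1, 60.0))
```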

NON-SAMPLING ERROR

The imprecision due to sampling variability, which is measured by the standard error, should not be confused with inaccuracies that may occur because of imperfect reporting by respondents, errors made in collection such as in recording and coding data, and errors made in processing the data. Inaccuracies of this kind are referred to as non-sampling error, and they may occur in any enumeration, whether it be a full count or a sample. It is not possible to quantify non-sampling error, but every effort is made to reduce it to a minimum. This is done by careful design of questionnaires, intensive training and supervision of interviewers, and efficient operating procedures.

CALCULATING RELATIVE STANDARD ERRORS

The ABS has calculated the relative standard errors for a variety of the HES estimates, using a technique known as the jackknife. Regression models were then fitted to the relative standard errors that had been calculated using the jackknife technique, to smooth the results and to summarise them into a form which is concise enough to publish. The outcome of this work is published in each HES publication, where data are provided to enable relative standard errors to be calculated for each estimate shown in the publication.

Table A3.1 (in appendix 3) shows the relative standard error for each expenditure item, at the Australia level. Table A1.1 below shows the relative standard error for each household characteristic, at the Australia level.

To obtain the relative standard error for an estimate at any other level (e.g. for a state, or for an income quintile), the value in table A1.1 or table A3.1, as appropriate, must be adjusted to take account of the smaller size of the sample contributing to that particular estimate. Because the sample size is smaller, the relative standard error will be larger. The first step in making this adjustment is to look up the number of sampled households contributing to the estimate for the item: the 'Number of households in sample' from a particular state, or income quintile, will be shown in the table which contains the estimate of interest.

The relative standard error for an estimate can be calculated by multiplying the relative standard error for the item at the Australia level (found directly from table A1.1 or A3.1), by an adjustment factor (found from graph A1.2) which compensates for the smaller sample size.

In theory, each different item requires a different adjustment factor. However, to prevent graph A1.2 from becoming illegible, the items have been formed into six groups (labelled A-F). Within each group of items, the theoretical adjustment factors are similar enough that a common adjustment factor can be used in practice. Table A1.1 indicates the group to which each household characteristic belongs. Table A3.1 indicates the group to which each expenditure item belongs.

A1.1 RELATIVE STANDARD ERRORS OF HOUSEHOLD CHARACTERISTICS

		Relative Standard Error (%) for Australia	Factor line	Sample size where RSE = 25%
Average weekly household income ($)
	Not categorised by quintile	0.9	F	9
	First quintile	0.5	F	11
	Second quintile	1.0	F	1
	Third quintile	0.2	F	1
	Fourth quintile	0.2	E	1
	Fifth quintile	1.1	F	3

Source of income (% of total income)
	Employee income	1.0	F	12
	Own business income	6.3	E	325
	Government pensions and allowances	2.5	E	42
	Other	4.2	F	191
	Total	1.4	F	21

Average age of reference person		0.4	F	2
Average number of employed persons in household (a)		1.0	F	11

Average number of persons in the household
	Under 18 years	1.9	F	41
	18 to 64 years	0.7	F	5
	65 years and over (a)	3.2	F	115

Tenure type (% of households)
	Owners without a mortgage	1.9	F	40
	Owners with a mortgage	2.2	F	52
	Renters from state or territory housing authority	8.9	D	529
	Renters - other	2.5	F	72
	Other	10.3	E	962

Household composition (% of households)
	Couple, one family
	- Couple only	2.2	F	42
	- Couple with dependent children only	2.2	F	40
	- Other couple, one family households	4.1	F	140
	One parent, one family with dependent children	5.0	F	208
	Other family households	7.3	F	450
	Lone person	1.9	F	32
	Group	8.1	E	433

Estimated number in population (’000)
	Households
	- Capital city (a)	5.5	D	154
	- Other urban	10.7	C	614
	- Rural	14.7	C	1,516
	- Total households (a)	3.4	F	129
	Persons (a)	4.5	E	154

(a) This estimate for Australia is a benchmark total. RSEs for benchmark values should not be referenced from this publication. See paragraphs under heading of Standard Errors for Benchmark Totals for more details.

Graph A1.2 plots the adjustment factor for each of these six groups (A-F) of items against sample size. The adjustment factor for a particular estimate can be read off this graph, once the sample size contributing to the estimate and the group to which the item belongs have been determined. In brief, the procedure for calculating the relative standard error for a particular estimate is as follows:

from tables containing estimates in the relevant publication, look up the number of sampled households contributing to the estimate for the item;
using table A1.1 or A3.1, look up the Australian relative standard error, R, for that item and the letter of the factor line corresponding to the item;
using the factor line graph, read off the value of the factor, FCT, for the number of sampled households for the particular item;
the relative standard error is calculated using the following equation:

RSE = FCT * R%
where
R = the relative standard error of the estimate for Australia and is given in table A1.1 or A3.1; and
FCT = a factor based on the number of sampled households and is given in graph A1.2.

An example of the calculation of a relative standard error is given below. Table 1 of the 1998-99 HES publication Summary of Results (Cat. no. 6530.0) shows that the estimate of average weekly household expenditure on transport for the fourth income quintile group is $154.80. The relative standard error of this estimate is calculated as follows.

From table 1 the number of sampled households is 1,477.
From table A3.1 the Australian RSE is 2.3% and the factor line required is E.
Looking up line E on graph A1.2, when the number of sampled households is 1,477 the factor (FCT) is approximately 2.0.
The RSE is thus: 2.0 * 2.3% = 4.6%.

The estimate of average weekly expenditure for transport at the fourth quintile income level is $154.80. Therefore the SE for this fourth quintile estimate is RSE * estimate = 0.046 * $154.80 = $7.12. From here we can deduce that there are about 2 chances in 3 that the true value lies within $7.12 of the estimate (or between $147.68 and $161.92) and 19 chances in 20 that it lies within $14.24 of the estimate (or between $140.56 and $169.04).
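The worked example above can be reproduced with a short Python sketch. The function names are hypothetical; the inputs (R = 2.3%, FCT of approximately 2.0, estimate of $154.80) are the figures quoted in the example, and the factor remains an approximate graph reading rather than a computed value.

```python
def adjusted_rse(australia_rse_percent: float, factor: float) -> float:
    """RSE = FCT * R, with R and the result both in percent."""
    return factor * australia_rse_percent


def standard_error(estimate: float, rse_percent: float) -> float:
    """Convert an RSE in percent to an SE in the units of the estimate."""
    return estimate * rse_percent / 100


def interval(estimate: float, se: float, multiple: int) -> tuple[float, float]:
    """Range lying within `multiple` standard errors of the estimate."""
    return estimate - multiple * se, estimate + multiple * se


# Figures quoted in the example above; FCT = 2.0 is read off graph A1.2.
estimate = 154.80
rse = adjusted_rse(australia_rse_percent=2.3, factor=2.0)
se = standard_error(estimate, rse)

low1, high1 = interval(estimate, se, 1)  # about 2 chances in 3
low2, high2 = interval(estimate, se, 2)  # about 19 chances in 20

print(f"RSE {rse:.1f}%, SE ${se:.2f}")
print(f"Within 1 SE: ${low1:.2f} to ${high1:.2f}")
print(f"Within 2 SE: ${low2:.2f} to ${high2:.2f}")
```

Rounded to the nearest cent, the printed values agree with the figures given in the example.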

STANDARD ERRORS FOR BENCHMARK TOTALS

As outlined in chapter 4, estimates derived from the survey were obtained using a complex regression estimation procedure which ensures that survey estimates conform to independently estimated distributions of the population, also called benchmark totals.

The relative standard error of benchmark totals, and benchmark totals by quintile, should not be referenced from this publication. (All benchmark totals are footnoted "a" in table A1.1.) An indication of the quality of some household benchmark totals may be found in Household Estimates 1986, 1991-94 (Cat. no. 3229.0). Person benchmark totals are not subject to sampling error, but are subject to non-sampling error.

The Australia-level relative standard errors of benchmark values are provided only as a means of calculating non-benchmark total estimates. For example, the average number of people aged 65 years and over in a household is a benchmark total, so its Australian RSE should not be referenced from this publication; its Australian RSE in table A1.1 should only be used to calculate the RSE of non-benchmark estimates, such as the average number of people aged 65 years and over living in a couple only household.

CALCULATION OF STANDARD ERRORS FOR DERIVED STATISTICS

Many figures of interest may be derived by taking sums, differences and ratios of the tabulated data.

Approximate standard errors for these ‘derived estimates’ can be calculated using the formulae below in which x₁ and x₂ are estimates and SE(x₁) and SE(x₂) are the standard errors of x₁ and x₂. Exact standard errors for these ‘derived estimates’ have not been published, although they could be calculated upon request.

Note: The approximate formulae are derived assuming the correlation between x₁ and x₂ is zero. Correlation, in this context, is a statistical estimate which measures the linear relationship between x₁ and x₂ and takes values in the range [-1,1]. The correlation will be exactly zero if the two estimates are based on independent subgroups of the sample (e.g. different states or income groups). Two estimates of the same subgroup will be positively correlated if large values of the items are likely to occur together (e.g. estimates of expenditure on transport are likely to be correlated with estimates of expenditure on purchase of vehicles because purchase of vehicles is a large part of the expenditure included in expenditure on transport).

Converting between relative standard error (RSE) and standard error (SE)

The relative standard error is the standard error expressed as a percentage of the estimate. Formulae for converting standard errors to relative standard errors and relative standard errors to standard errors are:

RSE(x) = (SE(x) / x) * 100
SE(x) = (RSE(x) * x) / 100

Returning to the expenditure on transport example, average expenditure on transport (x₁) at the fourth income quintile level was $154.80 and the RSE was equal to 4.6%. Therefore, the standard error (SE(x₁)) was equal to ($154.80 * 4.6) / 100 = $7.12.

Calculating the standard error for summed estimates

New items or categories of expenditure can be derived by combining existing ones. The approximate standard error of the summed estimate is:

SE(x₁ + x₂) = √( [SE(x₁)]² + [SE(x₂)]² )

For example, if we wanted to create a new category of expenditure, say of expenditure on transport and personal care, then to calculate the standard error of the new category we would need to know the standard error of expenditure on both transport and personal care. At the Australia level, the estimate for expenditure on transport ($117.82) and personal care ($13.73) can be obtained from table 1 of the 1998-99 HES publication Summary of Results (Cat. no. 6530.0). Calculation of the standard error for the combined estimate of transport and personal care would be as follows:

Note that if there were a non-zero correlation between x₁ and x₂ then the standard error for a sum would be:

SE(x₁ + x₂) = √( [SE(x₁)]² + [SE(x₂)]² + 2 * r * SE(x₁) * SE(x₂) )

where r is the sample correlation coefficient.

Thus, if the two estimates are positively correlated (i.e. r > 0) then the standard error will be underestimated; similarly if there is a negative correlation (i.e. r < 0) then the standard error will be overestimated.
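The effect of correlation on the standard error of a sum can be illustrated with a minimal Python sketch. It takes the square root of SE(x₁)² + SE(x₂)² plus a correlation term 2 * r * SE(x₁) * SE(x₂), which vanishes when r = 0. The function name `se_sum` and the input values are hypothetical and are not HES figures.

```python
import math


def se_sum(se1: float, se2: float, r: float = 0.0) -> float:
    """Standard error of x1 + x2; r is the correlation between the estimates.

    With r = 0 this is the approximate (zero-correlation) formula.
    """
    if not -1.0 <= r <= 1.0:
        raise ValueError("correlation must lie in [-1, 1]")
    return math.sqrt(se1**2 + se2**2 + 2 * r * se1 * se2)


# Illustrative standard errors only (not taken from any HES table).
approx = se_sum(3.0, 4.0)            # zero-correlation approximation
positive = se_sum(3.0, 4.0, r=0.5)   # larger than the approximation
negative = se_sum(3.0, 4.0, r=-0.5)  # smaller than the approximation
print(approx, positive, negative)
```

Comparing the three results shows why the zero-correlation approximation understates the standard error when r > 0 and overstates it when r < 0.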

Calculating the standard error for the difference between estimates

The standard error of the difference can be used to determine whether two estimates are significantly different, that is, whether the difference is unlikely to be due to sampling variability alone. If the difference between estimates is at least twice the standard error of the difference, then the estimates are said to be significantly different at the 95% confidence level.

The approximate standard error of the difference between estimates is:

SE(x₁ - x₂) = √( [SE(x₁)]² + [SE(x₂)]² )

As can be seen, the approximate standard error of the difference involves the same calculations as the standard error of the sum. This approximation is accurate provided that the two estimates have zero correlation. If correlation exists then the standard error of the difference is:

SE(x₁ - x₂) = √( [SE(x₁)]² + [SE(x₂)]² - 2 * r * SE(x₁) * SE(x₂) )

In this case a positive correlation will produce an overestimate of standard error whilst a negative correlation will produce an underestimate.
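A significance check based on the standard error of the difference might be sketched as follows. The sketch takes the square root of SE(x₁)² + SE(x₂)² minus the correlation term 2 * r * SE(x₁) * SE(x₂), and uses the rule of thumb that a difference of at least two standard errors is significant at the 95% level. The function names and input values are hypothetical, and the zero-correlation default applies only when the estimates come from independent subgroups.

```python
import math


def se_difference(se1: float, se2: float, r: float = 0.0) -> float:
    """Standard error of x1 - x2; r is the correlation between the estimates."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("correlation must lie in [-1, 1]")
    return math.sqrt(se1**2 + se2**2 - 2 * r * se1 * se2)


def significantly_different(x1: float, se1: float, x2: float, se2: float) -> bool:
    """True if the estimates differ by at least twice the SE of the difference.

    Assumes zero correlation, e.g. estimates for two different states.
    """
    return abs(x1 - x2) >= 2 * se_difference(se1, se2)


# Illustrative values only (not taken from any HES table).
print(significantly_different(100.0, 3.0, 85.0, 4.0))  # difference of 15
print(significantly_different(100.0, 3.0, 95.0, 4.0))  # difference of 5
```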

Calculating the standard error of the ratio of estimates

Two items can be compared by calculating the ratio of one to the other.

For example, researchers may want to express expenditure on petrol (expenditure code 10010301) as a percentage of total expenditure on transport costs (the sum of all expenditure codes beginning with 10).

The relative standard error of the percentage or proportion can be approximated using the formula:

RSE(x₁ / x₂) = √( [RSE(x₁)]² + [RSE(x₂)]² )

As can be seen, this formula is similar to that used for calculating sums and differences between estimates, except that relative standard errors are used in the formula in place of the standard errors.
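A minimal Python sketch of the ratio calculation follows. It assumes the form described above (the sum-of-squares formula with relative standard errors in place of standard errors) and zero correlation between the two estimates. The function names and all input values are hypothetical and are not HES figures.

```python
import math


def rse_ratio(rse1_percent: float, rse2_percent: float) -> float:
    """Approximate RSE (in percent) of the ratio x1 / x2, assuming zero correlation."""
    return math.sqrt(rse1_percent**2 + rse2_percent**2)


def se_ratio(x1: float, x2: float, rse1_percent: float, rse2_percent: float) -> float:
    """Approximate SE of x1 / x2, in the same units as the ratio itself."""
    return (x1 / x2) * rse_ratio(rse1_percent, rse2_percent) / 100


# Illustrative values only: item x1 as a proportion of a total x2.
x1, x2 = 25.0, 100.0
print(rse_ratio(3.0, 4.0))         # RSE of the ratio, in percent
print(se_ratio(x1, x2, 3.0, 4.0))  # SE of the proportion 0.25
```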