4363.0.55.001 - National Health Survey: Users' Guide, 2001  
ARCHIVED ISSUE Released at 11:30 AM (CANBERRA TIME) 27/05/2003   
Appendix 12 - Standard errors

RELIABILITY OF ESTIMATES


Measuring sampling variability

Since the estimates from this survey are based on information obtained from a sub-sample of usual residents of a sample of dwellings, they are subject to sampling variability; that is, they may differ from those that would have been produced if all usual residents of all dwellings had been included in the survey. One measure of the likely difference is given by the standard error (SE), which indicates the extent to which an estimate might have varied by chance because only a sample of dwellings was included.

There are about two chances in three that a sample estimate will differ by less than one SE from the number that would have been obtained if all dwellings had been included, and about 19 chances in 20 that the difference will be less than two SEs. Another measure of the likely difference is the relative standard error (RSE), which is obtained by expressing the SE as a percentage of the estimate.
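The two interpretations of the SE, and the definition of the RSE, can be sketched in Python as follows. This is a minimal illustration: the function name `se_ranges` is hypothetical, and the inputs (an estimate of 10,000 with an SE of 3,060) are taken from the Australia column of Table 1.

```python
def se_ranges(estimate: float, se: float) -> dict:
    """Return the RSE (%) and the one-SE and two-SE ranges around an estimate."""
    return {
        "rse_pct": se / estimate * 100,                     # SE as a percentage of the estimate
        "one_se": (estimate - se, estimate + se),           # about two chances in three
        "two_se": (estimate - 2 * se, estimate + 2 * se),   # about 19 chances in 20
    }

# Australian estimate of 10,000 persons, indicative SE of 3,060 (Table 1)
r = se_ranges(10_000, 3_060)
print(r)
```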


INDICATIVE STANDARD ERRORS

Because of the large number and diverse nature of the estimates that can be produced from the NHS and NHSI, it is not practicable to present separate indications of the SEs of all estimates. Indicative standard errors and relative standard errors for estimates from the NHS and NHSI are provided in Tables 1 to 3 below. Figures in these tables do not give a precise measure of the SE for a particular estimate, but they do provide an indication of its magnitude. ABS has modelled these SEs on the full survey design information. Exact RSEs for any estimate can, however, be calculated using the replicate weights methodology, which is described at the end of this Appendix.

An example of the calculation and use of SEs from Table 1 for estimates of persons is as follows. Consider the estimate for Australia of persons aged 45-54 years who reported high cholesterol as a long-term condition (246,300). Since this estimate lies between 200,000 and 300,000 in the SE table, its SE will be between 13,200 and 15,600. It can be approximated by linear interpolation as 13,200 + (246,300 - 200,000) / (300,000 - 200,000) x (15,600 - 13,200), which is about 14,300 (rounded to the nearest 100). Therefore, there are about two chances in three that the value that would have been produced if all dwellings had been included in the survey will fall in the range 232,000 to 260,600, and about 19 chances in 20 that the value will fall within the range 217,700 to 274,900.
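The linear interpolation in this example can be sketched as below. The lookup table `AUST_SE` transcribes the size of estimate and Australian SE columns of Table 1, and `interpolate_se` is a hypothetical helper name; the sketch assumes straight-line interpolation between the two bracketing rows, as the text describes.

```python
from bisect import bisect_left

# Australia column of Table 1: (size of estimate, indicative SE), both in persons
AUST_SE = [
    (500, 468), (1_000, 750), (1_500, 978), (2_000, 1_174), (2_500, 1_350),
    (3_000, 1_512), (3_500, 1_659), (4_000, 1_800), (4_500, 1_930),
    (5_000, 2_055), (6_000, 2_286), (8_000, 2_696), (10_000, 3_060),
    (20_000, 4_440), (30_000, 5_490), (40_000, 6_320), (50_000, 7_050),
    (100_000, 9_700), (200_000, 13_200), (300_000, 15_600), (400_000, 17_600),
    (500_000, 19_000), (1_000_000, 24_000), (2_000_000, 30_000),
    (5_000_000, 40_000), (10_000_000, 50_000), (20_000_000, 60_000),
]

def interpolate_se(estimate: float, table=AUST_SE) -> float:
    """Linearly interpolate an indicative SE between the two bracketing table rows."""
    sizes = [size for size, _ in table]
    if not sizes[0] <= estimate <= sizes[-1]:
        raise ValueError("estimate lies outside the range of the table")
    i = bisect_left(sizes, estimate)          # first row whose size is >= estimate
    if sizes[i] == estimate:
        return float(table[i][1])             # exact match, no interpolation needed
    (x0, se0), (x1, se1) = table[i - 1], table[i]
    return se0 + (estimate - x0) / (x1 - x0) * (se1 - se0)

estimate = 246_300
se = round(interpolate_se(estimate), -2)      # round to the nearest 100, as in the text
print(se)                                     # 14300.0
print(estimate - se, estimate + se)           # 232000.0 260600.0
print(estimate - 2 * se, estimate + 2 * se)   # 217700.0 274900.0
```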

As can be seen from the SE table, the smaller the estimate, the higher the RSE. Very small estimates are thus subject to such high SEs (relative to the size of the estimate) as to detract seriously from their value for most reasonable uses. Only estimates with RSEs of less than 25%, and percentages based on such estimates, are considered sufficiently reliable for most purposes. However, estimates with a higher RSE are included in published tables from the survey and can be provided on request. In published output, estimates with an RSE of 25% to 50% are preceded by an asterisk (e.g. *3.4) to indicate that they are subject to high SEs and should be used with caution. Estimates with RSEs greater than 50% are preceded by a double asterisk (e.g. **2.1) to indicate that they are considered too unreliable for general use.
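The publication convention for flagging estimates can be expressed as a small function. This is a sketch: `flag_estimate` is a hypothetical name, and it assumes the RSE has already been obtained, for example from Table 1.

```python
def flag_estimate(value: float, rse_pct: float) -> str:
    """Return the estimate as published: * for RSE 25% to 50%, ** for RSE above 50%."""
    if rse_pct > 50:
        return f"**{value}"   # considered too unreliable for general use
    if rse_pct >= 25:
        return f"*{value}"    # subject to high SE, use with caution
    return str(value)         # sufficiently reliable for most purposes

print(flag_estimate(3.4, 30.0))  # *3.4
print(flag_estimate(2.1, 60.0))  # **2.1
```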


SEs of proportions and percentages

Proportions and percentages formed from the ratio of two estimates are also subject to sampling errors. The size of the error depends on the accuracy of both the numerator and the denominator. A formula to approximate the RSE of a proportion is given below:

$$\mathrm{RSE}(x/y) \approx \sqrt{[\mathrm{RSE}(x)]^2 - [\mathrm{RSE}(y)]^2}$$

Note: this formula holds only when x is a subset of y. It should not be used otherwise, for example for estimates of 'rates' as opposed to proportions.
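The formula above translates directly into code. In this sketch (the function name `rse_of_proportion` is hypothetical), the numerator and denominator RSEs of 22.2% and 9.7% are the Australian values from Table 1 for estimates of 20,000 and 100,000 persons, used purely for illustration. The function rejects inputs where RSE(y) exceeds RSE(x), because the square root would then be undefined.

```python
import math

def rse_of_proportion(rse_x: float, rse_y: float) -> float:
    """Approximate RSE (%) of x/y; valid only when x is a subset of y."""
    if rse_y > rse_x:
        raise ValueError("formula undefined when RSE(y) exceeds RSE(x)")
    return math.sqrt(rse_x ** 2 - rse_y ** 2)

# Numerator of 20,000 persons (RSE 22.2%) out of a denominator of 100,000 (RSE 9.7%)
print(round(rse_of_proportion(22.2, 9.7), 1))  # 20.0
```

The proportion's RSE (about 20.0%) is lower than the numerator's RSE of 22.2%, as the next paragraph notes.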

Using this formula, the RSE of the estimated proportion or percentage will be lower than the RSE of the numerator estimate. Therefore, an approximation for SEs of proportions or percentages may be derived by neglecting the RSE of the denominator, i.e. by obtaining the RSE of the number of persons corresponding to the numerator of the proportion or percentage and then applying this figure to the estimated proportion or percentage. Because the denominator term is ignored, the resulting SE is slightly conservative (larger than the formula would give). This approach was adopted when assigning the * or ** markers for the 25% and 50% RSE thresholds in publications from the NHS and NHSI.
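The approximation described above amounts to multiplying the percentage by the numerator's RSE. A minimal sketch, with a hypothetical function name and illustrative inputs: a percentage of 15.0% whose numerator (20,000 persons) has an RSE of 22.2% in Table 1.

```python
def approx_se_of_percentage(percentage: float, numerator_rse_pct: float) -> float:
    """Conservative SE of a percentage: apply the numerator's RSE, ignoring the denominator."""
    return percentage * numerator_rse_pct / 100

# Hypothetical: 15.0% of a group, where the numerator of 20,000 persons has an RSE of 22.2%
se_pct = approx_se_of_percentage(15.0, 22.2)
print(round(se_pct, 2))  # 3.33 (percentage points)
```

Since the percentage inherits the numerator's RSE of 22.2%, it would carry a single asterisk under this approach.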

SEs may also be used to calculate SEs for the difference between two survey estimates (numbers or percentages). The sampling error of the difference between the two estimates depends on their individual SEs and the relationship (correlation) between them. An approximate SE of the difference between two estimates (x-y) may be calculated by the following formula:
$$\mathrm{SE}(x-y) \approx \sqrt{[\mathrm{SE}(x)]^2 + [\mathrm{SE}(y)]^2}$$

While this formula will only be exact for differences between separate and uncorrelated characteristics of subpopulations, it is expected to provide a reasonable approximation for most differences likely to be of interest in relation to this survey.
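The difference formula can be sketched as follows, assuming, as the formula does, that the two estimates are uncorrelated. The estimates of 300,000 and 200,000 are hypothetical; their SEs (15,600 and 13,200) are the Australian values from Table 1. `math.hypot` computes the square root of the sum of squares.

```python
import math

def se_of_difference(se_x: float, se_y: float) -> float:
    """Approximate SE of x - y, treating the two estimates as uncorrelated."""
    return math.hypot(se_x, se_y)

x, y = 300_000, 200_000                    # hypothetical estimates
se_diff = se_of_difference(15_600, 13_200)
# About 19 chances in 20 that the true difference lies within two SEs
low, high = (x - y) - 2 * se_diff, (x - y) + 2 * se_diff
print(round(se_diff))                      # 20435
print(round(low), round(high))
```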

The imprecision due to sampling variability, which is measured by the SE, should not be confused with inaccuracies that may occur because of imperfections in reporting by respondents and recording by interviewers, and errors made in coding and processing data. Inaccuracies of this kind are referred to as non-sampling error, and they may occur in any enumeration, whether it be a full count or a sample. Every effort is made to reduce non-sampling error to a minimum by careful design of questionnaires, intensive training and supervision of interviewers, and efficient operating procedures.

TABLE 1: (INDICATIVE) STANDARD ERRORS ON NHS PERSON ESTIMATES

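State and territory columns show the standard error (no.); the last two columns give the standard error and relative standard error for Australia.

| Size of estimate | NSW | Vic | Qld | SA | WA | Tas | ACT | Australia SE (no.) | Australia RSE (%) |
|---|---|---|---|---|---|---|---|---|---|
| 500 | 520 | 488 | 499 | 404 | 438 | 342 | 268 | 468 | 93.7 |
| 1,000 | 848 | 782 | 777 | 647 | 686 | 526 | 397 | 750 | 75.0 |
| 1,500 | 1,113 | 1,019 | 997 | 839 | 880 | 666 | 492 | 978 | 65.2 |
| 2,000 | 1,342 | 1,222 | 1,184 | 1,002 | 1,046 | 780 | 570 | 1,174 | 58.7 |
| 2,500 | 1,548 | 1,403 | 1,350 | 1,145 | 1,190 | 880 | 635 | 1,350 | 54.0 |
| 3,000 | 1,734 | 1,566 | 1,500 | 1,272 | 1,320 | 969 | 693 | 1,512 | 50.4 |
| 3,500 | 1,904 | 1,718 | 1,638 | 1,390 | 1,439 | 1,047 | 742 | 1,659 | 47.4 |
| 4,000 | 2,064 | 1,860 | 1,764 | 1,496 | 1,548 | 1,120 | 788 | 1,800 | 45.0 |
| 4,500 | 2,219 | 1,989 | 1,881 | 1,598 | 1,652 | 1,184 | 832 | 1,930 | 42.9 |
| 5,000 | 2,360 | 2,115 | 1,995 | 1,690 | 1,745 | 1,245 | 870 | 2,055 | 41.1 |
| 6,000 | 2,622 | 2,346 | 2,202 | 1,866 | 1,920 | 1,362 | 942 | 2,286 | 38.1 |
| 8,000 | 3,088 | 2,752 | 2,568 | 2,160 | 2,232 | 1,552 | 1,056 | 2,696 | 33.7 |
| 10,000 | 3,500 | 3,100 | 2,880 | 2,420 | 2,490 | 1,710 | 1,160 | 3,060 | 30.6 |
| 20,000 | 5,040 | 4,440 | 4,060 | 3,340 | 3,460 | 2,260 | 1,480 | 4,440 | 22.2 |
| 30,000 | 6,180 | 5,400 | 4,920 | 3,960 | 4,140 | 2,610 | 1,680 | 5,490 | 18.3 |
| 40,000 | 7,080 | 6,160 | 5,600 | 4,440 | 4,680 | 2,880 | 1,840 | 6,320 | 15.8 |
| 50,000 | 7,850 | 6,800 | 6,200 | 4,850 | 5,100 | 3,100 | 1,950 | 7,050 | 14.1 |
| 100,000 | 10,600 | 9,100 | 8,300 | 6,200 | 6,600 | 3,800 | 2,300 | 9,700 | 9.7 |
| 200,000 | 13,800 | 12,000 | 10,800 | 7,600 | 8,400 | 4,400 | 3,000 | 13,200 | 6.6 |
| 300,000 | 16,200 | 13,800 | 12,600 | 8,400 | 9,600 | 4,800 | 2,800 | 15,600 | 5.2 |
| 400,000 | 17,600 | 15,200 | 14,000 | 8,800 | 10,400 | 5,200 |  | 17,600 | 4.4 |
| 500,000 | 19,000 | 16,500 | 15,000 | 9,500 | 11,000 |  |  | 19,000 | 3.8 |
| 1,000,000 | 23,000 | 20,000 | 19,000 | 11,000 | 13,000 |  |  | 24,000 | 2.4 |
| 2,000,000 | 28,000 | 24,000 | 22,000 |  |  |  |  | 30,000 | 1.5 |
| 5,000,000 | 35,000 |  |  |  |  |  |  | 40,000 | 0.8 |
| 10,000,000 |  |  |  |  |  |  |  | 50,000 | 0.5 |
| 20,000,000 |  |  |  |  |  |  |  | 60,000 | 0.3 |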


TABLE 2: NHS ESTIMATES WITH AN (INDICATIVE) RSE OF 25% AND 50%

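Size of estimate (no.) at which the indicative RSE reaches 25% and 50%

| RSE | NSW | Vic | Qld | SA | WA | Tas | ACT | Aust |
|---|---|---|---|---|---|---|---|---|
| 25% | 20,353 | 15,693 | 13,348 | 9,352 | 9,940 | 4,978 | 2,577 | 15,563 |
| 50% | 4,337 | 3,343 | 2,996 | 2,009 | 2,224 | 1,131 | 588 | 3,059 |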


TABLE 3: (INDICATIVE) STANDARD ERRORS ON INDIGENOUS PERSON ESTIMATES, AUSTRALIA

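| Size of estimate (no.) | Standard error (no.) | Relative standard error (%) |
|---|---|---|
| 500 | 270 | 54.3 |
| 600 | 310 | 51.2 |
| 700 | 340 | 48.6 |
| 800 | 370 | 46.4 |
| 900 | 400 | 44.5 |
| 1,000 | 430 | 42.8 |
| 1,100 | 450 | 41.3 |
| 1,200 | 480 | 40.0 |
| 1,300 | 500 | 38.8 |
| 1,400 | 530 | 37.7 |
| 1,500 | 550 | 36.7 |
| 1,600 | 570 | 35.8 |
| 1,700 | 590 | 34.9 |
| 1,800 | 610 | 34.1 |
| 1,900 | 630 | 33.4 |
| 2,000 | 650 | 32.7 |
| 2,100 | 670 | 32.0 |
| 2,200 | 690 | 31.4 |
| 2,300 | 710 | 30.8 |
| 2,400 | 730 | 30.3 |
| 2,500 | 740 | 29.8 |
| 3,000 | 830 | 27.5 |
| 3,500 | 900 | 25.7 |
| 4,000 | 970 | 24.2 |
| 4,500 | 1,030 | 22.9 |
| 5,000 | 1,090 | 21.8 |
| 6,000 | 1,200 | 20.0 |
| 7,000 | 1,300 | 18.6 |
| 8,000 | 1,390 | 17.4 |
| 9,000 | 1,470 | 16.4 |
| 10,000 | 1,550 | 15.5 |
| 20,000 | 2,130 | 10.7 |
| 30,000 | 2,540 | 8.5 |
| 40,000 | 2,850 | 7.1 |
| 50,000 | 3,110 | 6.2 |
| 100,000 | 3,980 | 4.0 |
| 200,000 | 4,940 | 2.5 |
| 300,000 | 5,520 | 1.8 |
| 400,000 | 5,940 | 1.5 |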


NOTE:
Because the age distribution of the Indigenous population differs from that of the non-Indigenous population, data are often age standardised for the purposes of making comparisons between the Indigenous and non-Indigenous populations. Age standardised estimates are also often used for comparisons over time. Where Indigenous estimates from the 2001 collection have been age standardised, the standard errors are, on average, between 10% and 30% higher than the corresponding standard error of unstandardised estimates. Therefore, an adjustment factor of approximately 1.2 should be applied to the RSEs shown above for all age standardised estimates for the Indigenous population.
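Applying the adjustment factor can change how an estimate is flagged. In this sketch (the names are hypothetical), an Indigenous estimate of 4,000 has an indicative RSE of 24.2% in Table 3; after multiplying by the approximate factor of 1.2 for age standardisation, its RSE becomes about 29%, which falls in the 25% to 50% band and would be marked with a single asterisk.

```python
AGE_STANDARDISATION_FACTOR = 1.2  # approximate adjustment factor stated in the note

def adjusted_rse(rse_pct: float, age_standardised: bool) -> float:
    """Scale a Table 3 RSE (%) when the Indigenous estimate is age standardised."""
    return rse_pct * AGE_STANDARDISATION_FACTOR if age_standardised else rse_pct

# Indigenous estimate of 4,000: indicative RSE 24.2% (Table 3)
print(round(adjusted_rse(24.2, age_standardised=True), 1))   # 29.0
print(adjusted_rse(24.2, age_standardised=False))            # 24.2
```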


REPLICATE WEIGHTS TECHNIQUE

A class of techniques called replication methods provides a general method of estimating variances for the types of complex sample designs and weighting procedures employed in ABS household surveys.

The basic idea behind the replication approach is to select subsamples repeatedly from the whole sample. The statistic of interest is calculated for each of these subsamples. The variance of the full-sample statistic is then estimated from the variability among the replicate statistics calculated from these subsamples. The subsamples are called replicate groups, and the statistics calculated from these replicates are called replicate estimates.

There are various ways of creating replicate subsamples from the full sample. The replicate weights produced for the 2001 NHS were created using the Jackknife method of replication, which is described below.

There are numerous advantages to using the replicate weighting approach. These include:
  • the same procedure is applicable to most statistics, such as means, percentages, ratios, correlations, derived statistics and regression coefficients
  • the analyst does not need detailed survey design information, provided the replicate weights are included with the data file.

Derivation of replicate weights

Under the Jackknife method of replicate weighting, weights were derived as follows:
  • 30 replicate groups were formed, each designed to mirror the overall sample. Units from a Collection District (CD) all belong to the same replicate group, and a unit can belong to only one replicate group.
  • One replicate group was dropped from the file, and the remaining records were then weighted in the same manner as for the full sample.
  • The records in the dropped group received a weight of zero.
  • This process was repeated for each replicate group (i.e. a total of 30 times).
  • Ultimately, each record had 30 replicate weights attached to it, one of which was the zero weight.
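The derivation steps above can be sketched for a toy file of records. This is a simplified illustration, not the ABS procedure: the record layout (`cluster`, `weight`, `repwts` keys) is hypothetical, replicate groups are formed by assigning clusters round-robin rather than by mirroring the sample design, and the retained records are simply scaled up by G/(G-1) (the usual delete-a-group jackknife adjustment) instead of being fully re-weighted as the full sample was.

```python
def jackknife_replicate_weights(records, n_groups=30):
    """Attach delete-a-group jackknife replicate weights to each record.

    records: list of dicts with a "cluster" key (e.g. a CD identifier) and a
    full-sample "weight". Records are modified in place and returned.
    """
    # Every unit in a cluster goes to the same group, and each cluster to exactly
    # one group. Round-robin over sorted cluster ids is a simple stand-in for
    # forming groups that mirror the overall sample.
    clusters = sorted({r["cluster"] for r in records})
    group_of = {c: i % n_groups for i, c in enumerate(clusters)}

    # Simplification: scale retained records by G / (G - 1) rather than
    # re-weighting them in the same manner as the full sample.
    scale = n_groups / (n_groups - 1)
    for r in records:
        own_group = group_of[r["cluster"]]
        r["group"] = own_group
        # Replicate g drops group g, so the record gets a zero weight in its own replicate.
        r["repwts"] = [
            0.0 if g == own_group else r["weight"] * scale
            for g in range(n_groups)
        ]
    return records

# Toy sample: 60 clusters of 2 records, each with a full-sample weight of 100
sample = [{"cluster": cd, "weight": 100.0} for cd in range(60) for _ in range(2)]
jackknife_replicate_weights(sample)
print(sample[0]["group"], sample[0]["repwts"][:3])
```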

Application of replicate weights

As noted above, replicate weights enable variances of estimates to be calculated relatively simply. They also enable unit record analyses, such as chi-square tests and logistic regression, to be conducted in a way that takes the sample design into account.

Replicate estimates for any variable of interest can be calculated using each of the 30 replicate weights, giving 30 replicate estimates. The distribution of this set of replicate estimates, in conjunction with the full sample estimate (based on the general weight), is then used to approximate the variance of the full sample estimate.

The formula for calculating the Standard error (SE) and relative standard error (RSE) of an estimate using this method is shown below.

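$$\mathrm{SE}(y) = \sqrt{\frac{29}{30}\sum_{g=1}^{30}\left(y_{(g)} - y\right)^2}$$

where

$g = 1, \ldots, 30$ indexes the replicate weights;
$y_{(g)}$ is the estimate obtained using replicate weight $g$ (repwt g); and
$y$ is the estimate obtained using the full person weight.

$$\mathrm{RSE}(y) = \frac{\mathrm{SE}(y)}{y} \times 100\%$$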
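The SE and RSE formulas, together with a helper that produces the full-sample and replicate estimates of a weighted count, might look like this. The record layout (a full-sample `weight` and a list `repwts` of replicate weights per record) and the function names are assumptions for illustration; the replicate estimates in the usage lines are hypothetical.

```python
import math

def estimate_total(records, has_characteristic, g=None):
    """Weighted count of records with a characteristic, using the full weight
    (g is None) or replicate weight g."""
    return sum(
        r["weight"] if g is None else r["repwts"][g]
        for r in records
        if has_characteristic(r)
    )

def jackknife_se(full_estimate, replicate_estimates):
    """SE(y) = sqrt((G - 1) / G * sum over g of (y_(g) - y)^2); G = 30 for the 2001 NHS."""
    G = len(replicate_estimates)
    sum_sq = sum((y_g - full_estimate) ** 2 for y_g in replicate_estimates)
    return math.sqrt((G - 1) / G * sum_sq)

def jackknife_rse(full_estimate, replicate_estimates):
    """RSE(y) = SE(y) / y * 100 (%)."""
    return jackknife_se(full_estimate, replicate_estimates) / full_estimate * 100

# Hypothetical: full-sample estimate of 1,000 and 30 replicate estimates around it
reps = [1010.0] * 15 + [990.0] * 15
print(round(jackknife_se(1000.0, reps), 1))   # 53.9
print(round(jackknife_rse(1000.0, reps), 1))  # 5.4
```

With real data, `y = estimate_total(records, pred)` and `[estimate_total(records, pred, g) for g in range(30)]` would supply the two arguments to `jackknife_se`.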

This method can also be used when modelling relationships from unit record data, regardless of the modelling technique used. In modelling, the full sample would be used to estimate the parameter being studied, such as a regression coefficient, and the 30 replicate groups would be used to provide 30 replicate estimates of the same parameter. The variance of the full-sample estimate of the parameter is then approximated, as above, by the variability of the replicate estimates.
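As an illustration of the modelling case, the sketch below re-fits a weighted least-squares slope once with the full weights and once per replicate weight set, then applies the same variance formula. The function names and toy data are hypothetical; a single-predictor regression and only five delete-one replicates are used to keep the example short, and the same loop would wrap any other fitting routine, such as a logistic regression.

```python
import math

def weighted_slope(xs, ys, ws):
    """Weighted least-squares slope of y on x (one predictor plus an intercept)."""
    sw = sum(ws)
    mx = sum(w * x for w, x in zip(ws, xs)) / sw
    my = sum(w * y for w, y in zip(ws, ys)) / sw
    sxy = sum(w * (x - mx) * (y - my) for w, x, y in zip(ws, xs, ys))
    sxx = sum(w * (x - mx) ** 2 for w, x in zip(ws, xs))
    return sxy / sxx

def replicate_se_of_slope(xs, ys, full_weights, replicate_weights):
    """Full-sample slope and its jackknife SE from one re-fit per replicate weight set."""
    b = weighted_slope(xs, ys, full_weights)
    reps = [weighted_slope(xs, ys, rw) for rw in replicate_weights]
    G = len(reps)
    return b, math.sqrt((G - 1) / G * sum((r - b) ** 2 for r in reps))

# Toy data: five records, each replicate drops one record and scales the rest by 5/4
xs, ys = [1, 2, 3, 4, 5], [2.1, 3.9, 6.2, 7.8, 10.1]
full = [1.0] * 5
reps = [[0.0 if i == g else 1.25 for i in range(5)] for g in range(5)]
b, se_b = replicate_se_of_slope(xs, ys, full, reps)
print(round(b, 2), round(se_b, 3))
```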


Use of replicate weights with statistical packages

Not all statistical packages allow direct calculation of SEs using Jackknife replicate weights. However, packages that support Balanced Repeated Replication (BRR) methodology generally include the option of an adjustment factor, which can be used to account for the difference between the BRR and Jackknife variance formulae.
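The adjustment factor follows from comparing the two variance formulae. The sketch assumes the package's BRR routine computes the variance as the mean of the squared deviations of the replicate estimates from the full-sample estimate; this is a common form, but it should be checked against the package documentation. Under that assumption, multiplying the BRR-style variance by G - 1 (29 for the 2001 NHS) reproduces the Jackknife variance. Function names and replicate values are hypothetical.

```python
def brr_style_variance(full_estimate, replicate_estimates):
    """Assumed BRR form: mean squared deviation of replicates from the full estimate."""
    G = len(replicate_estimates)
    return sum((r - full_estimate) ** 2 for r in replicate_estimates) / G

def jackknife_variance(full_estimate, replicate_estimates):
    """2001 NHS Jackknife form: (G - 1) / G times the sum of squared deviations."""
    G = len(replicate_estimates)
    return (G - 1) * sum((r - full_estimate) ** 2 for r in replicate_estimates) / G

reps = [1010.0] * 15 + [990.0] * 15          # hypothetical replicate estimates
factor = jackknife_variance(1000.0, reps) / brr_style_variance(1000.0, reps)
print(round(factor, 6))                      # 29.0, i.e. G - 1
```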


Availability of RSEs calculated using replicate weights

Indicative RSEs were used in the summary publications released from the NHS and NHSI. However,
  • A set of NHS tables containing a breakdown by ASGC Remoteness categories is available as spreadsheets on the ABS web site, via the Health Theme Page. RSEs for these tables were calculated using the replicate weights methodology.
  • Tables from the publication National Health Survey: Aboriginal and Torres Strait Islander Results, Australia 2001 (cat. no. 4715.0) which contain age standardised estimates were also recompiled with RSEs calculated using the replicate weights methodology. These are available electronically and can be accessed through publication 4715.0 on the ABS web site.


