DATA QUALITY
Estimates in this publication are based on a sample and may therefore differ from the figures that would have been obtained if all persons had been surveyed. One measure of the likely difference is the standard error (SE). There are about two chances in three (67%) that a sample estimate will differ by less than one SE from the population value that would have been obtained if all persons had been surveyed, and about 19 chances in 20 (the 95% confidence level) that the difference will be less than two SEs. Another measure of the likely difference is the relative standard error (RSE), which is obtained by expressing the SE as a percentage of the estimate.

RSEs for all estimates in Personal Safety, Australia, 2012 (cat. no. 4906.0) have been calculated using the Jack-knife method of variance estimation. This involves calculating 60 'replicate' estimates based on 60 different sub-samples of the obtained sample. The variability of the estimates obtained from these sub-samples is used to estimate the sample variability surrounding the main estimate.

The level of sampling error in this survey is indicated by the RSE of each estimate. Very small estimates may be subject to such high RSEs as to seriously detract from their value for most reasonable purposes. Only estimates with RSEs of less than 25% are considered sufficiently reliable for most purposes. However, estimates with RSEs of 25% or more are included in ABS publications of results from this survey: estimates with an RSE of 25% to 50% are preceded by the symbol * as a caution that they are subject to high relative standard errors, while estimates with an RSE greater than 50% are preceded by the symbol ** to indicate that the estimate is considered too unreliable for general use.

Calculation of Standard Error

Standard errors can be calculated from the published estimates (counts or percentages) and their corresponding RSEs. For example, in Personal Safety, Australia, 2012 (cat. no. 4906.0) the estimated number of women who experienced violence in the last 12 months was 467,300, with a corresponding RSE of 5.9%. The SE is calculated as:

SE = (467,300 × 5.9) / 100 = 27,600 (rounded to the nearest 100)

Confidence Interval

There are about two chances in three that the sample estimate will differ by less than one SE from the population value that would have been obtained if all dwellings had been included in the survey, and about 19 chances in 20 that the difference will be less than two SEs. For the example above, there are about two chances in three that the true number lies between 439,700 and 494,900 (467,300 ± 27,600), and about 19 chances in 20 that it lies between 412,100 and 522,500 (467,300 ± 55,200).

Proportions and Percentages

Proportions and percentages, which are formed from the ratio of two estimates, are also subject to sampling error. The size of the error depends on the accuracy of both the numerator and the denominator. For proportions where the denominator is an estimate of the number of persons in a group and the numerator is the number of persons in a sub-group of that group, an approximate RSE is given by:

RSE(x/y) ≈ √( [RSE(x)]² − [RSE(y)]² )

Differences

The difference between two survey estimates (counts or percentages) is also subject to sampling error. An approximate SE of the difference between two estimates (x − y) may be calculated by the formula:

SE(x − y) ≈ √( [SE(x)]² + [SE(y)]² )

This approximation can generally be used whenever the estimates come from different samples, such as two estimates from different years, or two estimates for non-intersecting subpopulations in the same year. If the estimates come from two populations, one of which is a subpopulation of the other, the standard error is likely to be lower than that derived from this approximation.
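The calculations above can be reproduced with a few lines of code. The following Python sketch is illustrative only and is not part of the ABS publication: it uses the worked example from the text (an estimate of 467,300 women with an RSE of 5.9%), the function names are hypothetical, and the inputs used to demonstrate the proportion and difference formulas are invented for illustration.

import math

def standard_error(estimate, rse_percent):
    # SE = estimate x RSE / 100
    return estimate * rse_percent / 100

def proportion_rse(rse_numerator, rse_denominator):
    # Approximate RSE (%) of a proportion x/y, where x is a sub-group of y:
    # RSE(x/y) ~ sqrt(RSE(x)^2 - RSE(y)^2)
    return math.sqrt(rse_numerator**2 - rse_denominator**2)

def difference_se(se_x, se_y):
    # Approximate SE of a difference (x - y): sqrt(SE(x)^2 + SE(y)^2)
    return math.sqrt(se_x**2 + se_y**2)

estimate = 467_300   # women who experienced violence in the last 12 months
rse = 5.9            # per cent

se = round(standard_error(estimate, rse), -2)   # rounded to the nearest 100
print(se)                                       # 27600.0

# About two chances in three that the population value lies within one SE of the
# estimate, and about 19 chances in 20 that it lies within two SEs.
print(estimate - se, estimate + se)             # 439700.0 494900.0
print(estimate - 2 * se, estimate + 2 * se)     # 412100.0 522500.0

# Hypothetical inputs for the proportion and difference formulas:
print(round(proportion_rse(8.0, 5.0), 1))       # 6.2
print(round(difference_se(27_600, 25_000)))     # 37239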
SIGNIFICANCE TESTING ON DIFFERENCES BETWEEN SURVEY ESTIMATES

When comparing estimates between surveys, or between populations within a survey, it is useful to determine whether apparent differences are 'real' differences between the corresponding population characteristics or simply the product of differences between the survey samples. One way to examine this is to determine whether the difference between the estimates is statistically significant. A statistical significance test for a comparison between estimates can be performed to determine whether it is likely that there is a difference between the corresponding population characteristics. The standard error of the difference between two corresponding estimates (x and y) can be calculated using the formula shown above in the Differences section. This standard error is then used to calculate the test statistic:

|x − y| / SE(x − y)

If the value of this test statistic is greater than 1.96, there is good evidence, with a 95% level of confidence, of a statistically significant difference between the two populations with respect to that characteristic. Otherwise, it cannot be stated with confidence (at the 95% confidence level) that there is a real difference between the populations.
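As an illustration of the test described above, the following Python sketch (not part of the ABS publication) computes the test statistic for two estimates. Only the first estimate and its SE are taken from the worked example earlier; the second estimate and its SE are hypothetical values chosen for demonstration.

import math

def difference_se(se_x, se_y):
    # Approximate SE of (x - y), as in the Differences section above.
    return math.sqrt(se_x**2 + se_y**2)

def significance_test(x, y, se_x, se_y, critical_value=1.96):
    # Test statistic |x - y| / SE(x - y); values above 1.96 indicate a
    # statistically significant difference at the 95% confidence level.
    statistic = abs(x - y) / difference_se(se_x, se_y)
    return statistic, statistic > critical_value

# x and se_x come from the worked example above; y and se_y are hypothetical.
stat, significant = significance_test(x=467_300, y=420_000, se_x=27_600, se_y=25_000)
print(round(stat, 2), significant)   # 1.27 False - no evidence of a real difference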
NON-SAMPLING ERROR
Errors related to survey scope

Some dwellings may have been inadvertently included or excluded because of inaccuracies in the lists of dwellings in the selected areas. In addition, some people may have been inadvertently included or excluded because of difficulties in applying the scope rules for household visitors or people aged 18 years and over. However, since the ABS has gained considerable experience in these procedures over many years, any resultant errors are considered to be minimal.

Response errors

Response errors may arise through ambiguous or misleading questions, or inadequate or inconsistent definitions of the terms used. Thorough testing of the Computer Assisted Interviewing (CAI) instrument aimed to minimise problems associated with the sequencing, content and order of the questions. During testing, particular attention was given to the wording of questions and respondents' interpretation of them, as well as to the interviewer instructions, to ensure that the information collected fitted within the relevant definitions. While the questionnaire was improved and streamlined through testing, the type and amount of data required from the survey resulted in a complex questionnaire. In some cases, such as when a person had experienced incidents of violence by a number of different perpetrators, errors may have resulted from interviewer and/or respondent confusion.

In any survey, inaccurate reporting may occur because respondents misunderstand the questions or answer incorrectly to protect their personal integrity, their personal safety, or to protect somebody else. For example, some people may not have reported incidents they experienced, particularly if the perpetrator was somebody close to them, such as a partner or family member. To minimise this effect, interviews were conducted with respondents alone.

Errors related to recall

Recall errors may arise in a number of ways. People may forget to report incidents that occurred in the past, or they may report incidents as having occurred in a more recent time period. Recall errors are likely to be greater for information collected about incidents that occurred a long time ago. For this reason, detailed information about a person's most recent incident of each of the 8 types of violence was not collected where that incident occurred more than 20 years ago.

Non-response bias

Processing errors

Opportunities exist for errors to arise during the processing of data, for example when computer editing programs fail to detect errors, or when data is coded and transferred at various stages of computer processing. To minimise these errors, computer edits were devised to ensure that logical sequences were followed in the CAI instrument, that necessary items were present and that specific values lay within certain ranges. These edits were designed to detect reporting or recording errors, incorrect relationships between data items, and missing data items. Validation was conducted on the data file at various stages of processing (for example, after computer editing and subsequent amendments, after weighting of the file, and after derivation of new data items) to help identify possible errors. Given these procedures and the ABS's long-standing and proven data processing practices, the ABS does not believe that there are any significant processing errors in the data.