DATA QUALITY
Estimates in this publication are based on a sample and may therefore differ from the figures that would have been obtained if all persons had been surveyed. One measure of the likely difference is the standard error (SE). There are about two chances in three (67%) that a sample estimate will differ by less than one SE from the population value that would have been obtained if all persons had been surveyed, and about 19 chances in 20 (the 95% confidence level) that the difference will be less than two SEs. Another measure of the likely difference is the relative standard error (RSE), which is obtained by expressing the SE as a percentage of the estimate.

RSEs for all estimates in Personal Safety, Australia, 2012 (cat. no. 4906.0) have been calculated using the Jack-knife method of variance estimation. This involves calculating 60 'replicate' estimates based on 60 different sub-samples of the obtained sample. The variability of the estimates obtained from these sub-samples is used to estimate the sample variability surrounding the main estimate.

The level of sampling error in this survey is indicated by the RSE of each estimate. Very small estimates may be subject to such high RSEs as to seriously detract from their value for most reasonable purposes. Only estimates with RSEs of less than 25% are considered sufficiently reliable for most purposes. However, estimates with RSEs of 25% or more are included in ABS publications of results from this survey: estimates with an RSE of 25% to 50% are preceded by the symbol * as a caution that they are subject to high relative standard errors, while estimates with an RSE greater than 50% are preceded by the symbol ** to indicate that the estimate is considered too unreliable for general use.

Calculation of Standard Error

Standard errors can be calculated from the published estimates (counts or percentages) and their corresponding RSEs. For example, in Personal Safety, Australia, 2012 (cat. no. 4906.0) the estimated number of women who experienced violence in the last 12 months was 467,300, with a corresponding RSE of 5.9%. The SE is calculated as:

SE = (467,300 × 5.9) / 100 = 27,600 (rounded to the nearest 100)

Confidence Interval

There are about two chances in three that the sample estimate will differ by less than one SE from the population value that would have been obtained if all dwellings had been included in the survey, and about 19 chances in 20 that the difference will be less than two SEs. For the example above, there are about two chances in three that the true number lies between 439,700 and 494,900 (467,300 ± 27,600), and about 19 chances in 20 that it lies between 412,100 and 522,500 (467,300 ± 55,200).

Proportions and Percentages

Proportions and percentages, which are formed from the ratio of two estimates, are also subject to sampling error. The size of the error depends on the accuracy of both the numerator and the denominator. For proportions where the denominator is an estimate of the number of persons in a group and the numerator is the number of persons in a sub-group of that group, an approximate RSE is given by:

RSE(x/y) ≈ √( [RSE(x)]² − [RSE(y)]² )

Differences

The difference between two survey estimates (counts or percentages) is also subject to sampling error. An approximate SE of the difference between two estimates (x − y) may be calculated by the formula:

SE(x − y) ≈ √( [SE(x)]² + [SE(y)]² )

This approximation can generally be used whenever the estimates come from different samples, such as two estimates from different years, or two estimates for non-intersecting subpopulations in the same year. If the estimates come from two populations, one of which is a subpopulation of the other, the standard error is likely to be lower than that derived from this approximation.
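The calculations above can be reproduced with a few lines of code. The following Python sketch is illustrative only and is not part of the ABS publication: it uses the worked example from the text (an estimate of 467,300 women with an RSE of 5.9%), the function names are hypothetical, and the inputs used to demonstrate the proportion and difference formulas are invented for illustration.

import math

def standard_error(estimate, rse_percent):
    # SE = estimate x RSE / 100
    return estimate * rse_percent / 100

def proportion_rse(rse_numerator, rse_denominator):
    # Approximate RSE (%) of a proportion x/y, where x is a sub-group of y:
    # RSE(x/y) ~ sqrt(RSE(x)^2 - RSE(y)^2)
    return math.sqrt(rse_numerator**2 - rse_denominator**2)

def difference_se(se_x, se_y):
    # Approximate SE of a difference (x - y): sqrt(SE(x)^2 + SE(y)^2)
    return math.sqrt(se_x**2 + se_y**2)

estimate = 467_300   # women who experienced violence in the last 12 months
rse = 5.9            # per cent

se = round(standard_error(estimate, rse), -2)   # rounded to the nearest 100
print(se)                                       # 27600.0

# About two chances in three that the population value lies within one SE of the
# estimate, and about 19 chances in 20 that it lies within two SEs.
print(estimate - se, estimate + se)             # 439700.0 494900.0
print(estimate - 2 * se, estimate + 2 * se)     # 412100.0 522500.0

# Hypothetical inputs for the proportion and difference formulas:
print(round(proportion_rse(8.0, 5.0), 1))       # 6.2
print(round(difference_se(27_600, 25_000)))     # 37239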
SIGNIFICANCE TESTING ON DIFFERENCES BETWEEN SURVEY ESTIMATES

When comparing estimates between surveys, or between populations within a survey, it is useful to determine whether apparent differences are 'real' differences between the corresponding population characteristics or simply the product of differences between the survey samples. One way to examine this is to determine whether the difference between the estimates is statistically significant. A statistical significance test for a comparison between estimates can be performed to determine whether it is likely that there is a difference between the corresponding population characteristics. The standard error of the difference between two corresponding estimates (x and y) can be calculated using the formula shown above in the Differences section. This standard error is then used to calculate the test statistic:

|x − y| / SE(x − y)

If the value of this test statistic is greater than 1.96, there is good evidence, with a 95% level of confidence, of a statistically significant difference between the two populations with respect to that characteristic. Otherwise, it cannot be stated with confidence (at the 95% confidence level) that there is a real difference between the populations.
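As an illustration of the test described above, the following Python sketch (not part of the ABS publication) computes the test statistic for two estimates. Only the first estimate and its SE are taken from the worked example earlier; the second estimate and its SE are hypothetical values chosen for demonstration.

import math

def difference_se(se_x, se_y):
    # Approximate SE of (x - y), as in the Differences section above.
    return math.sqrt(se_x**2 + se_y**2)

def significance_test(x, y, se_x, se_y, critical_value=1.96):
    # Test statistic |x - y| / SE(x - y); values above 1.96 indicate a
    # statistically significant difference at the 95% confidence level.
    statistic = abs(x - y) / difference_se(se_x, se_y)
    return statistic, statistic > critical_value

# x and se_x come from the worked example above; y and se_y are hypothetical.
stat, significant = significance_test(x=467_300, y=420_000, se_x=27_600, se_y=25_000)
print(round(stat, 2), significant)   # 1.27 False - no evidence of a real difference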
NON-SAMPLING ERROR
Errors related to survey scope

Some dwellings may have been inadvertently included or excluded because of inaccuracies in the lists of dwellings in the selected areas. In addition, some people may have been inadvertently included or excluded because of difficulties in applying the scope rules for household visitors or people aged 18 years and over. However, since the ABS has gained considerable experience in these procedures over many years, any resultant errors are considered to be minimal.

Response errors

Response errors may arise through ambiguous or misleading questions, or inadequate or inconsistent definitions of the terms used. Thorough testing of the Computer Assisted Interviewing (CAI) instrument aimed to minimise problems associated with the sequencing, content and order of the questions. During testing, particular attention was given to the wording of questions and respondents' interpretation of them, as well as to the interviewer instructions, to ensure that the information collected fitted within the relevant definitions. While the questionnaire was improved and streamlined through testing, the type and amount of data required from the survey resulted in a complex questionnaire. In some cases, such as when a person had experienced incidents of violence by a number of different perpetrators, errors may have resulted from interviewer and/or respondent confusion.

In any survey, inaccurate reporting may occur because respondents misunderstand the questions or answer incorrectly to protect their personal integrity, their personal safety, or to protect somebody else. For example, some people may not have reported incidents they experienced, particularly if the perpetrator was somebody close to them, such as a partner or family member. To minimise this effect, interviews were conducted with respondents alone.

Errors related to recall

Recall errors may arise in a number of ways. People may forget to report incidents that occurred in the past, or they may report incidents as having occurred in a more recent time period. Recall errors are likely to be greater for information collected about incidents that occurred a long time ago. For this reason, detailed information about a person's most recent incident of each of the 8 types of violence was not collected where that incident occurred more than 20 years ago.

Non-response bias

Processing errors

Opportunities exist for errors to arise during the processing of data, for example when computer editing programs fail to detect errors, or when data is coded and transferred at various stages of computer processing. To minimise these errors, computer edits were devised to ensure that logical sequences were followed in the CAI instrument, that necessary items were present and that specific values lay within certain ranges. These edits were designed to detect reporting or recording errors, incorrect relationships between data items, and missing data items. Validation was conducted on the data file at various stages of processing (for example, after computer editing and subsequent amendments, after weighting of the file, and after derivation of new data items) to help identify possible errors. Given these procedures and the ABS's long-standing and proven data processing practices, the ABS does not believe that there are any significant processing errors in the data.