2960.0 - Census Working Paper 93/1 - Sequencing Instructions, 1991  
ARCHIVED ISSUE Released at 11:30 AM (CANBERRA TIME) 01/02/1993   

Census Working Paper 93/1


A STUDY OF THE PERFORMANCE OF
SEQUENCING INSTRUCTIONS
IN THE 1991 CENSUS



Population Census Evaluation
February 1993



CONTENTS

Introduction

Executive Summary

Sequencing instructions included on the 1991 Census form

Sequencing instructions error analysis (NSW data)

Commission error
1991 Census Preliminary commission error rates
1991 Census Preliminary commission error rates compared with test data
Instructions by location
Birthplace
Language
Educational institution
15 years or more
Qualification
Full/part-time job
Looked for work
Nature of occupancy
Rented dwellings
Owned (or mortgaged) dwelling
Monthly housing loan repayment
Persons temporarily absent
Summary

Labour force questions error analysis (Australian data)
Method
Results

Response behaviour as a function of age (according to Q22)
Response behaviour as a function of Q30 Full/part-time job
Effect on final data
Summary



Conclusion


Appendixes

Calculating commission error
Characteristics of respondents making errors (labour force error analysis)
Correct response permutations (labour force error analysis)
Treatment of multiple marks (labour force error analysis)



LIST OF TABLES

Table 1
        Sequencing instructions included in the 1991 Census form (Person questions)
Table 2
        Sequencing instructions included in the 1991 Census form (Dwelling questions)
Table 3
        Commission error rates by topic, Preliminary data, NSW
Table 4
        1991 Census NSW Preliminary, 1990 Dress Rehearsal and August 1989 Test commission error rates
Table 5
        Proportion of NSW records containing error - Q11 & Q12
Table 6
        Proportion of NSW records containing error - Q18 & Q19
Table 7
        Proportion of NSW records containing error - Q20 & Q21
Table 8
        Commission error rates for questions following Q22, NSW
Table 9
        Proportion of NSW records containing error - Q24 & Q28
Table 10
        Proportion of NSW records for persons aged 15 years or more containing error - Q30 & Q31
Table 11
        Commission error rates for Q32, Q33 and Q39, Persons aged 15 years or more, NSW
Table 12
        Commission error rates for questions about rented dwellings, NSW
Table 13
        Incorrect response permutations, Labour force questions, Australia
Table 14
        Predicted labour force status allocated to incorrect response permutations, Labour force questions, Australia
Table 15
        Incorrect response permutations with possible impact on final labour force data
Table 16
        Methods for calculating commission error, Labour force questions, NSW
Table 17
        Incorrect labour force response permutations by language response, Australia
Table 18
        Incorrect labour force response permutations by response to Q24 Qualification, Australia
Table 19
        Correct response permutations, Labour force questions, Australia
Table 20
        Correct response permutations by age of respondent, Labour force questions, Australia
Table 21
        Multiple marks according to classification of labour force response permutations
Table 22
        Multiple marks according to type of labour force response permutation

INTRODUCTION

Sequencing instructions are designed to reduce the burden of filling in the form by enabling respondents to skip irrelevant questions, and encouraging responses to questions for which an answer is required.

The aim of this paper is to assess the performance of each sequencing instruction, and to isolate particular instructions that have the greatest impact on data quality by measuring the type and frequency of errors occurring. This is the first comprehensive analysis of the performance of sequencing instructions and their effect on data quality in the Australian Census. It gives valuable insight into response behaviour in the complex sequence of labour force instructions.

Background

Sequencing instructions in the 1986 Census took the form of a banner placed across the top of the form or at the top of the relevant question. Despite the use of bold text and a coloured banner (pale brown), a report by the 1986 Data Transcription Centre indicated that the performance of sequencing instructions was an issue of concern.

The performance of various sequencing instruction methods was examined as part of the form design testing program for the 1991 Census. In each case their performance was assessed by analysing non-response rates (the proportion of persons not answering a question when they should) and commission error rates (the proportion of persons answering a question when they should not).

The banner instructions used in the 1986 Census were replaced by bold sequencing instructions placed in the response area. This decision was reached after testing indicated that response area sequencing instructions performed better in terms of lower non-response and commission error rates. (For more information see Report on 1991 Census of Population and Housing Form Design Testing, Census Working Paper 91/4.)

The modifications made to the sequencing instructions for the 1991 Census were extensively tested and improved and their performance was considered to be satisfactory. However, feedback after the census from Collector and Group Leader Debriefings, Central Office correspondence and the study conducted by Frank Small and Associates indicated once again concerns about the effectiveness of the sequencing instructions, despite the radical changes effected. (For more information, see Towards an Understanding of Designing Future Population Census Forms, Frank Small and Associates, November 1991.)

An investigation by the Development team of the use of sequencing instructions in other countries showed that a different format was used on each of the five overseas 1991 census forms studied. The instructions ranged from 'go to' and 'skip to' instructions in the US and Canadian censuses, through the same instructions in bold and with arrows in the New Zealand Census, to yes/no questions directing respondents on to further questions in the Singapore and British censuses. Of these, the instructions used in New Zealand seemed to stand out most clearly. Interestingly, the British census form used response categories where the boxes followed the text. Although there is no clear feedback as yet on the performance of sequencing instructions on overseas census forms, British studies show that boxes to the right of the response text are faster to fill in.

Method

Preliminary data were used because they were available, despite the limitations imposed by the lack of editing. The lack of editing also means that respondent error has not yet been repaired, so the type and incidence of errors can be studied. The use of very large samples avoided the problems, experienced in testing, of interpreting results from relatively small samples. The first part of the analysis used NSW preliminary data, while the more detailed labour force analysis which followed was based on preliminary data for Australia.

This response analysis puts each response in the context of the responses to surrounding questions. In most cases these are on the same topic and can be regarded as a distinct group of questions. The number of questions involved in one topic varies from two up to nine in the case of the labour force questions.

Instructions are usually located in the leading question of a group of questions to sequence out respondents for whom the questions are irrelevant. This leading question is regarded as the screening question: it acts to screen out some respondents from the following question(s) and to encourage others to answer them. For the purposes of analysis, the following question(s) are referred to as the target question(s), because maximising the number of people correctly responding is the target of the sequencing instruction.

The presence of a sequencing instruction necessitates the careful analysis of responses to target questions. Depending on the response to the screening question, responses to a particular group of questions will be classified as correct or incorrect depending on whether the respondent has completed the questions in the manner required. This is not to judge the responses themselves, (which, as we do not have access to information to validate all responses, is impossible), but to identify cases where respondents behave contrary to the expectations of form designers, for example, by ignoring specific cues. Incorrect responses have been classified as the following types of error:
  • Commission error occurs when people answer a question when they are not required to. In these cases a sequencing instruction has effectively been ignored as it has failed to sequence out respondents from subsequent irrelevant questions. Commission error is deleted during main processing.
  • Omission error takes two forms: where there is no response to any questions in the particular group; and where only one or some of the questions were missed. In target question(s) omission error indicates the screening question has not encouraged respondents who should continue on to do so. Omission error in the screening question makes it difficult to determine whether a respondent should have answered the target question or not. Where all questions were missed, the records are treated as not applicable to the topic in question. The incidence of omission error is likely to be inflated by the lack of editing of preliminary data, but only some questions (where responses can be made in either the write-in area or a response box) may be significantly affected. Omission error is referred to as non-response in final data.
  • Multiple marks. Although the proportion of multiple marks is usually negligible, this type of error is easily identified and may indicate confusion on the part of respondents about specific questions and the associated response categories. Multiple marks can in some cases be taken as an instance of commission error.
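The three types of error above can be illustrated for a single screening/target question pair. The following is a minimal sketch only, not the method used in census processing: the function name, the representation of a response as a list of marked categories, and the order in which the checks are applied (multiple marks first) are all assumptions made for illustration.

```python
from enum import Enum
from typing import Collection, Sequence


class Outcome(Enum):
    CORRECT = "correct"
    COMMISSION = "commission error"
    OMISSION_BOTH = "omission error (both questions)"
    OMISSION_ONE = "omission error (one question)"
    MULTIPLE_MARKS = "multiple marks"


def classify(screening: Sequence[str], target: Sequence[str],
             skip_categories: Collection[str]) -> Outcome:
    """Classify one record for a screening/target question pair.

    `screening` and `target` list the categories marked in each question
    (an empty list means the question was left blank). `skip_categories`
    are the screening categories carrying a sequencing instruction that
    tells the respondent to skip the target question.
    """
    # Assumption: multiple marks are reported ahead of any other error type.
    if len(screening) > 1 or len(target) > 1:
        return Outcome.MULTIPLE_MARKS
    if not screening and not target:
        return Outcome.OMISSION_BOTH
    if not screening:
        # Screening question blank: the target was answered, so one
        # question of the pair is missing.
        return Outcome.OMISSION_ONE
    if screening[0] in skip_categories:
        # Respondent was sequenced out; any answer to the target is commission.
        return Outcome.COMMISSION if target else Outcome.CORRECT
    # Respondent was required to continue to the target question.
    return Outcome.CORRECT if target else Outcome.OMISSION_ONE


skip = {"No, speaks only English"}
print(classify(["No, speaks only English"], ["Very well"], skip).value)
```

In this example a person who marks 'No, speaks only English' and then answers the English proficiency question is classified as making a commission error.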

The incidence of error was calculated as the number of records containing a particular type of error, as a proportion of all records. The rate of commission error was calculated as the number of respondents marking a response category containing a sequencing instruction who answered a target question contrary to the instruction, as a proportion of all respondents marking the category. Not stated rates were calculated for both screening and target questions by taking the number of not stated codes where a response was required as a proportion of the number of people for whom the question was relevant.
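The three measures defined in the preceding paragraph differ only in their numerators and denominators. The sketch below restates them in Python; the function and parameter names are hypothetical, and the counts in the example are invented for illustration and are not census figures.

```python
def rate(numerator: int, denominator: int) -> float:
    """Return numerator as a percentage of denominator."""
    if denominator <= 0:
        raise ValueError("denominator must be positive")
    return 100.0 * numerator / denominator


def incidence_of_error(records_with_error: int, all_records: int) -> float:
    """Records containing a given type of error, as a percentage of all records."""
    return rate(records_with_error, all_records)


def commission_error_rate(answered_target: int, marked_skip_category: int) -> float:
    """Respondents who marked a category carrying a sequencing instruction and
    still answered the target question, as a percentage of all respondents
    who marked that category."""
    return rate(answered_target, marked_skip_category)


def not_stated_rate(not_stated_where_required: int, relevant_persons: int) -> float:
    """Not stated codes where a response was required, as a percentage of the
    people for whom the question was relevant."""
    return rate(not_stated_where_required, relevant_persons)


# Hypothetical counts: 2000 people mark the instruction-bearing category
# and 480 of them go on to answer the target question.
print(commission_error_rate(answered_target=480, marked_skip_category=2000))
```

Note that the incidence of error uses all records as its base, whereas the commission error rate uses only those respondents who marked the category containing the instruction, so the two figures for the same question can differ considerably.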

In the labour force error analysis in Section 5 these types of error were broken down even further across six variables into specific patterns of response. The most common patterns were identified and numbered according to their relative frequency. Note that the term labour force questions refers to all the questions from Q30 Full/part-time job to Q39 Travel to work. However, the term self-coded labour force questions refers only to those questions captured on the PURF (i.e. Q30, Q31, Q32, Q33 and Q39).

EXECUTIVE SUMMARY

This paper contains the results of investigations into the performance of sequencing instructions. Preliminary data were used because of their availability and timeliness, and because they show respondent error before any repair that takes place during later processing. This has enabled the release date of early 1993 forecast in the 1991 Census Data Quality Evaluation Plan (Minute 90/2086 of 13 December 1991) to be retained despite delays in the release of final data. The investigation is a response to concern about the effectiveness of sequencing instructions stemming from many sources, including the 1991 Census Respondent Observation Study.

There were two main methods of investigation. First, cross-classified tables of NSW data were used to study each instruction on the Household Form. The complexity of the self-coded labour force questions led to more detailed analysis of this sequence. Concerns were also raised about the possible effect of sequencing instructions at the beginning of the labour force questions on later questions such as occupation. Accordingly, the second method involved analysis of different patterns of response across the self-coded labour force questions using data for Australia. The type and incidence of the most common mistakes were identified. Both methods focussed on different types of error such as commission error, omission error (including instances where a response to either the screening question or the subsequent question was missing, as well as instances where all relevant questions were blank), and multiple marks. Investigation was able to yield more information than was available during the 1991 Census Testing Program.

In general sequencing instructions performed reasonably well. They are unlikely to have a significant effect on the quality of final data. Although information on the effectiveness of sequencing instructions in previous censuses is scarce, in the 1991 Census they appear to have replicated the performance recorded in tests. Specific concerns raised about the style of the instructions were not supported by the data. The tendency of some employed people to ignore the first sequencing instruction in the labour force questions, but follow the second (when they were meant to follow the first and thus skip the second) was unexpected although final data quality is not greatly affected.

The least effective instructions were those located in the language/English proficiency questions and the labour force questions. They were: 'Now go to 20'; 'Now go to 32'; and 'Now go to 40'. Questions associated with these instructions had the highest proportion of records containing error. Commission error ranged from 24 to 35 percent compared to a range of 1 to 18 percent for other groups of questions. The major factor which differentiates these questions from others is the existence of response categories which can still be logically answered by people who should have been sequenced out because the question is not relevant to them. For example, some employed people instructed not to answer Q31 Looked for work tended to answer the category in Q31 'No, did not look for work'. In this situation it seems clear that response categories tend to be stronger cues for response behaviour than sequencing instructions.

The most common mistake made in the self-coded labour force questions had no impact on final data but was associated with a sequencing instruction. The instruction 'Go to 32' indicated that for employed people Q31 Looked for work should be skipped, yet 9.18% of records for persons aged 15 years or more included an unnecessary response to Q31. The second most common mistake was to completely skip all the self-coded labour force questions (and probably the write-in labour force questions as well). This problem does not appear to be connected with sequencing instructions. However, only some of these not stated values can be repaired, with many remaining in final data. 6.35% of all PURF records for persons aged 15 years or more contained no self-coded labour force data.

Other mistakes which will be reflected in final data also take the form of omission errors. These are mostly cases of non-response to individual questions, which are not expected to constitute a major problem. However, there is concern that non-response to the questions surrounding the occupation and industry questions could affect occupation and industry data. Analysis is planned to discover whether and to what extent final data will be affected. This study showed that potentially a maximum of about 2.41% of all records for people aged 15 years or more could be affected.

This study emphasises the importance of effective screening questions. Respondents appear to be selective about the questions they answer, often missing out groups of questions on one topic or only answering selected questions from a sequence. Non-response to screening questions makes analysis difficult because it is impossible to know whether the person should have answered a subsequent question or not. In most cases these records have been treated as if the subsequent question was irrelevant. The screening question can also fail to adequately screen out people, often because response categories in the following question tend to dictate response behaviour. In addition the screening question can sometimes screen out people who should have answered the subsequent question. For example, there is evidence that some people who did not have a job correctly answered Q30 Full/part-time job but failed to continue on to Q31 Looked for work.

There are many implications for form design. This investigation shows that there is clearly a need to explore the possibilities for improvement in certain aspects of particular sets of questions. Factors such as positioning on page; wording; response categories; keeping all questions on the same topic on the same page; keeping groups of questions aimed at different population subgroups away from each other; and the performance of screening questions require further investigation. While there may be some room for improving the sequencing instructions themselves, there is no evidence to show that this alone would improve the quality of the data.

Consideration should also be given to incorporating within processing systems methods to provide quick and accurate information on respondent error during testing and during census processing.


SEQUENCING INSTRUCTIONS INCLUDED IN THE 1991 CENSUS FORM


The table below presents sequencing instructions included on both the Household and Personal form for the 1991 Census. It indicates the location of each instruction according to the question and the response categories in which it is located; its wording; and the questions potentially affected. For all questions in Table 1 the instructions are located within the response area next to the relevant response category.

There were eight main sequencing instructions in Person questions on the Household Form, and these are discussed individually in this report. The instructions in Q22 15 years or more however are presented together. Note that only those response categories containing sequencing instructions in a particular question are listed.


Table 1: Sequencing instructions included in the 1991 Census form (Person questions)

Location: Question | Location: Response category | Instruction | Target question(s)
11 Birthplace | 'Australia.' | Now go to 13 | 12 Year of arrival
18 Language | 'No, speaks only English.' | Now go to 20 | 19 English proficiency
20 Full/part time student | 'No.' | Now go to 22 | 21 Type of educational institution
22 15 years or more | 'No, under 15 years.' | No more questions for this person | 23 - 39 (All remaining person questions)
22 15 years or more | 'Yes, 15 years or more.' | Continue to next question | 23 - 39
24 Qualification | 'No.' | Now go to 29 | 25 - 28 (Qualifications)
24 Qualification | 'Still at primary or secondary school.' | Now go to 29 | 25 - 28
24 Qualification | 'Still studying for first qualification.' | Now go to 29 | 25 - 28
30 Full/part time job | 'Yes, worked for payment or profit.' | Now go to 32 | 31 Looked for work
30 Full/part time job | 'Yes, but absent on holidays, on sick leave, on strike or temporarily stood down.' | Now go to 32 | 31 Looked for work
30 Full/part time job | 'Yes, unpaid work in a family business.' | Now go to 32 | 31 Looked for work
31 Looked for work | 'No, did not look for work.' | Now go to 40 | 32 - 39 (Labour force status)
31 Looked for work | 'Yes, looked for full-time work.' | Now go to 40 | 32 - 39
31 Looked for work | 'Yes, looked for part-time work.' | Now go to 40 | 32 - 39
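The instructions in Table 1 can be read as a simple routing table: marking a given response category moves the respondent to a later question, and the questions in between are the target questions that should be left blank. The sketch below encodes Table 1 in that way. The names `SKIP_RULES`, `LAST_PERSON_QUESTION` and `skipped_questions` are hypothetical, and the response category text is written without its trailing full stop purely for convenience.

```python
from typing import Dict, List, Optional, Tuple

# Last person question on the form, taken from the '23 - 39' range in Table 1.
LAST_PERSON_QUESTION = 39

# (question, marked response category) -> next question to answer.
# None stands for 'No more questions for this person'.
SKIP_RULES: Dict[Tuple[int, str], Optional[int]] = {
    (11, "Australia"): 13,
    (18, "No, speaks only English"): 20,
    (20, "No"): 22,
    (22, "No, under 15 years"): None,
    (24, "No"): 29,
    (24, "Still at primary or secondary school"): 29,
    (24, "Still studying for first qualification"): 29,
    (30, "Yes, worked for payment or profit"): 32,
    (30, "Yes, but absent on holidays, on sick leave, "
         "on strike or temporarily stood down"): 32,
    (30, "Yes, unpaid work in a family business"): 32,
    (31, "No, did not look for work"): 40,
    (31, "Yes, looked for full-time work"): 40,
    (31, "Yes, looked for part-time work"): 40,
}


def skipped_questions(question: int, category: str) -> List[int]:
    """Return the question numbers that should be left blank after
    marking `category` in `question` (empty if no instruction applies)."""
    if (question, category) not in SKIP_RULES:
        return []
    next_question = SKIP_RULES[(question, category)]
    last_skipped = LAST_PERSON_QUESTION if next_question is None else next_question - 1
    return list(range(question + 1, last_skipped + 1))


print(skipped_questions(24, "No"))
```

A response to any of the returned question numbers would count as commission error for that instruction.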

Table 2 illustrates the sequencing instructions located in Q42 Nature of occupancy on the Household form. The format of these instructions differs from other questions in that arrows are used to help direct respondents. The dwelling questions and response areas are all contained within one large boxed area and arrows direct respondents to other response areas within the box. These arrows begin at the outside of the smaller relevant response area. This contrasts with other questions on the form which are not grouped together as one question but listed individually down the page. In addition, written instructions are located with the arrow for the Monthly mortgage and Furnished/unfurnished questions. Other written instructions for the Mortgage question are located next to the relevant response category, in the same manner as other questions on the form.

The three instructions which direct respondents around within Q42 Nature of occupancy are discussed individually in this report. However the four instructions which direct respondents to Q43 Persons temporarily absent are treated as one type of instruction for the purposes of discussion.


Table 2: Sequencing instructions included in the 1991 Census form (Dwelling questions)

Location: Question | Location: Response category | Instruction | Target question(s)
42 Nature of Occupancy: Rented dwelling | 'No' | (to Dwelling owned question) | Renting questions
42 Nature of Occupancy: Rented dwelling | 'Yes' | (to Landlord question) | Ownership questions
42 Nature of Occupancy: Furnished/unfurnished | Outside response area | Now go to 43 | 43 Persons temporarily absent
42 Nature of Occupancy: Dwelling owned | 'Yes, owned (paid off).' | Now go to 43 | Housing loan repayment
42 Nature of Occupancy: Dwelling owned | 'Yes, being bought' | (to Housing loan repayment question) | 43 Persons temporarily absent
42 Nature of Occupancy: Dwelling owned | 'No.' | Now go to 43 | Housing loan repayment
42 Nature of Occupancy: Housing loan repayment | Outside response area | Now go to 43 | 43 Persons temporarily absent

SEQUENCING INSTRUCTIONS ERROR ANALYSIS (NSW DATA)

Commission error

1991 CENSUS PRELIMINARY COMMISSION ERROR RATES

Commission error rates are presented below for all questions affected by the presence of a sequencing instruction. They have been calculated as the proportion of persons marking a response category containing a sequencing instruction who go on to answer a subsequent question contrary to the instruction (counting only persons to whom the original question applied). For example, the instructions located between Q24 and Q31 apply only to persons aged 15 years or more, so the commission error rate for these questions represents persons aged 15 years or more who mark a particular response category in a screening question and subsequently make a commission error, as a proportion of all such persons marking the response category.


Table 3: Commission error rates by topic, Preliminary data, NSW

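Location of instruction: No. | Location of instruction: Description | Question incorrectly completed (target): No. | Question incorrectly completed (target): Description | Commission error rate (%)
11 | Birthplace | 12 | Year of arrival | 2.0
18 | Language | 19 | English proficiency | 24.2
20 | Student status | 21 | Educational institution | 1.3
22 | Aged 15 years or more | 23 | Age left school | 18.5
22 | Aged 15 years or more | 24 | Qualification | 18.8
22 | Aged 15 years or more | 28 | Qualification, year obtained | 1.0
22 | Aged 15 years or more | 29 | Income | 13.1
22 | Aged 15 years or more | 30 | Full/part-time job | 18.5
22 | Aged 15 years or more | 31 | Looked for work | 14.0
22 | Aged 15 years or more | 32 | Job last week | 2.4
22 | Aged 15 years or more | 33 | Hours worked | 8.8
22 | Aged 15 years or more | 39 | Method of travel to work | 7.7
24 | Qualification | 28 | Qualification, year obtained | 3.4
30 | Full/part-time job | 31 | Looked for work | 24.5
31 | Looked for work | 32 | Job last week | 25.7
31 | Looked for work | 33 | Hours worked | 34.2
31 | Looked for work | 39 | Travel to work | 34.5
42 | Nature of occupancy | 42 | Landlord | 1.2
42 | Nature of occupancy | 42 | Rent (weekly) | 1.2
42 | Nature of occupancy | 42 | Furnished/unfurnished | 1.5
42 | Nature of occupancy | 42 | Dwelling owned | 14.8
42 | Nature of occupancy | 42 | Housing loan repayments | 2.3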


1991 CENSUS PRELIMINARY COMMISSION ERROR RATES COMPARED WITH TEST DATA

Commission error rates from the 1991 Census, 1990 Dress Rehearsal and August 1989 Test were compared to find out whether the apparent differences in data from these sources could be explained. There are many factors which limit the comparability of the 1991 Census, Dress Rehearsal and August 1989 commission error:
  • Calculation methods. The most practical way to standardise methods was to adopt those used in the August 1989 Test, and ensure the Dress Rehearsal and Census data followed suit. Note that this means that figures quoted in this part of the report for 1991 data may differ from those in the rest of the report. See Appendix 1 for a discussion of methods for calculating commission error rates.
  • Sample sizes and geographic areas. 1991 Census Preliminary rates are calculated from New South Wales data, representing persons from over two million dwellings. The samples used for the Dress Rehearsal held in Victoria and the August 1989 Major Test held in Brisbane contained 20,000 and 10,000 dwellings respectively. In each case entire CDs were selected and the samples were not necessarily representative.
  • Processing status of the data. Although the 1991 Census data are preliminary data, both the Dress Rehearsal and August 1989 Test figures were calculated using IFURF data (i.e. before commission error had been resolved).
  • Development status of the processing system. The 1991 Census processing system was developed progressively over a long period of time.
  • Treatment of dummy forms. Records from dummy forms were excluded from the calculation of 1991 Census Preliminary rates. In the Dress Rehearsal, only persons from refusal and non-contact dummy forms were excluded from calculations as there was confusion over the classification of some mailback dummy forms. Persons included on dummy forms were not able to be excluded in the calculation of August 1989 rates.
  • Test question design. The August 1989 Test was the last in a series of tests where different question designs were trialed. The two forms employed different wording and instructions for particular questions. In addition the sequencing instructions on Form 9 were in bold type whereas those on Form 10 were in normal type.


Results

Table 4 shows commission error rates for each of the questions affected by sequencing instructions. Note that the 1991 Census commission error rates shown in Table 4 for some of these questions are lower than those quoted elsewhere in the report. Data for Type of educational institution, Age left school and Travel to work were not available for the August 1989 Test. Age left school commission error rates have been calculated in relation to Q4 Age (rather than Q22 Fifteen years or more) while for questions 32, 33 and 39 commission error rates have been calculated in relation to Q30 Full/part time job (rather than the Looked for work sequencing instruction) because these were the methods used in the August 1989 Test and Dress Rehearsal.

Table 4 shows that 1991 Census commission error rates are generally within 2 percentage points of the Dress Rehearsal and August 1989 Test results. It also shows that the same questions (Looked for work, English proficiency, Age left school, Hours worked and Travel to work) exhibited the highest commission error rates in the 1991 Census, Dress Rehearsal and August 1989 Test.

Table 4: 1991 Census NSW Preliminary, 1990 Dress Rehearsal and August 1989 Test commission error rates


Question | August 1989 Major Test, F9 (%) | August 1989 Major Test, F10 a (%) | 1990 Dress Rehearsal (%) | 1991 Census Preliminary (NSW) (%)
Year of arrival | 2.2 | 1.8 | 2.3 | 2.0
English proficiency | 30.3 | 25.9 | 25.9 | 24.2
Type of educational institution | - | - | 1.7 | 1.2
Age left school | - | - | 17.8 | 17.3 c
Year of qualification | 2.3 b | 2.3 b | 3.8 | 3.4
Looked for work | 23.2 | 19.9 | 23.9 | 24.5
Job last week | 5.5 | 5.0 | 4.9 | 4.9 c
Hours worked | 20.6 | 19.0 | 20.6 | 18.8 c
Travel to work | - | - | 17.0 | 17.3 c

a Two alternate forms were used, where sequencing instructions on one form were in bold.
b Average for forms 9 and 10
c These rates are lower than the equivalent rates listed in Table 3 due to the different method of calculation used for comparison purposes.
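The statement that the 1991 Census rates are generally within 2% of the test results can be checked directly against the figures in Table 4. The following sketch is illustrative only: the names `TABLE_4` and `largest_gap` are hypothetical, `None` stands for rates that were not available, and the single averaged Year of qualification figure is entered for both Form 9 and Form 10.

```python
from typing import Dict, Optional, Tuple

Rates = Tuple[Optional[float], Optional[float], Optional[float], float]

# Commission error rates (%) from Table 4, in the order:
# August 1989 Test F9, August 1989 Test F10, 1990 Dress Rehearsal, 1991 Census (NSW).
TABLE_4: Dict[str, Rates] = {
    "Year of arrival": (2.2, 1.8, 2.3, 2.0),
    "English proficiency": (30.3, 25.9, 25.9, 24.2),
    "Type of educational institution": (None, None, 1.7, 1.2),
    "Age left school": (None, None, 17.8, 17.3),
    "Year of qualification": (2.3, 2.3, 3.8, 3.4),  # 2.3 is the F9/F10 average
    "Looked for work": (23.2, 19.9, 23.9, 24.5),
    "Job last week": (5.5, 5.0, 4.9, 4.9),
    "Hours worked": (20.6, 19.0, 20.6, 18.8),
    "Travel to work": (None, None, 17.0, 17.3),
}


def largest_gap(rates: Rates) -> float:
    """Largest absolute difference (percentage points) between the
    1991 Census rate and any available test rate."""
    *tests, census = rates
    return round(max(abs(census - r) for r in tests if r is not None), 1)


gaps = {question: largest_gap(rates) for question, rates in TABLE_4.items()}
over_two_points = [q for q, gap in gaps.items() if gap > 2.0]
print(over_two_points)
```

On these figures only English proficiency (against Form 9) and Looked for work (against Form 10) differ from the 1991 Census rate by more than 2 percentage points.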


Summary

The August 1989 Test, 1990 Dress Rehearsal and 1991 Census Preliminary commission error rates seem to be consistent despite the comparability difficulties which exist.

Commission error rates obtained in census testing programs have been considered an overestimate of levels expected in a census because census conditions (such as compulsion) do not apply. This analysis indicates that this may not be as much of a problem as it was once thought as reasonable consistency was observed between 1991 Census Preliminary, Dress Rehearsal and August 1989 Test data.

Instructions by location

For commission error rates, see Table 3.


BIRTHPLACE

Location: Q11 response area

Wording: Now go to 13

Example:
11 Where was each person born?
( ) Australia. Now go to 13
( ) England
( ) Scotland
( ) Italy
( ) Greece
( ) New Zealand
( ) The Netherlands
( ) Other - please specify

12 When did the person first arrive in Australia?
( ) Before 1971
( ) 1971-1975
( ) 1976-1980
( ) 1981-1985
( ) 1988-1989
( ) 1990-1991


The commission error rate in Table 3 gives some indication of how well the sequencing instruction was followed. However, commission error represents only a small proportion of the total 13.1% of records containing error. For Q11 and Q12 the overall level of error is relatively low, but it appears to have been fairly common to miss one or both of the questions.

Table 5: Proportion of NSW records containing error - Q11 & Q12

Commission error (%) | Omission error, both questions (%) | Omission error, one question (%) | Multiple marks (%) | TOTAL (%)
1.4 | 5.8 | 5.7 | 0.2 | 13.1

The low level of commission error may reflect the difficulty respondents born in Australia would have in answering Q12 Year of Arrival. The response categories cannot be understood without reference to the question, and the question is not sensible for those born in Australia.

Omission error is unlikely to have a major impact on final data. The question omitted was mostly the screening question, Q11. This may be explained by the problem of 'hidden responses' caused by respondents failing to mark the 'Other - please specify' box, and will therefore be largely resolved in main processing. Responses were missing for Q12 alone in just 0.6% of records, representing 3.3% of those born overseas. Cases where both questions were missed are excluded from the calculation of the non-response rate for Q12.

LANGUAGE

Location: Q18 response area

Wording: Now go to 20

Example:
18 Does the person speak a language other than English at home?

( ) No, speaks only English. Now go to 20
( ) Yes, Italian
( ) Yes, Greek
( ) Yes, Cantonese
( ) Yes, Mandarin
( ) Yes, German
( ) Yes, Arabic
( ) Yes, other - please specify

19 How well does the person speak English?
( ) Very well
( ) Well
( ) Not well
( ) Not at all


The high commission error rate in Table 3 gives some indication of how poorly the sequencing instruction was followed. The extremely high level of commission error indicates that the sequencing instruction (if read and comprehended) was overridden by something, probably the apparent relevance of the target question to all persons. The overall level of error was much higher than for other similar questions, and commission error represents almost two-thirds of the total 28.9% of all records containing error.

Table 6: Proportion of NSW records containing error - Q18 & Q19


Commission error (%) | Omission error, both questions (%) | Omission error, one question (%) | Multiple marks (%) | TOTAL (%)
18.3 | 6.3 | 3.7 | 0.6 | 28.9
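The contrast between the birthplace and language questions can be made concrete by expressing commission error as a share of all records containing error, using the figures in Tables 5 and 6. This is a small arithmetic sketch; the dictionary layout and function names are assumptions made for illustration.

```python
from typing import Dict

# Percentage of NSW records containing each type of error (Tables 5 and 6).
ERROR_TABLES: Dict[str, Dict[str, float]] = {
    "Q11 & Q12": {"commission": 1.4, "omission_both": 5.8,
                  "omission_one": 5.7, "multiple_marks": 0.2, "total": 13.1},
    "Q18 & Q19": {"commission": 18.3, "omission_both": 6.3,
                  "omission_one": 3.7, "multiple_marks": 0.6, "total": 28.9},
}


def commission_share(row: Dict[str, float]) -> float:
    """Commission error as a percentage of all records containing error."""
    return round(100.0 * row["commission"] / row["total"], 1)


def components_sum_to_total(row: Dict[str, float]) -> bool:
    """Check that the individual error types add up to the published total."""
    parts = sum(value for key, value in row.items() if key != "total")
    return round(parts, 1) == row["total"]


for questions, row in ERROR_TABLES.items():
    print(questions, commission_share(row), components_sum_to_total(row))
```

For Q18 and Q19 the share is about 63%, which is the 'almost two-thirds' referred to above, compared with about 11% for Q11 and Q12.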

98% of records containing commission error recorded Q19 responses of 'very well' or 'well' (in fact, 'very well' was marked in 92% of cases), indicating that the respondent had read and understood the question without realising that a response was not required. Also, 'telling us' about their proficiency gives a positive feeling to the respondent.

It is mainly the respondents missing the screening question who fail to respond to the target question (thereby skipping both questions). This probably reflects a perception by those born in Australia that the questions are irrelevant. Responses missing for Q19 alone occurred in just 0.34% of records, representing 2.5% of those speaking a language other than English at home. Cases where both questions were missed are excluded from the calculation of the non-response rate for Q19.

EDUCATIONAL INSTITUTION

Location: Q20 response area

Wording: Now go to 22

Example:
20 Is the person attending a school or any other educational institution?
Include external or correspondence students
For school students, mark second box.

( ) No. Now go to 22
( ) Yes, full-time student
( ) Yes, part-time student


21 Is the person attending a school or any other educational institution?
Include external or correspondence students
Examples of other higher educational institutions:
Institute of Technology, Institute of Advanced Education, Conservatorium of Music

( ) Pre-school

Infants/Primary school
( ) Government
( ) Non-Government

Secondary school
( ) Government
( ) Non-Government

Tertiary institution
( ) Technical and Further Education (TAFE) College
( ) University, College of Advanced Education (CAE) or other higher educational institution
( ) Other institution - please specify


The commission error rate in Table 3 gives some indication of how well the sequencing instruction was followed. However, commission error represents only a small proportion of the total 11.2% of records containing error. For Q20 and Q21 the overall level of error is relatively low, but the tendency of respondents to miss these questions altogether was a problem.

Table 7: Proportion of NSW records containing error - Q20 & Q21

                          Omission error
Commission error   Both questions   One question   Multiple marks   TOTAL
        %                 %               %               %            %
       0.8               7.7             1.9             0.8         11.2


Q21 is difficult to answer for those who are not attending an educational institution. The sequencing instruction, which appears at the top of the page, may also be important in reducing commission error.

Records where both questions were missed are excluded from the calculation of the non-response rate for Q21. The only exceptions are those aged between 5 and 14 years who are given (imputed) codes of 'full-time student' in Q20 during main processing. The relative size of this problem probably reflects a perception by those who are not attending an educational institution (particularly those aged over 30) that the questions are irrelevant. Responses missing for Q21 alone occurred in just 1.2% of records, representing 4.8% of those attending an educational institution.
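
The imputation rule mentioned above can be sketched as follows. This is a minimal illustration, not the processing system's actual logic: the function name, the treatment of the age range as inclusive, and the 'not stated' label for remaining blanks are all assumptions.

```python
def impute_q20(age, q20_response):
    """Return the Q20 code to use in main processing for one person.

    `age` comes from Q4; `q20_response` is None when Q20 was left blank.
    """
    if q20_response is not None:
        return q20_response          # a stated response is kept unchanged
    if 5 <= age <= 14:               # assumed inclusive range of 5 to 14 years
        return "full-time student"   # imputed code described in the text
    return "not stated"              # assumed label for remaining blanks


print(impute_q20(9, None))
```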


15 YEARS OR MORE

Location: Q22 response area

Response: No, under 15 years
Wording: No more questions for this person

Response: Yes, 15 years or more
Wording: Continue to next question

Example:
22 Is the age given for the person 15 years or more?

( ) No, under 15 years. No more questions for this person
( ) Yes, 15 years or more. Continue to next question


The commission error rates in Table 3 associated with Q22 vary widely. The instructions in Q22 have the potential to affect all the remaining person questions on the census form, and this makes the task of calculating the overall level of error and its components impracticable. There is evidence that the level of commission error for any question is more influenced by its apparent relevance or lack of it according to the respondent's preceding responses.

Q22 replaced the banner instruction which appeared on the 1986 Census form instructing respondents to 'answer the remaining questions for each person aged 15 years or more'. Post-census evaluation of the 1986 Census found a high proportion of people failed to comply with the banner instruction. There are two separate instructions in the 1991 Census question - respondents answering 'No, under 15 years' are instructed to answer no further questions, and respondents aged 15 years or more are instructed to answer the next question.

The rate of commission error for subsequent questions reflects the efficacy of the instruction next to the first response category in Q22, 'No more questions for this person'. The commission error rates in Table 8 can also be found in Table 3.

Table 8: Commission error rates for questions following Q22, NSW

Target question                     Commission error rate
No.    Description                            %
23     Age left school                      18.5
24     Qualification                        18.8
28     Year of qualification                 1.1
29     Income                               13.1
30     Full/part-time job                   18.5
31     Looked for work                      14.0
32     Job last week                         2.4
33     Hours worked                          8.8
39     Travel to work                        7.7

The table shows commission error was highest for Age left school, Qualification and Full/part-time job. Commission error tended to occur in instances where a relevant response category was available:
  • Of those aged under 15 years responding to Q23 Age left school, 80% marked 'Still at school'.
  • 96% of respondents aged under 15 years who answered Q24 Qualification marked 'No' or 'Still at primary or secondary school'.
  • Of those respondents aged under 15 years responding to Q30 Full/part-time job, 90% marked 'No, did not have job'.
  • For Q31 Looked for work almost 97% of records for persons aged under 15 years contained the response 'No, did not look for work'.
  • 81% of those aged under 15 years who answered Q33 Hours worked marked 'None'.


Conversely, commission error was low in questions where the response categories have little relevance for persons aged under 15 years. For example only 2.4% of records for persons aged under 15 years include a response for Q32 Job last week. Of the few who did answer, 58% responded 'wage or salary earner' and 31% 'unpaid helper'. This question has relevance for students with part-time jobs, or who work at home for pocket money, and 'telling us' about their 'productivity' gives a positive feeling to the respondent.
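
The way a commission error rate of this kind might be computed from unit records can be sketched as follows. This is an illustration under stated assumptions rather than the method set out in the paper's appendix on calculating commission error: the record layout, the use of Q4 age to identify persons aged under 15 years, and the sample records are all invented for the example.

```python
def commission_error_rate(records, target):
    """Percentage of under-15 records with a response to `target`.

    Each record is a dict with an "age" key (assumed to come from Q4) and
    one key per question; a value of None means the question was left blank.
    """
    should_skip = [r for r in records if r["age"] < 15]
    if not should_skip:
        return 0.0
    answered = sum(1 for r in should_skip if r.get(target) is not None)
    return 100.0 * answered / len(should_skip)


# Invented records for illustration only (not census data).
sample = [
    {"age": 8, "q23": "Still at school", "q32": None},
    {"age": 12, "q23": None, "q32": None},
    {"age": 14, "q23": "Still at school", "q32": "Unpaid helper"},
    {"age": 10, "q23": None, "q32": None},
    {"age": 35, "q23": "16 years", "q32": "Wage or salary earner"},
]
print(commission_error_rate(sample, "q23"))  # 2 of the 4 under-15 records
print(commission_error_rate(sample, "q32"))  # 1 of the 4 under-15 records
```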

The second sequencing instruction in Q22, 'Continue to next question' for those marking the second category, was reasonably well followed. This would be expected as it is a simple instruction and conforms to common form-filling practices. Only 3.4% marking the relevant category failed to continue on and answer Q23 Age left school. Although the instruction explicitly refers only to the next question, it is understood that respondents are expected to follow the sequence of the questions through until the next sequencing instruction. However, the proportion of records where the respondent has failed to continue on appears to increase the further the question is away from Q22. Q24 Qualification was missed in 5.3% of cases while Q29 Income was missed in 8.3% of cases. However, the rate of omission seems to improve later on, with only 4.9% of records missing a response to Q30 Full/part-time job. Rates of omission error are difficult to calculate for questions following Q22 other than these, because of the difficulty in determining whether a person should have answered or not. Some not stated codes will be resolved in main processing.

Q22 could have confused some respondents since age has already been asked in Q4, and it may also have been regarded as irrelevant. The 1991 Census Respondent Observation Study suggests that part of the confusion this question causes arises from its location in the middle of the education questions. The study also found that when respondents were confused by it they did not answer but continued on anyway, often answering for children aged less than 15 years.

The main concern here is the possible effect on Q23 Age left school. Two in every three people who failed to answer Q22 also failed to answer Q23 Age left school. Three-quarters of these people were aged 15 years or more (according to Q4 Age), and therefore should have answered Q23 Age left school. Separate investigation of the changes in the pattern of response to Age left school in the 1991 Census so far indicates that form design may be an important factor.

Consistency checks against Q4 Age show that the quality of response to Q22 was generally good. Only 0.3% of respondents aged under 15 years in Q4 answered that they were aged 15 years or more in Q22, while 5.0% of respondents aged 15 years or more according to Q4 answered 'No, under 15 years' to Q22. Note that data from Q4 rather than Q22 is used in processing to determine whether responses are required in questions directed at persons aged 15 years or more.
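
A consistency check of this kind can be sketched in a few lines. The function names and return labels below are assumptions for illustration; only the response wording and the rule that Q4 (not Q22) determines whether later questions apply are taken from the text.

```python
def q22_consistency(age, q22_response):
    """Compare a Q22 response with the age reported in Q4.

    Returns "consistent", "inconsistent" or "not stated".
    """
    if q22_response is None:
        return "not stated"
    expected = "Yes, 15 years or more" if age >= 15 else "No, under 15 years"
    return "consistent" if q22_response == expected else "inconsistent"


def requires_adult_questions(age):
    """Processing uses Q4 age, not Q22, to decide whether Q23 onwards apply."""
    return age >= 15


print(q22_consistency(40, "No, under 15 years"))  # inconsistent
print(requires_adult_questions(40))               # True, regardless of Q22
```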

QUALIFICATION

Location: Q24 response area

Wording: Now go to 29

Example: Data available after OMR data capture are restricted to Q24 and Q28.

24 Has the person obtained a trade certificate or any other educational qualification since leaving school?

( ) No. Now go to 29
( ) Still at primary or secondary school. Now go to 29
( ) Still studying for first qualification. Now go to 29
( ) Yes, trade certificate or ticket
( ) Yes, other qualification

28 In which year did the person complete that qualification?

( ) Before 1971
( ) 1971-1975
( ) 1976-1980
( ) 1981-1985
( ) 1986-1987
( ) 1988-1989
( ) 1990-1991


The commission error rate in Table 3 gives some indication of how well the sequencing instruction was followed. However, commission error represents only a small proportion of the total 18.2% of records containing error. The tendency of respondents to miss these questions altogether (and perhaps miss Q25, Q26 and Q27 as well) was a problem.

Table 9: Proportion of NSW records containing error - Q24 & Q28

                          Omission error
Commission error   Both questions   One question   Multiple marks   TOTAL
        %                 %               %               %            %
       1.8              10.8             3.7             1.9         18.2


Q28 is difficult to answer for those who have not obtained a qualification. Q28 was at the end of a series of questions on qualifications which required progressively more detailed answers, which should have caused respondents without qualifications to gradually drop out. Commission error may have been even lower if the two questions were on the same page and Q28 were not at the top of a new page. The 1991 Census Respondent Observation Study suggested that people making commission errors here were older or educated overseas.

Records where both questions were missed are excluded from the calculation of the non-response rate for Q28. The fact that both questions were skipped in over 10% of NSW records seems to reflect the lack of relevance the questions have for many respondents. Such cases accounted for 80% of the missing responses for Q28. Responses missing for Q28 alone occurred in just 1.8% of records, representing 5.8% of those over 15 years of age with a qualification.

FULL/PART-TIME JOB

Location: Q30 response area

Wording: Now go to 32

Example:
30 Last week, did the person have a full-time or part-time job of any kind?

( ) Yes, worked for payment or profit. Now go to 32
( ) Yes, but absent on holidays, on sick leave, on strike or temporarily stood down. Now go to 32
( ) Yes, unpaid work in a family business. Now go to 32
( ) Yes, other unpaid work
( ) No, did not have job

31 Did the person actively look for work at any time in the last 4 weeks?
Actively looking for work means checking or being registered with the Commonwealth Employment Service; writing, telephoning or applying in person to an employer for work; or advertising for work.

( ) No, did not look for work. Now go to 40
( ) Yes, looked for full-time work. Now go to 40
( ) Yes, looked for part-time work. Now go to 40


The extremely high commission error rate in Table 3 gives some indication of how poorly the sequencing instruction was followed. The overall level of error is much higher in this part of the form, and commission error represents the largest proportion of the total 27.0% of records containing error. The high rate of commission error suggests the sequencing instruction in Q30 was over-ridden by the perceived relevance of Q31 to many respondents. However one third of the total records containing error for people aged 15 years or more were cases where neither question was answered.

Table 10: Proportion of NSW records for persons aged 15 years or more containing error - Q30 & Q31

                          Omission error
Commission error   Both questions   One question   Multiple marks   TOTAL
        %                 %               %               %            %
      12.5               9.0             4.6             0.9         27.0


Only the groups of questions associated with the Language and Looked for work sequencing instructions have a higher level of overall error (around 30%), and commission error is high in all three cases. The response category 'No, did not look for work' in Q31 was chosen by 93% of those making a commission error.

Commission error was higher for people answering 'Yes, unpaid work in a family business' (32.5%) than for people answering 'Yes, worked for payment or profit' (24.2%) or 'Yes, but absent' (26.3%). It seems reasonable that people answering 'Yes, unpaid work in a family business' would find Q31 more relevant and be more likely to answer it, as they are not working for payment.

Records where both questions were missed are excluded from the calculation of the non-response rate for Q31. The relative size of this problem probably reflects a perception by those who are not in the labour force that the questions are irrelevant. Unfortunately this means that labour force status (even 'not in the labour force') cannot be derived.

Responses missing for Q31 alone occurred in just 2.6% of records, representing 8.1% of those over 15 years of age without a job. This is relatively high in comparison with other target questions, and indicates that perhaps Q30 screens out some people without a job who should not be screened out. The proportion of people missing target questions which should have been answered after answering the screening question ranged from 2.5% for Q19 English proficiency to 5.8% for Q28 Year of qualification. The latter situation is slightly different because the screening question appeared much earlier in Q24.
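
Each of the non-response figures quoted for the target questions is given on two bases: as a percentage of all records, and as a percentage of those required to answer. Dividing one by the other gives the approximate size of the eligible group that the two figures imply. The sketch below performs that arithmetic; the function name is an assumption, the results are derived rather than published figures, and they are sensitive to rounding in the quoted percentages and to any differences in the base of 'records' between questions.

```python
# (per cent of all records, per cent of those required to answer), as quoted in the text.
quoted = {
    "Q12": (0.6, 3.3),
    "Q19": (0.34, 2.5),
    "Q21": (1.2, 4.8),
    "Q28": (1.8, 5.8),
    "Q31": (2.6, 8.1),
}


def implied_eligible_share(pct_of_records, pct_of_eligible):
    """Per cent of records that the eligible group must represent."""
    return 100.0 * pct_of_records / pct_of_eligible


for question, (of_records, of_eligible) in quoted.items():
    share = implied_eligible_share(of_records, of_eligible)
    print(f"{question}: eligible group is roughly {share:.0f}% of records")
```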

LOOKED FOR WORK

Location: Q31 response area

Wording: Now go to 40

Example: Data for analysis are restricted to the self-coded labour force questions: Q31 Looked for work, Q32 Job last week, Q33 Hours worked and Q39 Travel to work.

31 Did the person actively look for work at any time in the last 4 weeks?
Actively looking for work means checking or being registered with the Commonwealth Employment Service; writing, telephoning or applying in person to an employer for work; or advertising for work.

( ) No, did not look for work. Now go to 40
( ) Yes, looked for full-time work. Now go to 40
( ) Yes, looked for part-time work. Now go to 40

The extremely high commission error rate in Table 3 gives some indication of how poorly the sequencing instruction was followed. The overall level of error is very high in this part of the form, and commission error represents about half of the approximately 30% of records containing error. However, the remaining half of records containing error for people aged 15 years or more were cases where at least one question was missed. A small component of the error relates to multiple marks (2%).

The instructions in Q31 have the potential to affect all the remaining person questions on the census form, and this makes the task of calculating exactly the overall level of error and its components impracticable. The indications are that much of the problem with commission error is due to the response behaviour of people with a job. There is evidence that the level of commission error for any question is more influenced by its apparent relevance or lack of it in the context of the respondent's preceding responses.

The large commission error rate for Q31 of 24.5% (see Section 4.2.6) indicates that many people with jobs are incorrectly answering Q31. This is potentially a major problem because Q31 directs all respondents away from most of the remaining person questions which only people with jobs are expected to answer. The extent to which people with a job correct themselves and answer Q32 (ignoring the sequencing instruction in Q31) will be crucial for data quality.
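
The interaction between the Q30 and Q31 instructions can be made concrete by writing the routing down as a lookup table. The sketch below is illustrative only: the names `ROUTING`, `next_question` and `is_commission_error_q31` are assumptions, the 'Yes, but absent' label is abbreviated from the form, and the rule that a category without an instruction leads to the next numbered question is a simplification.

```python
# Sequencing instructions printed beside each response category (from the form).
# None means no instruction: the respondent continues to the next question.
ROUTING = {
    30: {
        "Yes, worked for payment or profit": 32,
        "Yes, but absent": 32,
        "Yes, unpaid work in a family business": 32,
        "Yes, other unpaid work": None,
        "No, did not have job": None,
    },
    31: {
        "No, did not look for work": 40,
        "Yes, looked for full-time work": 40,
        "Yes, looked for part-time work": 40,
    },
}


def next_question(question, response):
    """Question the form directs the respondent to after `response`."""
    target = ROUTING.get(question, {}).get(response)
    return target if target is not None else question + 1


def is_commission_error_q31(q30_response, q31_response):
    """True when Q31 was answered although Q30 directed the person to Q32."""
    return q31_response is not None and next_question(30, q30_response) == 32


# A person with a job who answers Q31 anyway is then sent to Q40,
# past the questions intended for people with jobs.
print(next_question(31, "No, did not look for work"))
```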

The commission error rate for Q32 represents the proportion of those answering Q31 who answer Q32. If every person with a job who made the mistake of answering Q31 corrected themselves by going straight to Q32, despite the instruction 'Now go to 40', the commission error rate for Q32 should be at least 25.1%. People without a job should find it difficult to answer Q32, so are less likely to contribute to commission error.

The actual commission error rate for Q32 was 25.7%. This seems to indicate that people with a job answering Q31 did correct themselves. However, further examination shows that not all people with a job corrected themselves, and that some people without a job incorrectly answered Q32. Only 21.9% of those answering Q31 were in fact people with a job who corrected themselves by answering Q32. 3.8% of those answering Q31 were people without a job who tried to answer Q32, and this proportion would probably have been much higher if Q32 had not been so irrelevant for this group of the population. If people with a job can be regarded as making only a 'technical' commission error by going from Q31 straight to Q32, then the effective commission error rate for Q32 is low at 3.8%, representing people without a job only.

The pattern of very high commission error established at Q31 and compounded in Q32 by people with a job continues in Q33 Hours worked and Q39 Travel to work. See Table 11 below. This is chiefly the result of the mistake made by people with a job at Q30, and serves to alert us to the behaviour of those with jobs as a potential source of error.

Table 11: Commission error rates for Q32, Q33 and Q39, Persons aged 15 years or more, NSW

Target question               Commission error rate
                              All persons over 15 yrs
No.    Description                      %
32     Job last week                  25.7
33     Hours worked                   34.2
39     Travel to work                 34.5

The increase in commission error in Q33 and Q39 compared to Q32 is due to a corresponding increase in the number of people without a job answering when not required to. The number of people answering Q31 but continuing, mainly because they had a job and the subsequent questions were relevant, remains stable throughout these three questions (about 21% of those who answered Q31). If people with a job can be regarded as making only a 'technical' commission error by failing to observe the sequencing instruction in Q31, then the effective commission error rate for Q33 is 13.8% and the commission error rate for Q39 is 13.3%, representing people without a job only.

The large number of questions to be skipped, spread over several pages, probably also contributes to the level of commission error corresponding to people without a job. Respondents either forgot the instruction or just looked for relevant questions, particularly if they were answering for both those with and without jobs. In Q32 Job last week the most common response from people without a job was 'unpaid helper' (71%). In Q33 Hours worked the most common response was 'None' (about 90%). Responses to Q39 Travel to work were less clear, with most answers from those without a job eventually being coded as 'could not be determined'. While people with jobs made 85% of the commission errors in Q32, this dropped to about 60% for Q33 and Q39 which allow respondents to report that they worked no hours last week and that they did not go to work.
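
The split between 'technical' and effective commission error described above can be reproduced from the quoted rates. The sketch below is a check on the arithmetic only; the variable and function names are assumptions, and the figures are those given in the text and in Table 11.

```python
# Per cent of those who answered Q31, as quoted in the text and Table 11.
total_rate = {"Q32": 25.7, "Q33": 34.2, "Q39": 34.5}
effective_rate = {"Q32": 3.8, "Q33": 13.8, "Q39": 13.3}  # people without a job


def with_job_share(question):
    """Share of commission errors attributed to people with a job (per cent)."""
    technical = total_rate[question] - effective_rate[question]
    return 100.0 * technical / total_rate[question]


for q in total_rate:
    technical = round(total_rate[q] - effective_rate[q], 1)
    print(q, technical, round(with_job_share(q)))
```

The 'technical' component stays close to 21% across the three questions, while the share of errors made by people with a job falls from about 85% in Q32 to about 60% in Q33 and Q39.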


Of those who answered Q31, 12.9% had a job but were influenced by the sequencing instruction 'Now go to 40' aimed at those without a job, and failed to answer Q32 as instructed in Q30. Q33 and Q39 also appear to have been skipped, indicating perhaps that these respondents followed the instruction to the letter and probably did not even read any of the relevant questions. Such respondents may have been completing the form in a hurry. It is unlikely that they were completing the form for others with differing employment arrangements, because doing so might have alerted them to the relevant nature of the questions they were missing. They followed the Q31 instruction after failing to observe the Q30 instruction, and thus skipped eight questions between Q31 and Q40 which should have been answered. This goes some way to explaining the higher rates of non-response now being recorded for later questions, such as occupation.

Another omission error was for only Q30 Full/part-time job to be missed, which could make the accurate derivation of labour force status difficult in some cases.

NATURE OF OCCUPANCY

The commission error rates in Table 3 give some indication of how well the sequencing arrows were followed. However, commission error was mainly confined to respondents in rented dwellings volunteering the information that they are not buying their dwelling. The tendency of respondents to miss out questions within Q42 was a problem. Problems associated with the lack of editing in this complex question have restricted analysis, and it has not been possible to comment on issues such as the performance of arrows versus phrases as sequencing instructions.

The task of calculating the overall level of error and its components was impracticable, given the number and complexity of the questions in Q42. As far as it is possible to determine, omission error is reasonably high, and this proved to be a major obstacle to analysis, particularly the high level of 'not stated' responses to the screening question. Further processing should reduce this, and in most cases respondents did go on to mark other response areas within Q42. Overall, it appears that respondents went to the right hand side of the page after failing to respond to the screening question in about 80% of cases. It seems likely that many of these respondents owned their dwelling and failed to respond because they thought the question was irrelevant.

The problems apparent during analysis are not restricted to preliminary data. The derivation of the variable Nature of occupancy is achieved in main processing via a computer program which takes into account responses to all the questions within Q42. However a significant component of records are classified as 'Inadequately described' and 'Other'. Although respondents attempted to respond, their responses are not meaningful, and to some extent this probably reflects the confusion among respondents found during analysis of the data.

There are three sequencing instructions which direct respondents within Q42. In addition there are four separate sequencing instructions which direct respondents to Q43. These instructions take the form of arrows and words, and a brief discussion of each follows. The question 'Is the dwelling rented by you or any usual member of this household?', together with its response categories ('Yes'/'No'), is regarded as the screening question for the other five questions within Q42.

Rented dwellings

Location: Q42 left hand column

Wording: Arrow
(For those answering 'Yes' to continue down and answer further questions about rented dwellings.)

Example:
42 Is this dwelling rented by you or any usual member of this household?

( ) No
( ) Yes

To whom is rent paid?
( ) ACT Housing Trust
( ) Other government agency
( ) Other

What is the weekly rent?
( ) Less than $48
( ) $48-$77
( ) $78-$107
( ) $108-$137
( ) $138-$167
( ) $168-$197
( ) $198-$227
( ) $228-$267
( ) $268-$307
( ) $308-$397
( ) $398-$447
( ) $448-$497
( ) More than $497

Is the dwelling rented furnished or unfurnished?
( ) Furnished
( ) Unfurnished

The commission error rate in Table 3 gives some indication of how well the sequencing instruction was followed. However commission error is less of a problem than the tendency of respondents to skip the screening question.

It is reasonably difficult for households who do not rent their accommodation to logically answer these questions. The commission error rates shown in Table 12 (below) are the same as those associated with renting households in Table 3. Of the three renting questions, Furnished/unfurnished was slightly more likely to be completed by households which did not in fact appear to be renting. Commission error may however be underestimated because of the number of people who failed to answer the screening question.

Table 12: Commission error rates for questions about rented dwellings, NSW

Question                    Commission error rate
                                      %
Landlord                             1.2
Rent (weekly)                        1.2
Furnished/unfurnished                1.5


The proportion of stated responses recorded as multiple marks was low for all three questions, and was highest for the weekly rent question (3.1%). This relatively high proportion may indicate some confusion about the amount of rent paid.

Of the 10.5% of dwelling records with no response given to the screening question, only about one fifth of these households appear to have been renting, according to other responses. Households that missed the screening question often tended to miss other questions as well. However, when they did answer one of the following questions the pattern of renting responses was similar to those households where the screening question was answered. For these households the Weekly rent question was most commonly answered, although about a third of all of their responses to this question were multiple marks.


Owned (or mortgaged) dwelling

Location: Q42 right hand column

Wording: Arrow
(For those answering 'No' to continue down and answer further questions about owned or mortgaged dwelling.)

Example:
42 Is this dwelling rented by you or any usual member of this household?

( ) No
( ) Yes

Is the dwelling owned (or being bought) by you or any usual member of this household?

( ) Yes, owned (paid off). Now go to 43
( ) Yes, being bought
( ) No. Now go to 43


The high commission error rate in Table 3 gives some indication of how well the sequencing instruction was followed. Commission error represents the largest proportion of the approximately 7.5% of records containing error. The high rate of commission error suggests the sequencing instruction was over-ridden by the perceived relevance of the Dwelling owned question to many renting households. Respondents who marked 'No' to the screening question but failed to respond to the next question about ownership represent about one third of all error. Multiple marks account for under 1%.

The complexity of the format of the six questions within Q42 may have been a contributing factor to the level of commission error. These households may have believed that they were required to follow both arrows at the first renting question rather than just the left hand side, especially as they could logically answer that they were renting and answer 'No' to Dwelling owned. A further 1% of households indicating they were renting (by marking 'Yes' to the screening question) answered the Housing loan repayment question.

In cases where much of Q42 is missed, the Dwelling owned question often appears to be the only part of Q42 which is answered. Over half of the records for which a response to the screening question is missing are cases where 'Yes, owned' was the only response box marked.

Monthly housing loan repayment

Location: Q42 right hand column

Wording: Arrow
(For those answering 'Yes, being bought' to continue down and answer Monthly housing loan repayment.)

Example:
42 Is the dwelling owned (or being bought) by you or any usual member of this household?

( ) Yes, owned (paid off). Now go to 43
( ) Yes, being bought
( ) No. Now go to 43

If being bought: What monthly payment or average monthly payment is being made on the loan(s) for this dwelling?

( ) Less than $201
( ) $201-$300
( ) $301-$400
( ) $401-$475
( ) $476-$550
( ) $551-$625
( ) $626-$700
( ) $701-$775
( ) $776-$850
( ) $851-$925
( ) $926-$1,000
( ) $1,001-$1,200
( ) $1,201-$1,400
( ) More than $1,400


The commission error rate in Table 3 gives some indication of how well the sequencing instruction was followed. This is reflected in the low level of overall error for records where the screening question was completed. About 1% of NSW households responded 'Yes, being bought' and missed the monthly mortgage payment question. Multiple marks account for about 1.5%.

In records where the screening question was missed about a quarter of all stated responses for the Monthly payment question were in fact multiple marks (compared to the usual level of about 1% multiple marks). Commission error was also slightly higher for these records with 5.4% answering monthly payment when not required.

Persons temporarily absent

Location: Q42 both columns

Wording: Now go to 43 or arrow and Now go to 43


There are two main formats used for the four sequencing instructions in Q42 which direct respondents to Q43 on the facing page:

1) a 'Now go to 43' instruction in the response area next to a particular response category; and

2) an arrow pointing right and a 'Now go to 43' instruction next to the response area box.

The two OMR response categories at the top of Q43 (illustrated below) provide the only indication available of response to Q43, and therefore of the performance of the instructions.

43 Are there any persons who usually live in this household who were absent on the night of Tuesday, 6 August 1991?

( ) No - please sign below
( ) Yes - please complete one separate column for each person absent


Only between one and two percent of households which should have followed a specific sequencing instruction in Q42 are actually recorded as answering Q43. Commission error is not a problem here because every household is expected to answer Q43.

Despite the major obstacles to analysis, it is clear that the instructions which direct respondents to Q43 performed poorly. This may reflect many factors, such as the lack of perceived relevance of Q43, the difficulties experienced by respondents in Q42, the plethora of similar instructions, and the unusual situation of directing respondents to the right rather than down. Direct information on the possible significance of any of these factors in the 1991 Census will be extremely difficult to obtain.

It is possible that some responses to Q43 are hidden, either because the 'Yes' response category was not marked but a response was written in or because both 'Yes' and 'No' responses were not read by the OMR reader. Note that at the Pre-capture phase of processing Q43 was clerically checked for written responses, and the category 'Yes' marked if not already completed in an effort to counter this. Of the relatively small number of households marking Q43, 60% of responses were 'Yes'.
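
The clerical check described above amounts to a simple repair rule, sketched below. The function name and the representation of each mark as a boolean are assumptions for illustration; the actual Pre-capture procedure was clerical, not programmatic.

```python
def precapture_q43(yes_marked, no_marked, has_written_response):
    """Return the Q43 marks after the clerical check described in the text.

    If a written response is present and 'Yes' is not already marked,
    'Yes' is marked; otherwise the marks are left as captured.
    """
    if has_written_response and not yes_marked:
        yes_marked = True
    return yes_marked, no_marked


# A household that wrote in an absent person without marking 'Yes'.
print(precapture_q43(False, False, True))
```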


Summary

Commission error

The commission error rates in Table 3 vary considerably. Where the commission error rate was high, the overall level of error also tended to be high, with commission error as the major component. The Language, Full/part-time job, and Looked for work questions were all associated with high commission error.

A high level of commission error is recorded when response categories with broad relevance appear in target questions which should only be answered by particular groups of the population. Only certain categories tend to be marked by those making a commission error, and these tend to be the categories with most relevance to respondents making this type of error. For example, English speakers showed that they spoke English 'Very well'; people with a job marked 'No, did not look for work'; and people without a job tended to show that they were unpaid helpers who worked no hours in the main job held last week. The final example occurs in Q42 Nature of occupancy where renters tended to answer 'No' to the question asking whether the dwelling is owned or being bought.

For people aged under 15 years of age, there is much scope for commission error. Once again the level of commission error varied, recording a high level in questions where a relevant response category occurred, and a low level in questions without relevant categories of response. Respondents tended to mark 'Still at school' for Q23 Age left school; 'No, did not have a job' for Q30 Full/part-time job; 'No, did not look for work' for Q31 Looked for work; and 'None' for Q33 Hours worked. They did not tend to attempt questions like Q28 Year of qualification and Q32 Job last week, where the response categories have much less relevance to people aged less than 15 years and are therefore difficult to answer.

The list of response categories in questions on the 1991 Census form appears to play an important role, sometimes encouraging responses and at others discouraging response, depending on the perceived relevance of the categories to respondents. This analysis shows that a problem arises with a self-enumeration form where perhaps half of the questions apply only to different subsets of the population. Sequencing instructions, rather than the questions and response categories themselves, are the main method of discouraging all but the target population. However the presence of a choice of response categories, as well as the fact that a respondent is likely to be exposed to most response categories when completing the form for a household, tends to affect the performance of sequencing instructions.

Commission error however is not a data quality problem in itself because this error can be repaired automatically during main processing. There may be residual error related to the original commission error, and this is harder to detect and repair (and may be related to factors other than commission error). Commission error is sometimes a 'common sense' reaction by respondents, based on the perceived relevance of the question and the response categories it contains. One example is the tendency of people with a job to make commission errors after mistakenly answering the Looked for work question and being directed straight to the dwelling questions. The difference here is that the questions ARE highly relevant, and should be answered by these people. In doing so these respondents are repairing their own error in answering Looked for work. There are clearly advantages in tailoring census questions to the perspective of the respondent.

Omission error

Errors of omission are the most common type of error made by respondents. The most common error of omission was where every question in a group was missed, indicating that many respondents considered these questions to be irrelevant. This was the single most important source of error for the Educational institution and Qualifications groups of questions, and is a major source of error in the self-coded labour force questions.

It is true that although preliminary data is the only source of information about commission error, omission error is difficult to measure using this source. Given that person records from dummy forms were excluded from this analysis, another possible interpretation of the data is that many responses have not yet been processed, thereby inflating the level of not stated codes in preliminary data. Previous research at the DPC showed that for screening questions such as Birthplace, with self-coded categories as well as a write-in area, this is a problem. Yet for both questions associated with Educational institution, and for Q30 Full/part-time job, a high level of 'true' non-response was found. Even in final data, cases where all the questions in a group were missed remain a problem, and it is necessary to manually exclude 'double not stateds' from the calculation of non-response rates for target questions because of the way in which these records distort non-response rates.

It seems clear that many people are choosing to skip whole groups of questions rather than one here and there. This decision may be made on the basis of perceived relevance, but the lack of response to the screening question makes confirmation of this difficult. The tendency for the respondents themselves to take over from the screening questions in directing which questions should be answered is disturbing. When screening questions fail to attract responses from the whole population, or from the population aged 15 years or more, their role in helping to ensure that subsequent questions are answered where required is reduced.

Most cases of omission error are not repaired during processing. While imputation is carried out for selected variables, this procedure was not adopted for the questions investigated here. The Labour force status derivation procedure is not greatly affected by omission error - as long as there are some responses after Q30 Full/part-time job, a code can be allocated. The extent to which omission error can be directly linked with the performance of sequencing instructions is difficult to identify, however omission error does represent a data quality problem which persists in final data.

Labour force data

The number of self-coded labour force questions and the scope for error make detailed analysis of response behaviour difficult using simple cross-classified tables. We have only been able to estimate the level and types of error occurring in this important part of the census form. For this reason another study was undertaken specifically to examine error in the self-coded labour force questions. The results are set out in Section 5.


LABOUR FORCE QUESTIONS ERROR ANALYSIS (AUSTRALIAN DATA)

Labour force data have tended to be the focus of data quality attention since the Census was held in August 1991. The recognised complexity and importance of the topics was reflected in the emphasis given to this part of the form in the 1991 Census testing program. This was reinforced by the findings of the 1991 Census Respondent Observation Study which indicated that sequencing around and within the labour force questions according to whether respondents have work, are looking for work or are not in the labour force performed poorly. However the observation study gave no conclusive evidence about the existence or extent of such problems. Testing had indicated that, while this part of the form was not without problems, on the whole it performed acceptably.

Early feedback from the processing centre indicated concern about the performance of sequencing instructions in the labour force questions. In addition, an unexpected change in the level of non-response was soon evident in data for later questions such as occupation and industry. While this is probably part of a broader shift in response patterns, the role of sequencing instructions requires some explanation.

It has become apparent that a wider perspective is required in order to gain an accurate and comprehensive picture of the reaction of respondents to the labour force questions. Respondents appear to treat the form as a series of groups or sequences of questions, and answer accordingly. While this report has already tackled the two instructions appearing in this part of the form, an additional analysis which approaches the labour force questions as a group and identifies patterns of response behaviour on this basis has been undertaken. This study first identifies the mistakes which tend to be made by respondents, and goes on to draw some conclusions about the nature and extent of the likely impact of these mistakes on the final data, and the importance of sequencing instructions in this context.


Method

IPURF data for Australia were obtained from the DPC in a format which allowed each labour force response in each record to be examined in the context of the available labour force responses for that record. This continues the approach of using preliminary data because of its availability and its ability to illustrate respondent error. Data for the whole of Australia were used due to the perceived importance of the topic and the need for the results to be representative.

However, preliminary data are not without disadvantages (these are documented elsewhere). Unfortunately data for occupation and industry were not able to be included in the analysis. Multiple marks in most cases were treated as stated responses and classified accordingly. See Appendix 1 for more information about the treatment of multiple marks in this study. Note that persons enumerated on dummy records were excluded.

Responses to the following self-coded labour force questions for each record were investigated:

Q22 (Aged 15 years or more);
Q30 (Full/part time job);
Q31 (Looked for work);
Q32 (Job last week);
Q33 (Hours worked); and
Q39 (Travel to work).

It was not practical to look at every one of the large number of permutations of responses, and accordingly only the most common were identified.

Results

The terms 'incorrect' and 'correct' are used in this report to distinguish between records with either non-response (where a response was required) or a respondent clearly failing to observe a sequencing instruction correctly, and records where these problems do not appear. Records were so designated by first identifying all the possible logical or correct response permutations for the self-coded labour force questions. See Appendix 2 for a list of correct response permutations. Six were identified and 65.17% of all records were able to be classified as 'correct'. Then, the major incorrect or illogical response patterns (and variations) were identified. Each type of permutation was given a distinguishing number.

The 17 most common incorrect response permutations are listed in order of frequency in Table 8. Further effort identifying other specific incorrect permutations was not considered worthwhile, given the fact that each permutation must represent less than 0.34% of all records, and that there must be 40 or more such permutations. Records accounted for by all of these minor incorrect permutations total 11.35% of all records. The number of these other incorrect records was derived as the difference between the total number of records and the number of correct and identified incorrect records.
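Read as a residual, the 'Other incorrect records' figure can be reproduced from the published totals. The following is a minimal sketch in Python, using the record counts reported in the table of incorrect response permutations; the variable and function names are illustrative only.

```python
# Record counts as published in the table of incorrect response permutations.
total_records = 16_588_234
correct_records = 10_810_100
identified_incorrect = 3_895_749   # subtotal for Types 1 to 17

# Records matching neither a correct nor an identified incorrect permutation.
other_incorrect = total_records - correct_records - identified_incorrect
total_incorrect = identified_incorrect + other_incorrect


def pct(n, base=total_records):
    """Share of the base, as a percentage rounded to two decimal places."""
    return round(100 * n / base, 2)


print(other_incorrect, pct(other_incorrect))   # 1882385 11.35
print(total_incorrect, pct(total_incorrect))   # 5778134 34.83
```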

Table 8 shows that almost 5.8 million records representing 34.83% of the total population of Australia contained some type of error in the labour force sequence of questions prior to main processing. It is difficult to identify particular causes for the relatively large number and range of errors. See Appendix 3 for information on the language and educational characteristics of people making the common labour force errors. The incorrect records are slightly more likely than the population as a whole to contain people who are aged over 60 years, who speak a language other than English at home, and who either have no qualification or a trade qualification. A discussion of the most common mistakes related to each of the sequencing instructions follows. Note that analysis for the most part pertains only to records classified as Types 1 to 17 in Table 8.

Table 13: Incorrect response permutations, Labour force questions, Australia

Type  Q22       Q30        Q31       Q32       Q33     Q39       Number of   Proportion of
      15 years  Full/part  Looked    Job last  Hours   Travel      records   all records
      or more   time job   for work  week      worked  to work                         %
1     2         J          S         S         S       S         1,117,901     6.74
2     *         *          *         *         *       *           585,301     3.53
3     1         D          S         @         @       @           258,453     1.56
4     2         D          S         @         S       S           251,645     1.52
5     2         D          S         @         S       @           236,283     1.42
6     2         *          *         *         *       *           235,209     1.42
7     2         D          S         @         @       S           224,469     1.35
8     2         J          @         S         *       S           169,946     1.02
9     2         D          *         @         @       @           151,269     0.91
10    2         J          S         *         *       *           141,866     0.86
11    2         D          S         S         S       S            95,794     0.58
12    2         J          @         S         S       *            83,924     0.51
13    *         D          *         @         @       @            81,204     0.49
14    *         J          S         S         S       S            68,999     0.42
15    2         J          @         *         S       S            68,758     0.41
16    1         D          @         @         @       @            68,243     0.41
17    2         *          *         S         S       S            56,485     0.34
Subtotal                                                         3,895,749    23.48
    Other incorrect records                                      1,882,385    11.35
    Total incorrect records                                      5,778,134    34.83
    Total correct records                                       10,810,100    65.17
Grand total                                                     16,588,234   100.00

Notes

*   Not stated where a response was required.
@   Not applicable - not stated where a response was not required.
1   A response of less than 15 years to Q22.
2   A response of 15 years or more to Q22.
J   A response to category 1, 2 or 3 of Q30, suggesting the respondent had a job.
D   A response to category 4 or 5 of Q30, suggesting the respondent did not have a job.
S   A stated response to Q31, Q32, Q33 or Q39.
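The notation in the notes above can be illustrated with a short sketch that turns one person's responses into a six-character response string. This is not the program used in the study: the function name, its arguments and the rules for deciding when a response counts as 'required' are assumptions inferred from the notes and from the sequencing instructions on the form (under 15 years: no more questions; a job in Q30: skip Q31 and answer Q32, Q33 and Q39; no job: answer Q31 only; Q30 blank: all later questions treated as required).

```python
from typing import Optional


def encode(q22: Optional[int], q30: Optional[int],
           q31: bool, q32: bool, q33: bool, q39: bool) -> str:
    """Return a response string such as '2J@S*S' (hypothetical helper).

    q22: 1 (under 15 years), 2 (15 years or more) or None (blank).
    q30: response category 1 to 5, or None (blank).
    q31, q32, q33, q39: True if any response was stated.
    """
    under_15 = q22 == 1
    has_job = q30 in (1, 2, 3)
    no_job = q30 in (4, 5)

    s22 = "*" if q22 is None else str(q22)
    if has_job:
        s30 = "J"
    elif no_job:
        s30 = "D"
    else:
        s30 = "@" if under_15 else "*"

    def mark(stated: bool, required: bool) -> str:
        if stated:
            return "S"
        return "*" if required else "@"

    # Assumed rules for which later questions are required.
    if under_15:
        req31 = req_rest = False
    elif has_job:
        req31, req_rest = False, True
    elif no_job:
        req31, req_rest = True, False
    else:
        req31 = req_rest = True

    return (s22 + s30 + mark(q31, req31) + mark(q32, req_rest)
            + mark(q33, req_rest) + mark(q39, req_rest))


print(encode(2, 1, True, True, True, True))      # 2JSSSS (Type 1)
print(encode(2, 1, False, True, False, True))    # 2J@S*S (Type 8)
print(encode(2, 5, False, False, False, False))  # 2D*@@@ (Type 9)
```

Under these assumed rules, a record is 'correct' when its string contains no '*' and no 'S' in a position where a response was not required; the sketch only encodes the string and leaves that classification to a lookup against the list of correct permutations.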

RESPONSE BEHAVIOUR AS A FUNCTION OF AGE (ACCORDING TO Q22)

22 Is the age given for the person 15 years or more?

( ) No, under 15 years. No more questions for this person
( ) Yes, 15 years or more. Continue to next question


This question has importance in the context of the labour force questions chiefly because of the potential effect of the sequencing instructions included in Q22 on labour force data. It also provides a useful indicator of broad age group in the context of each response permutation.

Types 3 and 16 (about 2% of all records) represent commission error by people who indicated in Q22 that they were aged under 15 years. People making a commission error tended to answer only those subsequent labour force questions they might have perceived as relevant, namely Q30 and/or Q31.

Consistency between Q4 (Age) and Q22 is good. Inconsistencies appear to emanate mostly from respondents who later indicate they do not have a job. Those aged under 15 years may have been merely marking any logical response category they could find, and their responses will be edited out. The majority of inconsistent records represent respondents aged 15 years or more, who did not have a job and who will therefore be coded as unemployed or not in the labour force, depending on other information they give.

Non-response to this question was not a factor in the classification of response permutations as incorrect. Types 2, 13 and 14 do include not stated values for Q22 but also include other mistakes. Type 2 is by far the most common of the three, where responses to all the self-coded labour force questions are missing. Just over a third of these Type 2 records represent people aged 60 years or more, suggesting the labour force questions were not of relevance. Interestingly, about 70% of respondents who skipped all the self-coded labour force questions also failed to answer other questions we looked at in this context, such as Q24 Qualification or Q18 Language.

Just under 9% (907,457) of respondents who indicated they were aged 15 years or more to Q22 failed to answer one or more of the self-coded labour force questions they should have answered (Types 6, 8, 9, 10, 12, 15 and 17). Again, nearly 30% of respondents skipping questions in this way were people aged 60 years or more, for whom the labour force questions may well have appeared irrelevant.
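The count of 907,457 can be checked against the table of incorrect response permutations. A minimal sketch, with the record counts copied from that table and an illustrative variable name:

```python
# Record counts for the response types named in the paragraph above.
counts = {6: 235_209, 8: 169_946, 9: 151_269, 10: 141_866,
          12: 83_924, 15: 68_758, 17: 56_485}

skipped_required = sum(counts.values())
print(skipped_required)  # 907457
```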

RESPONSE BEHAVIOUR AS A FUNCTION OF Q30 FULL/PART-TIME JOB

30 Last week, did the person have a full-time or part-time job of any kind?

( ) Yes, worked for payment or profit. Now go to 32
( ) Yes, but absent on holidays, on sick leave, on strike or temporarily stood down. Now go to 32
( ) Yes, unpaid work in a family business. Now go to 32
( ) Yes, other unpaid work
( ) No, did not have job


The census form design presumes that respondents will answer Q30 accurately, and then carefully follow the sequence of questions as directed according to their Q30 response. This analysis shows that the quality of responses to Q30 is quite high, in the light of other responses to labour force questions. However, subsequent responses, while not necessarily contradicting the Q30 response, do not always follow the expected pattern. These are discussed here in terms of response to Q30.

Non-respondents to Q30 (Types 2, 6, and 17) present a problem. It can be difficult to determine from later responses whether a person had a job or not. Type 17 represents records where some questions, such as Job last week, Hours worked, and Travel to work, were answered. These people (0.34% of all persons) may have been employed although it is possible that they would be people to whom the category 'A helper not receiving wages or salary' in the Job last week question may have been relevant. The great majority of non-respondents did not answer any of the remaining self-coded labour force questions (Types 2 and 6, representing 4.95% of all records).


Persons with a job

There were two main mistakes associated with those respondents marking the first three categories in the Full/part time job question (indicating they were employed):

a) About 80% of the records containing one of these two mistakes involved people with a job answering Q31 Looked for work when they were not meant to. This pattern of response can be seen in Types 1, 10 and 14, representing 8.02% of all records (1,330,376 persons). People who indicated they were from a non-English speaking background were slightly more likely to make this mistake than the rest of the population. Note that Type 10 (representing 0.86% of all records) includes those who subsequently left all the remaining self-coded labour force questions blank (and possibly also write-in questions such as occupation and industry) in accordance with the sequencing instruction in Q31.

b) The other 20% or so involved people with a job missing one of the remaining labour force questions which should have been answered. These respondents appear in Types 8, 12 and 15, representing 1.94% of all records. This suggests 'picking out' rather than problems specifically with sequencing instructions. Perhaps they were confused by some questions, or found them difficult or intrusive. The Hours worked question was more likely to be missed than Job last week or Travel to work. This is illustrated in the contrast between Type 8 (1.02% of all records) and Types 12 and 15 (0.51% and 0.41% respectively). It is possible that in up to 1.02% of cases respondents actually missed the whole page on which Hours worked appears at the top, but this issue cannot be resolved using preliminary data. The possibility will be explored in other planned analysis.

The Type 10 phenomenon is very difficult to explain. One possible explanation lies in the fact that for the first instruction, all elements (instruction, question to be skipped, and question to go to) are not only close together, but on the same page. This is not true for the second instruction. This proximity may encourage respondents to look at all three questions involved, and in this case it is very easy for them to discover a plausible response category and respond accordingly.


Persons without a job

Some of the respondents indicating they did not have a job said in Q22 that they were aged under 15 years (Types 3 and 16) and, given the high level of consistency between Q4 Age and Q22, are excluded from analysis. There were two main mistakes associated with the remaining respondents who marked categories 4 and 5 in the Full/part time job question.

a) 78% of the records containing one of these two mistakes involved people without a job answering subsequent labour force questions they were not meant to (Types 4, 5, 7 and 11), particularly the Hours worked (Type 5) and Travel to work (Type 7) questions. They represent 4.87% of all records. Such people are slightly more likely to be from a non-English speaking background, have a trade qualification, and be aged between 15 and 59 years, suggesting they could have been in the labour force.

b) The other 22% involved people without a job failing to answer the Looked for work question when they should have (Types 9 and 13). This mistake was slightly more common amongst those aged 60 years or more than those aged 15-59 years. If respondents were influenced at all by the proximity of a sequencing instruction, we would expect that Q31 was missed because they tried to 'Go to 32'. However respondents making this mistake (representing 1.4% of all records) ceased to answer all self-coded labour force questions at Full/part time job. Type 11 is the only response situation we have identified where persons indicating they are either not in the labour force or unemployed actually do mark Q32 Job last week. However, in this example all self-coded labour force questions have been marked, including Q31 Looked for work, and this does not conform to the sequencing instruction in Q30.
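One reading of the percentages quoted in (a) and (b), for persons with a job and for persons without a job, is that they are shares of the two main mistakes within each group rather than shares of everyone in the group. The sketch below works under that assumption, using the record counts from the table of incorrect response permutations; the grouping labels and function name are illustrative.

```python
# Record counts by response type (Types 3 and 16 are excluded, as in the text).
counts = {1: 1_117_901, 4: 251_645, 5: 236_283, 7: 224_469, 8: 169_946,
          9: 151_269, 10: 141_866, 11: 95_794, 12: 83_924, 13: 81_204,
          14: 68_999, 15: 68_758}

mistakes = {
    "with a job": {"answered Q31": (1, 10, 14),
                   "missed a question": (8, 12, 15)},
    "without a job": {"answered later questions": (4, 5, 7, 11),
                      "missed Q31": (9, 13)},
}


def shares(kinds):
    """Percentage share of each kind of mistake within one group."""
    totals = {name: sum(counts[t] for t in types)
              for name, types in kinds.items()}
    overall = sum(totals.values())
    return {name: round(100 * n / overall) for name, n in totals.items()}


for group, kinds in mistakes.items():
    print(group, shares(kinds))
# with a job {'answered Q31': 80, 'missed a question': 20}
# without a job {'answered later questions': 78, 'missed Q31': 22}
```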

Effect on final data

Many edits have been specified to counteract respondent error, especially for important data like labour force. The extent to which editing can repair error and what is likely to remain in final data is discussed below.

Q30 Full/part-time job is recognised as a crucial part of the labour force questions, and form design and processing rely on the assumption that this question is answered accurately. None of the edits specified repair errors made here. The data show that the question does act as an effective screening question. Commission error in subsequent questions occurs only in cases where there is a logical response category available even if the question itself is irrelevant (such as 'did not look for work' for an employed person). The rate of response is also good, and most non-respondents appear to be elderly although some may have been unpaid helpers. We know that in about 61,000 records multiple marks have been made to this question, indicating a degree of confusion or difficulty for some respondents. (See Appendix 2. These records are included in Table 8 in the 'Other incorrect records' category. The great majority of these records will be coded as employed, given the practice for coders to take the first response in such situations.)

The major edit relates to people aged less than 15 years. All persons aged less than 15 years, (according to Q4), are classified as 'not applicable' to the labour force questions. In this context, response to Q22 becomes irrelevant because Q4 rather than Q22 is used to determine the applicable population. The result of the edit is that the applicable population, or the number of records to be allocated a labour force status (LFS), is 12,914,414, representing 77.85% of the total population. In the context of this study, this means that 39.34% of the applicable population (5,081,044 persons) made some kind of error in the labour force questions. We are interested in the proportion of this error which can be repaired as well as the proportion that might be associated directly with sequencing instructions.
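The two proportions in this paragraph follow directly from the published counts, and the age edit itself can be pictured as a simple rule keyed on Q4. The sketch below is illustrative only: the record layout (the `q4_age` and `lfs` fields) and the function name are hypothetical and are not taken from the processing system.

```python
total_population = 16_588_234      # all records
applicable = 12_914_414            # records to be allocated a labour force status
incorrect_applicable = 5_081_044   # applicable records with some error

print(round(100 * applicable / total_population, 2))       # 77.85
print(round(100 * incorrect_applicable / applicable, 2))   # 39.34


def apply_age_edit(record):
    """Set labour force status to 'not applicable' when Q4 age is under 15.

    The response to Q22 is deliberately ignored, as described in the text.
    """
    if record["q4_age"] < 15:
        return {**record, "lfs": "not applicable"}
    return record


print(apply_age_edit({"q4_age": 9, "q22": 2, "lfs": None})["lfs"])  # not applicable
```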

For persons aged 15 years or more labour force status is derived using responses to questions 30, 31, 32, 33 and 39. A computer program allocates a code between 1 and 7, or 'not stated' depending on the response permutation provided to these questions. Codes 1 to 4 represent different types of employed persons, 5 and 6 unemployed persons and 7 people not in the labour force. In most cases only responses to questions 30, 31 and 32 are necessary for the derivation of labour force status.

Table 9 (overleaf) excludes persons aged less than 15 years (based on Q4). The response types associated with this age group are Types 3 and 16. While all records in Type 16 represented people aged less than 15 years, only about half of the records in Type 3 represented this age group. This is reflected in Table 9 which excludes Type 16 and where Type 3 is significantly reduced in size. The exclusion of Q22 from the listed permutations means that several can be combined. Therefore only 13 categories are listed. The order has changed, but the same nomenclature has been retained. The table gives an idea of the likely labour force status to be allocated to each response string during main processing, and the number of records involved. Only a predicted labour force status can be provided because the LFS derivation model in some instances distinguishes between different responses to each question whereas the data used for our analysis only indicates whether a response to each question is stated or not.

For most incorrect response types a logical labour force status (LFS) is likely to be derived. Respondents from the largest error group (Types 1 and 14 combined) are expected to be given an employed LFS code during final processing. Respondents from Types 4, 5 and 7 are expected to be given an LFS code of either unemployed or not in the labour force despite not observing the Q31 sequencing instruction. There will be no impact on final data although the answers provided to Q32, Q33 and Q39 will not be coded because these questions are only applicable to employed persons.

Table 14: Predicted labour force status allocated to incorrect response permutations, Labour force questions, Australia

Type    Q30        Q31       Q32       Q33     Q39      Predicted                Number of   Proportion of
        Full/part  Looked    Job last  Hours   Travel   labour force               persons   records for
        time job   for work  week      worked  to work  status                    15 years   persons aged
                                                                                   or more   15 years or more
                                                                               (Australia)   (Australia) %
1+14    J          S         S         S       S        Employed                 1,185,132     9.18
2+6     *          *         *         *       *        Not stated (1)             820,491     6.35
4       D          S         @         S       S        Unemployed (2) or NILF     250,268     1.94
5       D          S         @         S       @        Unemployed (2) or NILF     234,907     1.82
9+13    D          *         @         @       @        Unemployed or NILF         232,473     1.80
7       D          S         @         @       S        Unemployed (2) or NILF     223,253     1.73
8       J          @         S         *       S        Employed                   169,946     1.31
10      J          S         *         *       *        Employed                   141,866     1.10
11      D          S         S         S       S        Unemployed (2) or NILF      95,387     0.74
12      J          @         S         S       *        Employed                    83,924     0.65
15      J          @         *         S       S        Employed                    68,758     0.53
17      *          *         S         S       S        Employed (3)                56,485     0.44
3       D          S         @         @       @        Unemployed (2) or NILF      28,610     0.22
Subtotal                                                                         3,591,500    27.81
    Other incorrect records                                                      1,489,544    11.53
    Total incorrect records                                                      5,081,044    39.34
    Total correct records                                                        7,833,370    60.66
Grand Total (persons aged 15 years or more)                                     12,914,414   100.00


Notes

*      Not stated where a response was required.
@      Not applicable - not stated where a response was not required.
J      A response to category 1, 2 or 3 in Q30 (had a job).
D      A response to category 4 or 5 in Q30 (did not have a job).
S      A stated response to Q31, Q32, Q33 or Q39.
NILF   Not in the labour force.
(1)    Full-time students aged 15-24 years, males aged 65 years or more, and females aged 60 years or more are allocated to not in the labour force.
(2)    Responses of 'No, did not look for work' in Q31 are allocated a labour force status of not in the labour force. Responses of 'Yes, looked full-time' or 'Yes, looked part-time' in Q31 are allocated a labour force status of unemployed.
(3)    A small proportion of respondents indicating they were a 'helper' in Q32 are allocated to unemployed or not in the labour force rather than employed.
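The broad allocations described in the table and its notes can be sketched as a small decision function. This is a simplified illustration, not the LFS derivation program: it predicts only the broad status rather than codes 1 to 7, the argument names and string values are hypothetical, and the 'helper' exception in note 3 is not modelled because the text does not say which helpers are reallocated.

```python
def predict_lfs(q30, q31=None, later_stated=False,
                age=None, sex=None, full_time_student=False):
    """Predict a broad labour force status (illustrative sketch).

    q30: 'J' (had a job), 'D' (did not have a job) or None (blank).
    q31: 'looked', 'did_not_look' or None (blank).
    later_stated: True if Q32, Q33 and Q39 were answered despite a blank Q30.
    """
    if q30 == "J":
        return "Employed"
    if q30 == "D":
        if q31 == "looked":
            return "Unemployed"
        if q31 == "did_not_look":
            return "NILF"
        # Q31 blank: the responses alone cannot separate the two outcomes.
        return "Unemployed or NILF"
    if later_stated:
        return "Employed"          # as for Type 17, ignoring the helper exception
    # Note 1: selected groups are set to NILF instead of 'Not stated'.
    if age is not None and (
        (full_time_student and 15 <= age <= 24)
        or (sex == "M" and age >= 65)
        or (sex == "F" and age >= 60)
    ):
        return "NILF"
    return "Not stated"


print(predict_lfs("J", q31="did_not_look"))   # Employed
print(predict_lfs("D", q31="did_not_look"))   # NILF
print(predict_lfs(None, age=70, sex="M"))     # NILF
print(predict_lfs(None, age=40, sex="F"))     # Not stated
```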

However there are some instances where it will not be possible to repair the data. The problems are typically instances of non-response, where responses to relevant questions should have been supplied, but are missing. Table 10 illustrates four main types of error which will remain in final data.

Not all of the errors are directly related to sequencing instructions. Only response Type 10, where an instruction was mistakenly followed, and Type 15, where an instruction was not followed (Q32 was not answered despite the instruction 'Go to 32') can be directly linked to sequencing instructions. Together, these categories represent 1.63% of all records for persons aged 15 years or more.


Table 15: Incorrect response permutations with possible impact on final labour force data

    Description                                             Type(s)   Number of records   Proportion of records
                                                                      for persons aged    for persons aged
                                                                      15 years or more    15 years or more
                                                                      (Australia)         (Australia) %
1.  Non-response to all self-coded labour force questions   2, 6      820,491             6.35
2.  Possible non-response to write-in labour force          8, 10     311,812             2.41
    questions, and non-response to selected self-coded
    labour force questions
3.  Non-response to Q31 Looked for work                     9, 13     232,473             1.80
4.  Non-response to selected self-coded labour force        12, 15    152,682             1.18
    questions
    Total                                                             1,517,655           11.74
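Each row of this table groups response types from the table of predicted labour force status. The sketch below shows how the row counts and proportions can be rebuilt from those figures; the dictionary keys, short labels and function name are illustrative.

```python
base = 12_914_414  # records for persons aged 15 years or more

# Counts from the table of predicted labour force status.
predicted_counts = {"2+6": 820_491, "8": 169_946, "10": 141_866,
                    "9+13": 232_473, "12": 83_924, "15": 68_758}

rows = {
    "Non-response to all self-coded questions": ["2+6"],
    "Possible non-response to write-in questions": ["8", "10"],
    "Non-response to Q31": ["9+13"],
    "Non-response to selected self-coded questions": ["12", "15"],
}


def row_totals():
    """Return {label: (number of records, percentage of the base)}."""
    out = {}
    for label, types in rows.items():
        n = sum(predicted_counts[t] for t in types)
        out[label] = (n, round(100 * n / base, 2))
    return out


for label, (n, share) in row_totals().items():
    print(f"{label}: {n:,} ({share:.2f}%)")
```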


The labour force derivation model is able to set people with particular characteristics (for example, full time students aged 15-24 years) to 'not in the labour force' rather than 'not stated'. By doing this as well as correcting for possible non-reads the number of 'not stateds' is reduced. Non-response to labour force status in final data for Tasmania and the ACT is just under 2%, and in both cases is similar to the level recorded in the 1986 Census.

Despite gaps in the data, Types 8, 10, 12 and 15 are also expected to be allocated an employed labour force status code. However the gaps will become non-response for individual variables in final data. Further, additional gaps (Types 8 and 10) may become apparent in later processing when write-in questions such as occupation and industry are coded. These gaps are of concern because they reflect problems with the sequencing of respondents through the labour force questions. Some of the gaps may represent up to eight missing questions. For example in Type 8, respondents may have missed the page of the form which includes the Hours worked, occupation and industry questions. In Type 10, respondents may have incorrectly skipped Q32 through to Q40, thereby also missing occupation and industry.

Non-response to Q31 (Types 9 and 13) by respondents who did not have a job makes the allocation of a correct LFS code very difficult. There is no way of deriving from responses to other questions whether the person concerned is unemployed (LFS codes 5 and 6) or not in the labour force (LFS code 7). In practice an LFS code of 5, 6 or 7 is imputed using information on the distribution of such responses from the 1986 Census. About 4% of all records likely to be given a code of 5, 6 or 7 based on preliminary data will be affected in this way. Using 1991 final data for Tasmania as an example, only about 8% of persons aged 15 years or more are unemployed. There could be a significant impact on data for the unemployed because the employment situation is different in 1991 to that in 1986.

Although it is possible that people aged over 60 years tended not to answer Q31 because it appeared irrelevant (they would be regarded as not being in the labour force) it is equally possible that long-term unemployed people regarded Q31 as very sensitive and therefore tended to skip it. A comparison of data from the labour force survey component of the monthly population survey with census data shows that although the proportion of unemployed people rose sharply between July 1986 and August 1991 in both collections, for Tasmania the increase in the proportion of unemployed was higher in the labour force survey (43.71%) than in the census (36.56%). This may indicate that the census slightly underestimates the proportion of unemployed people, although overall the proportion of unemployed people is higher in the census than in the labour force survey.


Summary

While this study is not able to accurately quantify the impact of errors on final labour force data, it has been able to identify the most significant errors and to estimate the upper limits of their impact. The errors of most concern are those represented by Types 8 and 10, particularly for their potential effect on occupation and industry. Research will continue using IFURF data for NSW, and any further information about the significance of these types of errors will be circulated as soon as it becomes available.

It seems clear that sequencing instructions are not the major cause of problems with the labour force questions. Type 10 is the one significant type of error that is clearly associated with sequencing instructions. Response behaviour appears to be shaped by factors other than sequencing instructions. These factors include response categories having relevance for a broader range of respondents than those intended to answer the question, overall sequence of questions, and positioning of page breaks.

In general, omission error or non-response (which is usually not associated with sequencing instructions) is the most significant problem, because it cannot be easily repaired. At least 60% of records first identified as incorrect in this study will be repaired in main processing, the remainder representing for the most part cases of non-response. Although any OMR non-reads will be repaired, up to 18% of records for people aged 15 years or more may have data missing for at least one labour force variable. The omissions will be spread over all the labour force variables.

In most cases of omission error the LFS derivation model can still derive a labour force status code. Analysis of data for Tasmania shows that despite the differences in the method of LFS derivation, the proportion of records for which an LFS code is able to be derived was similar in both the 1986 and 1991 Censuses (about 98%). The fact that for Tasmania, the increase in the proportion of unemployed between 1986 and 1991 was higher in the monthly labour force survey than the census may indicate a possible problem with the allocation of labour force 'unemployed' and 'not in the labour force' codes (discussed above). However it may also reflect the contrast between the fully computerised 1991 LFS derivation model and the 1986 model which relied on clerical resolution of difficult cases (with reference to all of the labour force questions).


CONCLUSION

Sequencing instructions in the 1991 Census at first glance appeared to have performed relatively poorly in comparison to testing. However once the differences in the method of calculating commission error have been resolved, this study shows that the performance in testing has been replicated in the census. Overall, sequencing instructions performed reasonably well, with minimal effect on data quality. About 80 to 90% of respondents seem to follow most instructions.

It appears that a revision of ideas about the performance of sequencing instructions in the 1991 Census is required. There is no evidence to show, as previously thought, that the style of instruction was poor; that instructions were ambiguous; that the Q22 instructions for those under and over 15 years were major problems; or that there was a difference between the performance of instructions located horizontally beside an answer box and those located on the line below. The extent to which a sequencing instruction is 'eye-catching' is alone probably not the crucial factor determining respondent behaviour.

There appear to be many cues apart from sequencing instructions which guide respondents through the form. This study suggests that the focus of form design should be on factors other than the style of the instruction itself. The 1991 Census form was the first to include so many self-coded response categories, and it is these which appear to have the strongest influence on response behaviour, especially when instructions are missed or forgotten. Factors which could be tested include the performance of the question in which the instruction is located (screening questions). The positioning of screening questions on the page and the overall ordering of questions could also be important (particularly those aimed at people aged 15 years or more). Perhaps the disadvantages of avoiding categories with very broad relevance could be weighed against possible improvements in sequencing.

The idea that response boxes to the right of the text will speed up response, if true, could be a disadvantage. It may encourage respondents to depend even more on the response categories than already seems to be the situation, with less care being exercised and an increased prominence of the categories. In this case response categories might possibly supersede sequencing instructions as a means of sequencing respondents through the form. The tendency of respondents to forget to mark the 'Other - please specify' category before writing a response might also be encouraged, which would affect preliminary data quality.

The two main purposes of sequencing instructions are to reduce the burden of filling in the form and to encourage responses to questions for which an answer is required. In the past the emphasis has tended to rest on the former. Commission error is one of the most common and easily identified mistakes made by respondents, giving rise to concern both in the field and during processing. However in almost all cases commission error is easily repaired and therefore has very little impact on data quality. The latter purpose is also very important. Errors of omission are not easily repaired, and repair is only attempted in the form of imputation on rare occasions.

The results of this study represent a yardstick against which sequencing instructions can be measured in the future. Although respondent burden remains a concern, the focus of form design should be on preventing omission error (and improving sequencing), in preference to preventing commission error (and improving sequencing instructions). Of course, the latter may also follow from the former.




APPENDIXES


APPENDIX 1

CALCULATING COMMISSION ERROR

The complexity of the labour force questions gives rise to many possible ways of manipulating the data. We used NSW 1991 Census Preliminary data to demonstrate three different methods of calculating commission error to produce three different commission error rates with reference to self-coded labour force questions. The exercise is considered to be useful in terms of making clear the meaning of each method and the underlying commission error, and the differences between them. In addition to clarifying some of the confusion in this area, it may also help in identifying which methods are appropriate in different situations.

The methods are explained below, and examples using NSW 1991 Preliminary data can be found in the attached table.

Method A
This is the method used during the 1991 Census Testing Program which was also adopted for the 1990 Dress Rehearsal. This approach assumes that responses to Q30 are correct and that all subsequent responses follow the logic of the response to Q30. However our analysis of Australian data elsewhere shows that this is often not the case. In addition, this approach tends to focus on persons marking categories 4 and 5 in Q30 (those not in the labour force) whereas many of the problems identified in the labour force questions in the census emanate from people in the labour force.

The attached table shows the commission error rates derived using this method, which appear to seriously underestimate the problem for questions 32, 33 and 39. Note that for Q31 the denominator represents people who marked categories 1, 2 or 3 in Q30 (and who were instructed to skip Q31), which is the same approach as Method C. However, the denominator for the remaining three questions represents people who marked categories 4 or 5 in Q30. Therefore these rates measure only the extent to which those (probably) not in the labour force made commission errors without giving an indication about what people in the labour force are up to. The latter can be found incorrectly answering Q31 as well as skipping Q31. The numerator represents people who have responded when not required to, as determined by their responses to Q30 indicating whether they are in the labour force or not.

Method B
Although this method has never officially been quoted as commission error, it obviously measures commission error, and can be a useful source of information. It fulfils a descriptive function by measuring the impact of commission error on data for a question affected by sequencing instructions. For example, this tells us that about a quarter of responses to Q31 are actually mistakes and should be edited out in main processing (whether they actually are or not is another issue).

While the numerator represents the number of records where an answer was given which was not required (determined by which categories were marked and what the accompanying instruction was), the denominator changes for each question, and in each case represents:

Q31
        All persons responding to Q31
Q32
        All persons responding to Q32
Q33
        All persons responding to Q33
Q39
        All persons responding to Q39

So, this method measures commission error as a function of the question affected by a sequencing instruction.

The main problem with this approach is that it is also reasonably narrow, reflecting the impact on each question in isolation from the others, and giving no clear idea of the source of the problem. Another problem is the scope it provides for confusion with other measures of commission error without being a tool robust enough to stand on its own. The table shows that this method may also tend to slightly underestimate the extent of overall commission error, chiefly because it measures a particular aspect of commission error.


Method C
This method focuses on each instruction and relies on the assumption that people marking response categories incorporating a sequencing instruction in the response area will read the instruction. The resulting rates reflect the proportion of people who read an instruction but failed to follow it. Therefore the denominator represents the number of persons marking a category which includes a specific instruction, while the numerator is the same as for Method B.

The main disadvantage of this method is the fact that the inevitable presence of not stated codes for screening questions means that there are always some records where we do not know whether a person has read the instruction. Without unnecessarily complicating the picture, we also know that the actual response categories may be a stronger cue prompting response behaviour than instructions. Another problem is that in some cases people may read an instruction by mistake and then follow it when not required to.

The main advantages are that this method represents a relatively simple concept, it links respondent behaviour directly with a specific instruction, and it allows a consistent approach to be maintained in measuring the performance of all sequencing instructions on the form. It could be argued that it does overestimate commission error because it measures the behaviour both of those who should follow the instruction as well as those who should not. However, this can also be seen as an advantage because it does give a wider view of respondent behaviour than the other methods. Once we know what is happening in this sequence of questions, then we can start to identify the causes of the behaviour. For example, the commission error rates of about 35% for questions 33 and 39 alert us to the fact that there is a major problem. However, it is relatively easy to determine how much of this is due to people who should not have answered Q31. Note that a high commission error rate is required given the proportion of people answering Q31 when not required - without this, much data for people in the labour force would have been lost.


While all the methods outlined here have their uses, there is a very real danger of misinterpretation, both for those doing the analysis and for those using the results. For this reason, it would be realistic to use only one as the main measure of commission error, and to use the term 'commission error' to refer to the results of that method. The others could be used in conjunction to shed light on the main measure. It is vital, however, that everyone concerned understands what each means and how to use them. The method adopted in Working Paper 93/1 uses this approach and relies on Method C.
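As a rough illustration of how the choice of denominator changes the rate for Q31, the sketch below applies the Method A/C denominator and the Method B denominator to a handful of made-up records. The record layout, the function names and the ten toy records are assumptions for illustration only; they are not census data and this is not the processing system's own calculation.

```python
# Toy records: (Q30 category marked, whether Q31 was answered).
# Hypothetical data for illustration only; None means Q30 was not stated.
RECORDS = [
    (1, False), (1, True), (2, False), (3, False), (2, True),
    (4, True), (4, True), (5, True), (5, False), (None, True),
]

# Q30 categories that carry the instruction to skip Q31 (categories 1, 2, 3).
SKIP_Q31 = {1, 2, 3}


def q31_commission_errors(records):
    """Count records that answered Q31 although instructed to skip it."""
    return sum(1 for q30, answered in records if q30 in SKIP_Q31 and answered)


def rate_by_instruction(records):
    """Methods A and C for Q31: errors / persons who marked 1, 2 or 3 in Q30."""
    denominator = sum(1 for q30, _ in records if q30 in SKIP_Q31)
    return q31_commission_errors(records) / denominator


def rate_by_question(records):
    """Method B for Q31: errors / all persons responding to Q31."""
    denominator = sum(1 for _, answered in records if answered)
    return q31_commission_errors(records) / denominator


print(f"Method A/C: {rate_by_instruction(RECORDS):.1%}")
print(f"Method B:   {rate_by_question(RECORDS):.1%}")
```

With the same two erroneous answers in the numerator, the two denominators give different rates, which is why the term 'commission error' needs to be tied to a single method.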


APPENDIX 2

LABOUR FORCE QUESTIONS: CHARACTERISTICS OF RESPONDENTS MAKING ERRORS

To better understand the characteristics of those making errors in the labour force questions, responses to Q18 Language and Q24 Qualifications were identified for all records. These responses have been cross-classified against the various response permutations provided by respondents to the self-coded labour force questions. One point to note is that people who failed to respond to the labour force questions often also tended to miss other questions, such as Language and Qualification.

Table 17 shows the number and proportion of persons making common errors in the labour force questions (as identified in the Labour Force Questions Error Analysis) from English speaking backgrounds and non-English speaking backgrounds.

Table 17: Incorrect labour force response permutations (a) by language response, Australia

Type (a)                 Speaks English only   Speaks a language          Not stated             Total *
                                              other than English
                                 No.       %         No.       %         No.       %         No.       %
1                            926,833   82.91     143,677   12.85      39,381    3.52   1,117,901  100.00
2                            137,268   23.45      18,698    3.19     428,038   73.13     585,301  100.00
3                            207,261   80.19      34,870   13.49      15,151    5.86     258,453  100.00
4                            190,812   75.83      44,138   17.54      14,071    5.59     251,645  100.00
5                            182,632   77.29      38,695   16.38      13,139    5.56     236,283  100.00
6                            168,349   71.57      31,778   13.51      33,567   14.27     235,209  100.00
7                            178,862   79.68      32,521   14.49      11,090    4.94     224,469  100.00
8                            138,570   81.54      18,566   10.92      11,744    6.91     169,946  100.00
9                            115,168   76.13      17,907   11.84      17,363   11.48     151,269  100.00
10                           109,257   77.01      24,984   17.61       6,526    4.60     141,866  100.00
11                            71,566   74.71      17,813   18.60       5,249    5.48      95,794  100.00
12                            66,598   79.36       9,665   11.52       7,124    8.49      83,924  100.00
13                            51,966   63.99       5,949    7.33      22,856   28.15      81,204  100.00
14                            47,926   69.46      10,850   15.72       9,526   13.81      68,999  100.00
15                            52,845   76.86       7,416   10.79       8,057   11.72      68,758  100.00
16                            54,605   80.03       7,320   10.73       6,076    8.90      68,234  100.00
17                            41,276   73.06       5,974   10.57       8,903   15.76      56,494  100.00
Subtotal                   2,741,794   70.38     470,821   12.09     657,861   16.89   3,895,741  100.00
Other incorrect records    1,299,424   69.03     251,638   13.37     315,922   16.78   1,882,385  100.00
Total incorrect records    4,041,218   69.94     722,459   12.50     973,783   16.85   5,778,134  100.00
Total correct records      8,910,127   82.42   1,217,135   11.26     626,670    5.78  10,810,100  100.00
Grand Total               12,951,345   78.08   1,939,594   11.69   1,600,453    9.65  16,588,234  100.00

(a) For description of response permutation represented by each type see Table 13, page 31.
* Includes multiple marks to Q18 Language.


Table 18 (below) shows the number and proportion of people making common errors in the labour force questions who marked 'No', 'trade certificate' or 'other qualification' to Q24 Has the person obtained a trade certificate or any other educational qualification since leaving school?

Note that the table does include some people under 15 years of age because they could not be separately identified. The labour force study found however that at least 80% of people aged under 15 years did not go on to answer this question. They appear as stated responses if the respondent marked a category. Most people under 15 years appear in the Total column because they did not answer Q24 and were therefore given a 'not applicable' code. In main processing all respondents under 15 years of age will be allocated a 'not applicable' code for Q24.


Table 18: Incorrect labour force response permutations (a) by response to Q24 Qualification, Australia

Type                        No qualification Trade qualification Other qualification          Not stated             Total *
                                 No.       %         No.       %         No.       %         No.       %         No.       %
1                            479,814   42.92     198,134   17.72     316,828   28.34      34,489    3.09   1,117,892  100.00
2                             35,717    6.10       4,731    0.81       4,227    0.72     535,789   91.54     585,301  100.00
3                             79,072   30.59       1,470    0.57       1,138    0.44       7,608    2.94     258,453  100.00
4                            153,861   61.14      26,341   10.47      26,142   10.39      14,407    5.73     251,643  100.00
5                            144,066   60.97      20,858    8.83      23,411    9.91      13,761    5.82     236,280  100.00
6                            102,479   43.57       9,734    4.14      11,909    5.06      67,820   28.83     235,209  100.00
7                            135,688   60.45      19,400    8.64      27,058   12.05      13,463    6.00     224,469  100.00
8                             84,346   49.63      25,540   15.03      36,598   21.54      13,537    7.97     169,946  100.00
9                             89,320   59.05       8,077    5.34      10,603    7.01      25,028   16.55     151,269  100.00
10                            72,705   51.25      24,129   17.01      28,950   20.41       5,029    3.54     141,866  100.00
11                            52,255   54.55      11,605   12.11      20,253   21.14       3,803    3.97      95,794  100.00
12                            42,847   51.05      10,917   13.01      15,325   18.26       7,011    8.35      83,924  100.00
13                            24,690   30.40       2,580    3.18       2,505    3.08      49,327   60.74      81,204  100.00
14                            27,147   39.34      10,720   15.54      10,315   14.95      17,671   25.61      68,999  100.00
15                            32,245   46.90       9,619   13.99      14,062   20.45       7,782   11.32      68,758  100.00
16                            14,197   20.81           7    0.01          14    0.02           0    0.00      68,234  100.00
17                            24,115   42.69       6,908   12.23       9,476   16.78      12,536   22.19      56,485  100.00
Subtotal                   1,594,564   40.93     390,770   10.03     558,814   14.34     829,061   21.28   3,895,726  100.00
Other incorrect records      749,204   39.80     137,132    7.28     173,063    9.19     419,319   22.28   1,882,408  100.00
Total incorrect records    2,343,768   40.56     527,902    9.14     731,877   12.67   1,248,380   21.61   5,778,134  100.00
Total correct records      4,370,144   40.43     996,773    9.22   1,516,211   14.03     412,037    3.81  10,810,099  100.00
Grand Total                6,713,912   40.47   1,524,675    9.19   2,248,088   13.55   1,660,417   10.01  16,588,233  100.00

(a) For description of response permutation represented by each type see Table 13, page 31.
* Includes those still studying, multiple marks to Q24 and 'not applicable' codes.


The presence of records for people aged under 15 years in this table makes some cells more meaningful than others. The 'Total correct records' and 'Other incorrect records' categories include the largest number of persons aged less than 15 years. Of the identified incorrect response permutations Type 16 and Type 3 contain by far the highest number and proportion of persons aged less than 15 years. All Type 16 respondents are aged less than 15 years while almost 90% of Type 3 respondents are aged less than 15 years. Types 1, 3, 4, 5, 7, 11 and 14 also include a small number of persons aged less than 15 years but their inclusion should have little impact on the pattern of response shown to Q24 Qualification for these response permutations.

APPENDIX 3

LABOUR FORCE QUESTIONS: CORRECT RESPONSE PERMUTATIONS

These response permutations were identified as part of the Labour Force Questions Error Analysis. Records that have been classified as correct are those where the respondent observed the labour force sequencing instructions correctly and answered each of the labour force questions required of them. Six correct response permutations have been identified and these are presented in the table below as response types 21, 22, 23, 24, 25 and 26. Note that three of these response permutations contain a not stated code to Q22 15 years or more. These have been treated as correct response permutations because although the respondent made an omission error in Q22 they completed the labour force questions correctly.


Table 19: Correct response permutations, Labour force questions, Australia

Type                      Q22       Q30       Q31       Q32       Q33       Q39             Number      Proportion
                          15 years  Full/part Looked    Job last  Hours     Travel      of records  of all records
                          or more   time job  for work  week      worked    to work                              %
21                        2         E         @         S         S         S            4,248,414           25.61
22                        2         D         S         @         @         @            3,146,348           18.97
23                        1         @         @         @         @         @            2,637,697           15.90
24                        *         @         @         @         @         @              295,452            1.78
25                        *         D         S         @         @         @              264,802            1.60
26                        *         E         @         S         S         S              217,387            1.31
Total correct responses                                                                 10,810,100           65.17
Total incorrect responses                                                                5,778,134           34.83
Grand total                                                                             16,588,234          100.00

Notes

*
        Not stated where a response was required.
@
        Not applicable - not stated where a response was not required.
1
        A response of less than 15 years to Q22.
2
        A response of 15 years or more to Q22.
E
        A response to category 1, 2 or 3 of Q30, suggesting the respondent is employed.
D
        A response to category 4 or 5 of Q30, suggesting the respondent did not have a job.
S
        A stated response to Q31, Q32, Q33 or Q39.
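
The six correct permutations in Table 19 can be held as a simple lookup, which is one way a record's coded responses might be checked against them. This is a minimal sketch: the six-character string encoding (Q22, Q30, Q31, Q32, Q33, Q39 in that order, using the symbols from the notes above) and the function name are assumptions made for illustration, not the method used in the study.

```python
# Correct response permutations from Table 19, written as six symbols in the
# order Q22, Q30, Q31, Q32, Q33, Q39 (symbols as defined in the table notes).
# The string encoding itself is a hypothetical convenience.
CORRECT_PERMUTATIONS = {
    "2E@SSS": 21,
    "2DS@@@": 22,
    "1@@@@@": 23,
    "*@@@@@": 24,
    "*DS@@@": 25,
    "*E@SSS": 26,
}


def correct_type(permutation):
    """Return the Table 19 type number, or None if the permutation is not
    one of the six correct permutations."""
    return CORRECT_PERMUTATIONS.get(permutation)


print(correct_type("2E@SSS"))   # employed, skipped Q31, answered Q32, Q33, Q39
print(correct_type("2ESSSS"))   # answered Q31 when not required: not correct
```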



Table 19 shows 65.17% of all respondents in Australia correctly observed the sequencing instructions related to the labour force questions. Of these, around 40% indicated they were employed in Q30, just over 30% indicated they did not have a job in Q30 and just less than 30% were aged less than 15 years.
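
The approximate shares quoted above can be recomputed from the record counts in Table 19. The sketch below does this; the grouping labels and variable names are illustrative assumptions, while the counts are taken from the table.

```python
# Number of records for each correct response type, from Table 19.
CORRECT_COUNTS = {
    21: 4_248_414, 22: 3_146_348, 23: 2_637_697,
    24: 295_452, 25: 264_802, 26: 217_387,
}

# Grouping of types by what was marked in Q30 (labels are illustrative).
GROUPS = {
    "employed in Q30": (21, 26),
    "no job in Q30": (22, 25),
    "no labour force responses": (23, 24),
}

total_correct = sum(CORRECT_COUNTS.values())

# Percentage of all correct records falling in each group.
shares = {
    label: 100 * sum(CORRECT_COUNTS[t] for t in types) / total_correct
    for label, types in GROUPS.items()
}

for label, share in shares.items():
    print(f"{label}: {share:.2f}%")
```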

For the majority of respondents (99.60%) their response to Q4 Age was consistent with the way in which they answered the labour force questions (including Q22 if they answered it). As such, people indicating they were employed or without a job were typically aged 15 years or more and those not responding to the labour force questions were generally aged less than 15 years. Type 25 showed the least consistency, as 9.42% of the respondents were aged less than 15 years.

Table 20 (below) shows that most respondents indicating they were employed (Types 21 and 26) were aged 15-59 years. As would be expected, a far higher proportion of those indicating they did not have a job (Types 22 and 25) were aged 60 years or more. Those not responding to the labour force questions (Types 23 and 24) were nearly all aged less than 15 years.


Table 20: Correct response permutations by age of respondent, Labour force questions, Australia

        Age
Type              0-14 years         15-59 years    60 years or more               Total
                 No.       %         No.       %         No.       %         No.       %
21             4,291    0.10   4,077,519   95.98     166,604    3.92   4,248,414  100.00
22            13,538    0.43   1,935,286   61.51   1,197,524   38.06   3,146,348  100.00
23         2,637,694  100.00           3    0.00           0    0.00   2,637,697  100.00
24           295,444  100.00           8    0.00           0    0.00     295,452  100.00
25            24,936    9.42     122,593   46.30     117,273   44.29     264,802  100.00
26               827    0.38     202,433   93.12      14,127    6.50     217,387  100.00


For persons aged 15 years or more (according to Q4) labour force status is derived using responses to questions 30, 31, 32, 33 and 39. 60.66% (7,833,359 persons) of persons aged 15 years or more observed the labour force sequencing instructions correctly. Of these, around 55% will be allocated a labour force status of employed and 45% will be allocated a labour force status of unemployed or not in the labour force.
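
The figure of 7,833,359 persons can be reproduced from Table 20 by adding the two older age groups for the types that contain labour force responses (Types 21, 22, 25 and 26). The sketch below does so; the variable names are illustrative, and the share marking an employed category in Q30 is only a proxy for the derived labour force status, which also draws on questions 31, 32, 33 and 39.

```python
# Persons aged 15-59 years and 60 years or more, from Table 20, for the
# correct types that include labour force responses.
AGED_15_OR_MORE = {
    21: (4_077_519, 166_604),
    22: (1_935_286, 1_197_524),
    25: (122_593, 117_273),
    26: (202_433, 14_127),
}

persons_15_plus = sum(a + b for a, b in AGED_15_OR_MORE.values())

# Types 21 and 26 marked category 1, 2 or 3 in Q30 (symbol E in Table 19).
marked_employed = sum(sum(AGED_15_OR_MORE[t]) for t in (21, 26))
employed_share = 100 * marked_employed / persons_15_plus

print(f"{persons_15_plus:,} persons aged 15 years or more")
print(f"{employed_share:.1f}% marked an employed category in Q30")
```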

APPENDIX 4

LABOUR FORCE QUESTIONS: TREATMENT OF MULTIPLE MARKS

A total of 370,621 records, representing 2.23% of all records, contained multiple marks for at least one labour force variable. Most of the multiple marks were probably in the form of either crossed out mistakes or the deliberate marking of more than one response category, indicating confusion or misunderstanding on the part of the respondent. However a small proportion would have been due to OMR problems such as the coding of 'phantom' marks or stray matter as multiple marks. This appendix shows how multiple marks have been incorporated in both correct and incorrect response permutations.

As part of the Labour Force Questions Error Analysis, response permutations containing multiple marks which were similar to the permutations already identified were investigated. Some multiple marks were able to be treated as stated responses. Provided the response to Q30 is known, only information on whether subsequent questions were answered is needed to determine whether the sequencing instructions were followed correctly. This reflects the design of this part of the form. Table 21 below shows the number and proportion of records containing multiple marks able to be classified in this study.

43% of records with multiple marks were classified as correct responses. Some of the remaining incorrect permutations containing multiple marks were not able to be allocated to one of the common types of incorrect permutations already identified, and were regarded as part of the group of 'other incorrect responses'. Where multiple marks occurred in Q30 Full/part-time job it was impossible to determine whether a permutation was incorrect or correct. As there is no way of knowing what the case would be after editing, these records were treated as incorrect because it is unlikely that these respondents answered the labour force questions in the expected manner. The 'other incorrect records' category also includes records where the exact location of the multiple mark has not been identified. The value of further analysis of these records was considered to be limited.
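
The treatment described above can be sketched as a small rule: a multiple mark in Q31, Q32, Q33 or Q39 is read as a stated response, while a multiple mark in Q30 leaves the record unclassifiable. The six-symbol string encoding (Q22, Q30, Q31, Q32, Q33, Q39, with 'M' for a multiple mark) and the function name are assumptions for illustration. How a multiple mark in Q22 was resolved is not spelled out in the text, so the sketch leaves Q22 unchanged.

```python
def resolve_multiple_marks(permutation):
    """Apply the multiple-mark treatment to a six-symbol permutation.

    Returns the permutation with 'M' in Q31, Q32, Q33 or Q39 read as a stated
    response ('S'), or None when Q30 itself holds a multiple mark, in which
    case the record is treated as 'other incorrect'.
    """
    q22, q30, later = permutation[0], permutation[1], permutation[2:]
    if q30 == "M":
        return None
    return q22 + q30 + later.replace("M", "S")


print(resolve_multiple_marks("2E@SMS"))  # multiple mark in Q33 read as stated
print(resolve_multiple_marks("2M*SSS"))  # multiple mark in Q30: unclassifiable
```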


Table 21: Multiple marks according to classification of labour force response permutations

Classification              Number   Percentage
Correct                    158,696        42.82
Incorrect                   47,349        12.78
Other incorrect
    Identified              61,526        16.60
    Unidentified           103,050        27.80
    Total                  164,576        44.40
Total                      370,621       100.00
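
The percentages in Table 21, and the 2.23% of all records quoted at the start of this appendix, follow directly from the counts. A short check is sketched below; the dictionary keys and variable names are illustrative, and the total of 16,588,234 records is taken from Table 19.

```python
# Records containing multiple marks, by classification (counts from Table 21).
MULTIPLE_MARK_RECORDS = {
    "correct": 158_696,
    "incorrect (common types)": 47_349,
    "other incorrect, identified": 61_526,
    "other incorrect, unidentified": 103_050,
}
ALL_RECORDS = 16_588_234  # grand total of records, from Table 19

total_multiple = sum(MULTIPLE_MARK_RECORDS.values())

# Each classification as a percentage of records with multiple marks.
percentages = {
    label: round(100 * count / total_multiple, 2)
    for label, count in MULTIPLE_MARK_RECORDS.items()
}
share_of_all_records = round(100 * total_multiple / ALL_RECORDS, 2)

print(total_multiple, percentages, share_of_all_records)
```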


Table 22 shows the exact location of multiple marks and indicates the type of identified response permutation to which these records were allocated.

Table 22: Multiple marks according to type of labour force response permutations (a)


Type                      Q22       Q30       Q31       Q32       Q33       Q39             Number Proportion of records
                          15 years  Full/part Looked    Job last  Hours     Travel      of records   with multiple marks
                          or more   time job  for work  week      worked    to work                                    %
Incorrect response permutations
1                         2         E         S         S         M         S               13,367                  3.61
                          M         E         S         S         S         S                7,704                  2.08
                          2         E         S         M         S         S                7,469                  2.02
                          2         E         M         S         S         S                6,225                  1.68
4                         2         D         M         @         S         S                4,671                  1.26
5                         2         D         M         @         S         @                3,927                  1.06
7                         2         D         M         @         @         S                3,986                  1.07
Other incorrect records   2         M         *         S         S         S               27,075                  7.31
                          2         M         S         S         S         S               21,428                  5.78
                          2         M         S         *         *         *               13,023                  3.51
Subtotal                                                                                   108,875                 29.38
Other incorrect records not separately identified                                          103,050                 27.80
Total incorrect permutations with multiple marks                                           211,925                 57.18

Correct response permutations
21                        2         E         @         S         M         S               39,788                 10.74
                          M         E         @         S         S         S               23,235                  6.27
                          2         E         @         M         S         S               18,032                  4.87
                          2         E         @         M         M         S                  674                  0.18
                          M         E         @         S         M         S                  480                  0.13
22                        2         D         M         @         @         @               42,702                 11.52
                          M         D         S         @         @         @               21,395                  5.77
23                        M         @         @         @         @         @                4,804                  1.30
25                        *         D         M         @         @         @                3,351                  0.90
26                        *         E         @         S         M         S                1,389                  0.37
                          *         E         @         M         S         S                2,790                  0.75
                          *         E         @         M         M         S                   56                  0.02
Total correct permutations with multiple marks                                             158,696                 42.82
Grand Total                                                                                370,621                100.00
(a) See Appendixes 2 and 3.

Notes
M
        Multiple mark.
*
        Not stated where a response was required.
@
        Not applicable - not stated where a response was not required.
1
        A response of less than 15 years to Q22.
2
        A response of 15 years or more to Q22.
E
        A response to category 1, 2 or 3 of Q30, suggesting the respondent is employed.
D
        A response to category 4 or 5 of Q30, suggesting the respondent is unemployed or not in the labour force.
S
        A stated response to Q31, Q32, Q33 or Q39.