Data Processing and Coding
Introduction
Data processing procedures and checks are primarily designed to verify the data provided and, where possible, to correct any inconsistencies in the data.
Input coding
Input coding is the process by which certain data items are categorised during the interview. In the 2021-22 PSS, computer-assisted input coding was performed on the following data items:
- country of birth of all household members
- country of birth of respondent’s mother and father
- first language spoken as a child and main language spoken at home for the respondent
- highest level of non-school qualification and, if applicable, level of current study for the respondent
- relationships within the household
- visa type
- perpetrator type (for Violence and Stalking topics).
Interviewers were able to code from a list of the most common response categories (e.g. the ten most common languages spoken at home) or from a more comprehensive list contained within a 'trigram coder' (which allowed the interviewer to enter the first three letters of a response, then select the appropriate response from a pick list of options). Trigram coders are used to aid the interviewer with the collection of data for which there are detailed lists of output – primarily those associated with Standard Classifications – to eliminate the need for significant office coding. The trigram coders are complemented by manual coding of text fields in the event interviewers could not find an appropriate response amongst the commonly used options or via the trigram coder.
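The trigram lookup described above can be illustrated with a minimal Python sketch. It is not the coder used in the survey: the function names, the small sample of language labels, and the matching rule (a case-insensitive match on the first three letters of each label) are all assumptions made for illustration.

```python
from collections import defaultdict


def build_trigram_index(labels):
    """Group category labels by their first three letters (lowercased)."""
    index = defaultdict(list)
    for label in labels:
        index[label[:3].lower()].append(label)
    return index


def pick_list(index, typed):
    """Return the sorted labels that match the first three letters typed."""
    return sorted(index.get(typed[:3].lower(), []))


# Hypothetical sample of labels; a real classification list is far longer.
LANGUAGES = [
    "Mandarin", "Cantonese", "Italian", "Greek", "Arabic",
    "Vietnamese", "Maltese", "Malay", "Malayalam",
]

index = build_trigram_index(LANGUAGES)
candidates = pick_list(index, "mal")  # the interviewer types three letters
no_match = pick_list(index, "xyz")    # empty list: fall back to a text field
```

An empty pick list corresponds to the case where the response is recorded as text and coded manually in the office.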
The following coders were utilised in the processing of the survey:
- Country of birth of respondent, their mother and father, and their current partner who they live with (where applicable) – Countries were classified according to the Standard Australian Classification of Countries (SACC), 2016.
- First language spoken as a child and main language spoken at home for the respondent – Languages were classified according to the Australian Standard Classification of Languages (ASCL), 2016.
- Educational qualification – Level and field of highest non-school educational qualification and level and field of current study of respondent (where applicable) were coded to the Australian Standard Classification of Education (ASCED), 2001.
- Area data – Capital city, balance of state/territory, and remoteness areas were classified according to the Australian Statistical Geography Standard (ASGS): Volume 1 - Main Structure and Greater Capital City Areas, July 2016; Australian Statistical Geography Standard (ASGS): Volume 4 - Significant Urban Areas, Urban Centres and Localities, Section of State, July 2016; and Australian Statistical Geography Standard (ASGS): Volume 5 - Remoteness Structure, July 2016.
For more details on the ABS Standard Classifications used in the PSS, refer to the Classifications and Standards chapter of this publication.
Further information about the response categories available for each of the data items that use the ABS Standard Classifications can be found in the data item list available under Downloads.
Coding of free-form text responses
A small number of questions in the 2021-22 PSS contained an ‘Other’ response category as part of a pick list which, if selected, sequenced to a free-form text field for recording further details. These fields include:
- other payment period (income)
- other term used to describe sexual orientation (sexual orientation)
- other known perpetrator types (sexual harassment)
- other location behaviour happened (sexual harassment)
- other method by which sexual harassment perpetrated (sexual harassment)
- other relationship to known perpetrator(s) of abuse before the age of 15 (sexual and physical abuse)
- other reason for temporarily separating from current/previous partner
- other reason separated the last time from previous partner
- other places stayed during separations from current/previous partner
- other place stayed on first night after separation from current/previous partner
- other places stayed when relationship finally ended with previous partner
- other reaction experienced as a result of unwanted contact or attention (stalking).
For the coding of these categories, office staff assessed whether or not it was possible to re-code the stated response into an existing response category from the original question. Where this was possible, responses were manually re-coded. Otherwise, they were left in the ‘other’ category.
Edit checks
During office processing of the data, checks were performed on records to ensure that specific values lay within valid ranges, and relationships between items were within limits deemed acceptable for the purposes of the survey. These checks were also designed to detect errors which may have occurred during response entry and processing, and to identify cases which, although not necessarily errors, were sufficiently unusual or unexpected to warrant examination.
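As a rough illustration of the three kinds of check described above (valid ranges, relationships between items, and unusual but possibly correct values), the following Python sketch flags a single record. The field names, the age range, and the income threshold are hypothetical and are not taken from the survey's processing system.

```python
def run_edit_checks(record):
    """Return a list of flags for one record; an empty list means no check fired."""
    flags = []

    # Range check: the value must lie within an assumed valid range.
    age = record.get("age")
    if age is None or not (18 <= age <= 120):
        flags.append("age_out_of_range")

    # Relationship check: two items must be consistent with each other.
    age_first = record.get("age_at_first_incident")
    if age is not None and age_first is not None and age_first > age:
        flags.append("incident_age_exceeds_current_age")

    # Unusual-value check: not necessarily an error, but worth examining.
    income = record.get("weekly_income")
    if income is not None and income > 10000:
        flags.append("review_unusually_high_income")

    return flags


clean = run_edit_checks({"age": 30, "age_at_first_incident": 25, "weekly_income": 1500})
inconsistent = run_edit_checks({"age": 30, "age_at_first_incident": 40})
unusual = run_edit_checks({"age": 45, "weekly_income": 25000})
```

Flags only mark a record for examination; whether the value is corrected or left as reported is a separate decision.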
Data available from the survey are essentially ‘as reported’ by respondents. In some cases, it was possible to correct identified errors or inconsistencies in the originally recorded data through reference to other data available in the same record, including interviewer comments. In other cases this was not possible, and some minor errors and inconsistencies may remain on the data file. Wherever possible, known inconsistencies and irregularities are identified in the interpretation section of the relevant topic chapters in this publication.
Validation checks
The output data file was extensively validated through an item-by-item examination of input and output frequencies, checking populations through derivations, checking the internal consistency of items within and between different levels of the data file, and confrontation with results from previous Personal Safety Surveys. Despite these checks, it is still possible that some small errors remain on the data file.
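One of the checks listed above, the item-by-item comparison of input and output frequencies, can be sketched as follows. This is an assumed, simplified form of such a comparison; the function name and the sample values are hypothetical.

```python
from collections import Counter


def frequency_differences(input_values, output_values):
    """Return {category: (input_count, output_count)} where the counts differ."""
    in_freq = Counter(input_values)
    out_freq = Counter(output_values)
    diffs = {}
    for category in in_freq.keys() | out_freq.keys():
        # A Counter returns 0 for a category it has not seen.
        if in_freq[category] != out_freq[category]:
            diffs[category] = (in_freq[category], out_freq[category])
    return diffs


# Hypothetical values of one data item before and after processing.
matching = frequency_differences(["a", "b", "b"], ["a", "b", "b"])
mismatch = frequency_differences(["a", "b", "b"], ["a", "b", "c"])
```

An empty result means the item's frequencies agree; any entry identifies a category whose counts changed between input and output and so needs explanation, for example a known re-coding step.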
Output datasets
Information from the survey is stored electronically in the form of data items. In some cases, items were formed directly from individual survey questions, while in others, items were derived from answers to multiple questions.
Only data from respondents who completed both the compulsory and voluntary content are retained on the final weighted file.
The output datasets from the 2021-22 PSS are hierarchical in nature and contain six different levels. A hierarchical file is an efficient means of storing and retrieving information which describes one-to-many, or many-to-many, relationships.
The structure of the 2021-22 PSS output datasets is as follows:
The top levels include:
- Household level – contains compositional and geographic information about the household, and household income.
- Person level (linked to household level) – contains socio-demographic information about the respondent and (if applicable) their current partner (who they are living with) including income, education, labour force and language information, as well as information about the respondent’s general feelings of safety, self-assessed health status, visa status, defence force service, housing, disability status, sexual orientation, social connectedness, and experiences of: sexual harassment, sexual or physical abuse before the age of 15, witnessing violence before the age of 15, economic abuse by a partner, and stalking. The person level also contains a significant number of aggregated data items produced from data contained on the levels outlined below. These aggregated data items provide only summary experience data (predominantly used to produce prevalence rates), with detailed information remaining on the lower topic-focused levels. For more details on these topics, refer to the relevant topic chapters contained within this publication.
Beneath the person level, there are four further levels (linked to the person level):
- Violence prevalence level – contains information about a respondent's experience of violence since the age of 15. The time frame of the most recent incident experienced by broad groupings of perpetrator type is available on this level for each of the 8 violence types collected, as well as for aggregated violence types. In addition, a detailed perpetrator type data item is available for use with the violence type data. For more details on this level, refer to the Violence - Prevalence chapter in this publication.
- Violence most recent incident level – contains detailed characteristics about a respondent's most recent incident (in the last 10 years) of up to 7 types of violence: physical/sexual assault by a male/female perpetrator, physical threat by a male/female perpetrator, and sexual threat by a male perpetrator (female respondents only). For more details on this level, refer to the Violence - Most Recent Incident chapter in this publication.
- Partner violence level – contains detailed information about a respondent's experience of violence by a current partner and/or most recently violent previous partner since the age of 15. For more details on this level, refer to the Partner Violence chapter in this publication.
- Partner emotional abuse level – contains information about a respondent's experience of emotional abuse by a current and/or most recent emotionally abusive previous partner since the age of 15. For more details on this level, refer to the Partner Emotional Abuse chapter in this publication.
A comprehensive list of data items available on each level described above is available under Downloads.
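The six-level hierarchy described above can be pictured with a short Python sketch in which each level names its parent level and records on lower levels carry the identifier of the person they belong to. The level names follow the text, but the identifier field (`person_id`), the record layout, and the helper functions are assumptions for illustration and do not reflect the actual file format.

```python
# Each level maps to its parent level; the household level has no parent.
LEVEL_PARENT = {
    "household": None,
    "person": "household",
    "violence_prevalence": "person",
    "violence_most_recent_incident": "person",
    "partner_violence": "person",
    "partner_emotional_abuse": "person",
}


def children_of(level):
    """Return the names of the levels linked directly beneath a level."""
    return sorted(child for child, parent in LEVEL_PARENT.items() if parent == level)


def records_for_person(data, person_id):
    """Collect one person's records from every level beneath the person level."""
    return {
        level: [row for row in data.get(level, []) if row["person_id"] == person_id]
        for level in children_of("person")
    }


# Hypothetical data: one person with two prevalence records and one partner record.
data = {
    "violence_prevalence": [
        {"person_id": "P1", "violence_type": "physical assault"},
        {"person_id": "P1", "violence_type": "physical threat"},
        {"person_id": "P2", "violence_type": "physical assault"},
    ],
    "partner_violence": [{"person_id": "P1", "partner": "previous"}],
}

linked = records_for_person(data, "P1")
```

Storing repeated records on their own level, rather than as extra columns on the person record, is what allows one person to have a varying number of linked records.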