Microdata: National Study of Mental Health and Wellbeing

Presents microdata from the National Study of Mental Health and Wellbeing for key mental health statistics including prevalence of mental disorders.

Introduction

The National Study of Mental Health and Wellbeing (NSMHW) is a survey conducted on an irregular basis and designed to provide a range of information about the mental health of Australians. It provides information on the prevalence of selected lifetime and 12-month mental disorders, by the major disorder groups:

  • Anxiety disorders (e.g., Social Phobia)
  • Affective disorders (e.g., Depression)
  • Substance Use disorders (e.g., Alcohol Harmful Use).

It also provides information on the level of impairment, and health services used for mental health problems.

This information can be cross classified by selected demographic and socioeconomic characteristics.

This product provides information about the 2021-22, 2020-21 and 2007 NSMHW collections. It includes details about data files, Data Item Lists, and information about the survey methodology. A link to microdata for the 1997 release is also provided.

Due to changes in survey content and the application of diagnostic criteria for mental disorders, some data are not comparable between collections. For more information, see Comparison between 2021-22, 2020-21, 2007 and 1997 below and 2020-2022 Methodology information.

Available products

The following microdata products are available from this survey:

  • Basic microdata – approved users can download and analyse unit record data in their own environment. This product is available for the 1997 and 2007 NSMHW. It is not available for the 2020-21, 2021-22, or 2020-2022 NSMHW.
  • Detailed microdata – approved users can access a remote desktop environment in DataLab for in-depth and interactive data analysis using a range of statistical software packages. This product is available for NSMHW reference periods: 2007, 2020-21, 2021-22, and 2020-2022.

To apply for access, see Microdata Entry Page.

Before you apply for access, read Responsible Use of ABS Microdata, User Guide.

File structure

Estimates from the 2021-22, 2020-21, and 2007 NSMHW are available at two levels contained in separate data files: Household and Selected Person.

A complete list of data items can be accessed from the Data Item List in the Data downloads section. This contains details for each data item including the output categories and any special codes used.

The 2020-21 and 2021-22 collections have also been pooled to create a 2020-2022 dataset.

2020-2022, 2021-22, 2020-21 and 2007 NSMHW file structure

The following table shows the levels available in the microdata product and the information contained on those levels:

File structure
Level name | Information contained on level
Household | Geographic classifications, household size and structure, dwelling characteristics and household income details.
Selected Person | Demographic and socioeconomic characteristics of survey respondents, as well as health, mental health and related information provided by respondents.

The following table shows the hierarchical file structure and the relationship between each level:

Relationship between levels on files
Level 1 | Level 2 | Relationship type
Household | | One record per in-scope household
 | Selected Persons | One selected person record per household

Counts and Weights

Number of records by level, NSMHW 2020-2022
Levels | Record Counts (Unweighted) | Weighted Counts
Household | 15,893 | 9,897,838
Person (Selected persons) | 15,893 | 19,828,348

Number of records by level, NSMHW 2021-22
Levels | Record Counts (Unweighted) | Weighted Counts
Household | 10,339 | 9,897,838
Person (Selected persons) | 10,339 | 19,828,348

Number of records by level, NSMHW 2020-21
Levels | Record Counts (Unweighted) | Weighted Counts
Household | 5,554 | 9,787,651
Person (Selected persons) | 5,554 | 19,644,025

Number of records by level, NSMHW 2007
Levels | Record Counts (Unweighted) | Weighted Counts
Household | 8,841 | 8,159,637
Person (Selected persons) | 8,841 | 16,015,345

Weight variables

For NSMHW, there are two weight variables on the file:

  • Household Weight (FINWTH) - Household level – Benchmarked
  • Person Weight (FINWTP) - Selected Person level - Benchmarked to population of persons 16-85 years

Using weights

The NSMHW is a sample survey, so to produce estimates for the in-scope population you must use weight fields in your calculations. When analysing a Household level item, you will need to use the household weight. When analysing a Selected Person level item, you will need to use the person weight.

Weights have been created for each sample. It is important to use the weights created for the sample you are using. In particular, the weights for the pooled 2020-2022 sample should not be used with the individual 2020-21 or 2021-22 samples. Estimates from the individual 2020-21 and 2021-22 samples will not match estimates derived from the pooled 2020-2022 sample. Sample counts and unperturbed weighted estimates are included in the National Study of Mental Health and Wellbeing methodology for 2020-2022, 2021-22 and 2020-21 to assist you in validating your estimates.
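As an illustration of matching the weight to the level being analysed, the following is a minimal Python sketch. The records and the item names TENURE and EMPLOYED are hypothetical and invented for this example; only ABSHID, FINWTH and FINWTP are names taken from the file description above.

```python
# Hypothetical records; all values are invented for illustration only.
households = [
    {"ABSHID": "H1", "FINWTH": 600.0, "TENURE": "Owner"},
    {"ABSHID": "H2", "FINWTH": 400.0, "TENURE": "Renter"},
    {"ABSHID": "H3", "FINWTH": 500.0, "TENURE": "Renter"},
]
persons = [
    {"ABSHID": "H1", "FINWTP": 1200.0, "EMPLOYED": True},
    {"ABSHID": "H2", "FINWTP": 900.0, "EMPLOYED": False},
    {"ABSHID": "H3", "FINWTP": 1100.0, "EMPLOYED": True},
]


def weighted_total(records, weight_field, condition):
    """Sum the weights of the records that satisfy `condition`."""
    return sum(r[weight_field] for r in records if condition(r))


# Household level item: use the household weight (FINWTH).
renting_households = weighted_total(
    households, "FINWTH", lambda r: r["TENURE"] == "Renter"
)

# Selected Person level item: use the person weight (FINWTP).
employed_persons = weighted_total(persons, "FINWTP", lambda r: r["EMPLOYED"])

print(renting_households)  # 900.0
print(employed_persons)    # 2300.0
```

The same function serves both levels; only the weight field changes. The weights passed in should always come from the same sample (pooled or individual) as the records.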

File content

Available data items

Data items for 2020-2022 include:

  • Demographics including age, sex at birth, gender, variations of sex characteristics, sexual orientation, country of birth, main language spoken, and marital status
  • Household details including household composition, tenure type, landlord type, number of bedrooms, and household income
  • Socio-economic characteristics of people including labour force status, educational attainment, and personal income
  • General health and wellbeing including self-assessed health status, psychological distress, smoking, long term health conditions, social connectedness, and functioning
  • Mental disorders including depression, mania, panic, social phobia, agoraphobia, generalised anxiety, substance use, obsessive-compulsive disorder, and post-traumatic stress disorder
  • Suicidality
  • Self-harm
  • Disordered eating
  • Use of health and social support services

The Data Item Lists contain a full list of available data items and categories for the 2020-2022, 2021-22, 2020-21 and 2007 collections.

Identifiers

Every record on each level of the file is uniquely identified. See Data Item Lists for details on which ID equates to which level.

Each household has a unique random identifier, ABSHID. This identifier appears on the household level and is repeated on the selected person level. The combination of identifiers uniquely identifies a record at a particular level as shown below.

  1. Household = ABSHID
  2. Person = ABSHID + ABSPID

The Household record identifier, ABSHID, assists with linking people from the same household, and with attaching household characteristics such as geography (located on the Household level) to the Person records.
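The linkage described above can be sketched in Python as follows. This is a minimal illustration rather than a prescribed method: the records are invented, and STATE and AGE are hypothetical item names. Only ABSHID and ABSPID come from the description above.

```python
# Hypothetical records; STATE and AGE are illustrative item names only.
households = [
    {"ABSHID": "H001", "STATE": "NSW"},
    {"ABSHID": "H002", "STATE": "VIC"},
]
persons = [
    {"ABSHID": "H001", "ABSPID": "01", "AGE": 34},
    {"ABSHID": "H002", "ABSPID": "01", "AGE": 58},
]

# ABSHID alone should identify a household record.
household_by_id = {h["ABSHID"]: h for h in households}
assert len(household_by_id) == len(households)

# ABSHID + ABSPID together should identify a person record.
person_keys = {(p["ABSHID"], p["ABSPID"]) for p in persons}
assert len(person_keys) == len(persons)

# Attach household characteristics to each person record via ABSHID.
linked = [{**household_by_id[p["ABSHID"]], **p} for p in persons]

print(linked[0]["STATE"], linked[0]["AGE"])  # NSW 34
```

Checking key uniqueness before joining is a cheap way to confirm the identifiers are being used at the level intended.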

Multi-response items

Several questions in the survey allowed respondents to provide one or more responses. Each response category for these multi-response data items is treated as a separate data item. In the detailed microdata, these data items share the same identifier (SAS name) prefix but are each separately suffixed with a letter - A for the first response, B for the second response, C for the third response and so on. Where there are more than 26 categories, the next suffix after Z is 0, then 1, 2, etc.

For example, the multi-response data item 'Long-term Health Condition' has thirteen response categories (including 'No long-term health conditions'). There are thirteen data items named LTHCONDA, LTHCONDB, LTHCONDC ... LTHCONDM. Each data item in the series has either a positive response code or a null response code, with the exception of the first item in the series, LTHCONDA. LTHCONDA has the positive response code 10 - 'Arthritis' and the null response code 0, as well as two additional response codes: 98 - 'Not known' and 99 - 'Refused'. The remaining items, LTHCONDB to LTHCONDM, have just two response codes each. The Data Item List identifies all multi-response items and lists the codes with their corresponding response categories.

Note that the sum of individual multi-response categories will be greater than the population applicable to a particular data item as respondents can select more than one response.
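The naming convention and the note about sums can be illustrated with a short Python sketch. The function name, the respondent records and the positive codes 11 and 12 are hypothetical. The sketch stops at the suffix 9 because the text above does not say what follows it.

```python
import string


def multi_response_names(prefix, n_categories):
    """Item names for a multi-response series: suffixes A..Z, then 0, 1, 2, ..."""
    suffixes = string.ascii_uppercase + string.digits  # A..Z followed by 0..9
    if n_categories > len(suffixes):
        raise ValueError("suffixes beyond 9 are not described in the source text")
    return [prefix + s for s in suffixes[:n_categories]]


names = multi_response_names("LTHCOND", 13)
print(names[0], names[-1])  # LTHCONDA LTHCONDM

# Hypothetical respondents; 0 is the null response code, 11 and 12 are
# invented positive codes used purely for illustration.
respondents = [
    {"LTHCONDB": 11, "LTHCONDC": 12},
    {"LTHCONDB": 11, "LTHCONDC": 0},
    {"LTHCONDB": 0, "LTHCONDC": 0},
]
items = ["LTHCONDB", "LTHCONDC"]

# Summing positive responses across items counts some people more than once.
sum_of_categories = sum(1 for r in respondents for i in items if r[i] != 0)
people_with_any = sum(1 for r in respondents if any(r[i] != 0 for i in items))

print(sum_of_categories, people_with_any)  # 3 2
```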

Continuous items

Some continuous data items are allocated special codes for certain responses (e.g., 9997 = 'Not applicable'). Any special codes for continuous (summation) data items are listed in the Data Item List and will be found in the categorical version of the continuous item.
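If special codes are present in the values being analysed, they need to be excluded before calculating statistics such as a mean. The following is a minimal Python sketch under that assumption. The data are invented, and the set of special codes must be taken from the Data Item List for each item rather than from this example.

```python
# Assumed special code(s) for a hypothetical continuous item;
# always confirm the actual codes in the Data Item List.
SPECIAL_CODES = {9997}  # e.g. 'Not applicable'

# (value, weight) pairs for a hypothetical continuous item.
values_and_weights = [(40, 1000.0), (9997, 800.0), (20, 1000.0)]

# Drop records carrying a special code, then take the weighted mean.
valid = [(v, w) for v, w in values_and_weights if v not in SPECIAL_CODES]
weighted_mean = sum(v * w for v, w in valid) / sum(w for _, w in valid)

print(weighted_mean)  # 30.0
```

Leaving the 9997 record in would treat the code as a real value and inflate the mean substantially.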

Reliability of estimates

As the survey was conducted on a sample of private households in Australia, it is important to take account of the method of sample selection when deriving estimates from the detailed microdata. This is important because a person's chance of selection in the survey varied depending on the state or territory in which the person lived. If these chances of selection are not accounted for by use of appropriate weights, the results could be biased.

Each person or household record has a main weight (FINWTP or FINWTH). This weight indicates how many population units are represented by the sample unit. When producing estimates of sub-populations from the detailed microdata, it is essential that they are calculated by adding the weights of persons or households in each category and not just by counting the sample number in each category. If each person or household’s weight were to be ignored when analysing the data to draw inferences about the population, then no account would be taken of a person or household's chance of selection or of different response rates across population groups, with the result that the estimates produced could be biased. The application of weights ensures that estimates will conform to an independently estimated distribution of the population by age, by sex, etc. rather than to the distributions within the sample itself.
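The difference between adding weights and counting sample records can be seen in a small Python sketch. The sample below is invented: three records with a small weight and one record with a large weight, as might happen when chances of selection differ between groups.

```python
from collections import defaultdict

# Hypothetical sample: (response category, FINWTP). Values are invented.
sample = [("Yes", 500.0), ("Yes", 500.0), ("Yes", 500.0), ("No", 2500.0)]

weighted = defaultdict(float)
unweighted = defaultdict(int)
for category, weight in sample:
    weighted[category] += weight  # estimate: add the weights
    unweighted[category] += 1     # raw sample count, for comparison only

total_weight = sum(weighted.values())
for category in weighted:
    sample_share = unweighted[category] / len(sample)
    population_share = weighted[category] / total_weight
    print(category, sample_share, population_share)
# Yes 0.75 0.375
# No 0.25 0.625
```

Here the raw sample suggests 75% 'Yes', while the weighted estimate is 37.5%, which shows how ignoring weights can bias an estimate.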

It is also important to calculate a measure of sampling error for each estimate. Sampling error occurs because only part of the population is surveyed to represent the whole population. Sampling error should be considered when interpreting estimates, as it gives an indication of accuracy and reflects the importance that can be placed on interpretations using the estimate. Measures of sampling error include standard error (SE), relative standard error (RSE) and margin of error (MoE). These measures of sampling error can be estimated using the replicate weights. The replicate weight variables provided on the microdata are labelled WPMXX (person) and WHMXX (household), where XX represents the number of the given replicate group. The exact number of replicates will vary depending on the survey but will generally be 30, 60 or 200 replicate groups. As an example, for survey microdata with 60 replicate groups, you will find 60 person replicate weight variables labelled WPM01 to WPM60.

Using replicate weights for estimating sampling error

Overview of replication methods

ABS household surveys employ complex sample designs and weighting which require special methods for estimating the variance of survey statistics.  Variance estimators for a simple random sample are not appropriate for this survey microdata.

A class of techniques called 'replication methods' provides a general process for estimating variance for the types of complex sample designs and weighting procedures employed in ABS household surveys. The ABS uses a method called the Group Jackknife Replication Method.

A basic idea behind the replication approach is to split the sample into G replicate groups. One replicate group is then dropped from the file and a new set of weights is produced for the remaining sample. This is repeated for all G replicate groups to provide G sets of replicate weights. For each set of replicate weights, the statistic of interest is recalculated and the variance of the full sample statistic is estimated using the variability among the replicate statistics.
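The idea can be sketched in Python as follows. This is a deliberately simplified illustration and not the ABS procedure: it assumes each record has already been assigned to one of G groups, and that the new weights for the remaining sample are obtained by simple rescaling by G / (G - 1). The text above does not describe how the new weights are actually produced. In practice the replicate weights are supplied on the microdata file and do not need to be constructed.

```python
def jackknife_replicate_weights(main_weights, groups, n_groups):
    """Return n_groups lists of replicate weights.

    Replicate g sets the weight of records in group g to zero and rescales
    the remaining weights by n_groups / (n_groups - 1). The rescaling rule
    is an assumption made for illustration only.
    """
    replicates = []
    for g in range(1, n_groups + 1):
        replicates.append([
            0.0 if group == g else weight * n_groups / (n_groups - 1)
            for weight, group in zip(main_weights, groups)
        ])
    return replicates


# Hypothetical main weights and replicate group assignments.
main_weights = [300.0, 300.0, 300.0, 300.0]
groups = [1, 2, 3, 4]

reps = jackknife_replicate_weights(main_weights, groups, n_groups=4)
print(reps[0])  # [0.0, 400.0, 400.0, 400.0]
```

In this toy case each set of replicate weights still sums to the same total as the main weights (1,200), while leaving out a different part of the sample each time.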

The statistics calculated from these replicates are called replicate estimates. Replicate weights provided on the microdata file enable the variance of survey statistics, such as means and medians, to be calculated relatively simply. Further technical explanation can be found in Section 4 of Research Paper: Weighting and Standard Error Estimation for ABS Household Surveys (Methodology Advisory Committee).

How to use replicate weights

To calculate the standard error of any statistic derived from the survey data, the method is as follows:

  1. Calculate the estimate of the statistic of interest using the main weight
  2. Repeat the calculation above for each replicate weight, substituting the replicate weight for the main weight and creating G replicate estimates.  In the example where there are 60 replicate weights, you will have 60 replicate estimates. 
  3. Use the outputs from step 1 and 2 as inputs to the formula below to calculate the estimate of the Standard Error (SE) for the statistic of interest.

\(SE(y) = \sqrt{\frac{G-1}{G} \sum_{g=1}^G (y_{(g)} - y)^2}\)

[Equation 1]

  • G = Number of replicate groups
  • g = the replicate group number
  • \(y_{(g)}\) = Replicate estimate for group g, i.e. the estimate of y calculated using the replicate weight for g
  • y = the weighted estimate of y from the sample

From the standard error you can then derive the following measures of sampling error: the relative standard error (RSE) and the margin of error (MoE) of the estimate.

\(\text{Relative Standard Error (RSE)} = \frac{\text{SE}}{\text{Estimate}} \)

[Equation 2]

\(\text{Margin of Error (MoE)} = 1.96 \times \text{SE} \)

[Equation 3]
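The three steps and Equations 1 to 3 can be sketched in Python for a weighted mean. This is a minimal illustration, not an official implementation: the records and the item name SCORE are hypothetical, only three replicate groups are used to keep the example short, and the replicate weight names are assumed to be zero-padded to two digits as in WPM01 to WPM60.

```python
import math


def weighted_mean(records, value_field, weight_field):
    """Weighted mean of value_field using the given weight field."""
    total_weight = sum(r[weight_field] for r in records)
    return sum(r[value_field] * r[weight_field] for r in records) / total_weight


def replicate_se(records, value_field, main_weight, n_groups, prefix="WPM"):
    """Return (estimate, SE) following steps 1 to 3 and Equation 1."""
    # Step 1: estimate using the main weight.
    estimate = weighted_mean(records, value_field, main_weight)
    # Step 2: one replicate estimate per replicate weight.
    replicate_estimates = [
        weighted_mean(records, value_field, f"{prefix}{g:02d}")
        for g in range(1, n_groups + 1)
    ]
    # Step 3: Equation 1.
    sum_sq = sum((y_g - estimate) ** 2 for y_g in replicate_estimates)
    return estimate, math.sqrt((n_groups - 1) / n_groups * sum_sq)


# Hypothetical person records with a main weight and three replicate weights.
records = [
    {"SCORE": 10, "FINWTP": 100.0, "WPM01": 0.0, "WPM02": 150.0, "WPM03": 150.0},
    {"SCORE": 20, "FINWTP": 100.0, "WPM01": 150.0, "WPM02": 0.0, "WPM03": 150.0},
    {"SCORE": 30, "FINWTP": 100.0, "WPM01": 150.0, "WPM02": 150.0, "WPM03": 0.0},
]

estimate, se = replicate_se(records, "SCORE", "FINWTP", n_groups=3)
rse = se / estimate  # Equation 2
moe = 1.96 * se      # Equation 3

print(estimate)  # 20.0
print(se, rse, moe)
```

For household level estimates the same approach would apply with FINWTH and the WHM prefix. Any statistic can be substituted for the weighted mean, provided it is recalculated in full for each replicate weight.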

An example of calculating the SE for an estimate of the mean

Suppose you are calculating the mean value of earnings, y, in a sample.  Using the main weight produces an estimate of $500.

You have 5 sets of Group Jackknife replicate weights and using these weights (instead of the main weight) you calculate 5 replicate estimates of $510, $490, $505, $503, $498 respectively. 

To calculate the standard error of the estimate, substitute the following inputs into Equation 1:

G = 5

y = 500

g = 1, y(g) = 510

g = 2, y(g) = 490

g = 3, y(g) = 505

g = 4, y(g) = 503

g = 5, y(g) = 498

\(SE(y) = \sqrt{\frac{5-1}{5} \sum_{g=1}^5 (y_{(g)}- 500)^2} \)

\(SE(y) = \sqrt{\frac{4}{5}((510-500)^2+(490-500)^2+(505-500)^2+(503-500)^2+(498-500)^2)} \)

\(SE(y)= \sqrt{\frac{4}{5} \times 238}\)

\(SE(y)= 13.8\)

To calculate the RSE, divide the SE by the estimate of y ($500) and multiply by 100 to express it as a percentage:

  • \(RSE(y)=\frac{13.8}{500} \times 100\)
  • \(RSE(y)=2.8\%\)

To calculate the margin of error you multiply the SE by 1.96

  • \(\text {Margin of Error} (y)=13.8 \times 1.96\)
  • \(\text {Margin of Error} (y)=27.05\)
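The arithmetic in this worked example can be checked with a few lines of Python. The variable names are illustrative; the inputs are the estimate and the five replicate estimates given above.

```python
import math

estimate = 500
replicate_estimates = [510, 490, 505, 503, 498]
G = len(replicate_estimates)

# Equation 1: sum of squared differences, scaled by (G - 1) / G.
sum_sq = sum((y_g - estimate) ** 2 for y_g in replicate_estimates)
se = math.sqrt((G - 1) / G * sum_sq)

rse_percent = se / estimate * 100  # Equation 2, expressed as a percentage
moe = 1.96 * se                    # Equation 3

print(sum_sq)                 # 238
print(round(se, 1))           # 13.8
print(round(rse_percent, 1))  # 2.8
print(round(moe, 2))          # 27.05
```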

Comparison between 2021-22, 2020-21, 2007, and 1997

Data from the 2021-22 and 2020-21 surveys has been released as the National Study of Mental Health and Wellbeing. The ABS also conducted this survey in 2007 and 1997. Data from the 2007 survey was released as the National Survey of Mental Health and Wellbeing and data from the 1997 survey was released as the National Survey of Mental Health and Wellbeing of Adults.

Comparison between 2021-22 and 2020-21

The Country of birth of father and Country of birth of mother data items were only collected in 2021-22.

All other data items collected in 2021-22 were collected in 2020-21 and are comparable. Further information on comparability between these two collections and the 2007 collection can be found below. The information references the 2020-2022 NSMHW pooled dataset but also applies to the two individual collections. 

Comparison between 2020-2022 and 2007

The 2020-2022 NSMHW was designed to be broadly comparable with the 2007 survey. The 2020-2022 collection used the same instrument as 2007: the World Mental Health Survey Initiative version of the World Health Organization's (WHO) Composite International Diagnostic Interview, version 3.0 (WMH-CIDI 3.0). It used the same WMH-CIDI 3.0 questionnaire modules as 2007, collected in the same order. Data collected using the WMH-CIDI 3.0 modules are therefore comparable between 2020-2022 and 2007.

Many of the non-diagnostic topics, and the order in which they were collected, differ between 2020-2022 and 2007. Some topics collected in the 2007 survey were removed and new topics were added. Other topics changed significantly between 2020-2022 and 2007. For example, demographic and socio-economic modules were updated to align with current ABS standards and commonly used ABS questions and data items. Data for non-diagnostic topics may not be comparable between 2020-2022 and 2007.

Please see the Data Item Lists for each collection for full details.

Due to the change in questions used to collect physical health conditions in 2020-2022, the comorbidity of mental health disorders and physical health conditions is not comparable with 2007.

2020-2022 DSM-IV Anxiety Disorders include Agoraphobia with/without Panic Disorder rather than Agoraphobia without Panic Disorder which was included in 2007. This also impacts all dependent data items, for example DSM-IV Any Mental Health Disorder.

The diagnoses of mental disorders are based on the WMH-CIDI 3.0 algorithms. The algorithms operationalise criteria from two classification systems: the Diagnostic and Statistical Manual of Mental Disorders, Fourth Edition (DSM-IV); and the WHO International Classification of Diseases, Tenth Revision (ICD-10).

The version of the algorithms used for the 2020-2022 NSMHW was provided by the WHO in 2020. The algorithms are comparable with the version used for the 2007 survey with the following exceptions:

ICD-10 Post-Traumatic Stress Disorder (PTSD):

  • ICD criteria B Part 2 has been updated: Group 2 reactions (unwanted memories, unpleasant dreams, flashbacks, getting very upset when reminded of it, physical reactions) must have occurred at least once a month. The version of the diagnostic algorithms used for the 2007 survey did not include the once-a-month persistence criterion.
  • ICD criteria D Part 2 has been updated: Persistent symptoms of increased psychological sensitivity and arousal shown by any two of the following: difficulty in falling or staying asleep, irritability or outbursts of anger, difficulty in concentrating, hypervigilance, exaggerated startle response; not present before exposure to the stressor, and must have occurred at least once a month. The version of the diagnostic algorithms used for the 2007 survey did not include the once-a-month persistence criterion.
  • Lifetime and 12-month prevalence data items for ICD-10 PTSD are therefore not comparable between 2020-2022 and 2007.

ICD-10 Obsessive-Compulsive Disorder (OCD):

  • For an ICD-10 lifetime diagnosis of OCD, obsessions and/or compulsions must be present on most days for at least two weeks. In 2007, the 12-month diagnosis was derived from the lifetime diagnosis including the criterion that disorder symptoms must have been present on most days for at least two weeks or longer in the 12 months prior to the survey interview. The version of the algorithms used for the 2020-2022 survey did not include the two-week persistence as a condition for meeting 12-month diagnosis. 12-month diagnosis in 2020-2022 is derived based on lifetime OCD diagnosis with the presence of OCD symptoms, for any duration, in the past 12 months.
  • 12-month prevalence data items for ICD-10 OCD are therefore not comparable between 2020-2022 and 2007.

Both Post-Traumatic Stress Disorder and Obsessive-Compulsive Disorder are classified as Anxiety disorders. Consequently, the ICD-10 lifetime and 12-month Anxiety disorders data items and the ICD-10 lifetime and 12-month Mental disorders data items are also not comparable between 2020-2022 and 2007.

2007 re-derived data items:

To enable comparison between 2020-2022 and 2007, selected ICD-10 Post-Traumatic Stress Disorder, ICD-10 Obsessive-Compulsive Disorder, ICD-10 Anxiety disorders, and ICD-10 Mental disorders data items, as well as the associated ICD-10 comorbidity and severity data items, have been re-derived using the 2020 definitions. These have been added to the 2007 detailed microdata file (see the Re-derived Items tab of the Data Item List). Estimates produced using these items will not match those included in the National Survey of Mental Health and Wellbeing, 2007 Summary of Results.

Comparison between 2007 and 1997

The 2007 survey was designed to provide national estimates that can be compared internationally, rather than to provide comparisons with the 1997 survey. Due to differences in how the data were collected, care should be exercised when comparing data items from the 1997 survey with the 2007 survey. Particular attention should be given to the definition of the data item, the population, and the reference period that applies (e.g., 12-month versus lifetime). Differences between the two surveys are too substantial to list individually, but included changes to questions and topics, concepts, survey methodology, classifications, and measurements.

Detailed information on the differences between the two surveys is provided in the National Survey of Mental Health and Wellbeing: Users' Guide, 2007.

Data downloads

Data Item Lists

Data files

Previous releases

Microdata | Download | DataLab
National Survey of Mental Health and Wellbeing, 2007 | Basic microdata | Detailed microdata
Mental Health and Wellbeing of Adults, 1997 | Basic microdata |

Further information

History of changes

12 December 2023

A non-response adjustment has been applied to the 2020-21 (Cohort 1) dataset which is now available in the DataLab. This adjustment had already been applied to the 2021-22 (Cohort 2) and 2020-2022 datasets. Information pertaining to the Cohort 1 dataset has been added to the Methodology Section of the 2020-2022 National Study of Mental Health and Wellbeing Summary Publication.

In addition, the Data Item List for 2020-21 has been updated.
