Microdata and TableBuilder: National Aboriginal and Torres Strait Islander Health Survey, Australia

Introduction

The National Aboriginal and Torres Strait Islander Health Survey (NATSIHS) collects information on the health and wellbeing of Aboriginal and Torres Strait Islander people. It provides data such as the prevalence of chronic and long-term health conditions, self-reported health status and health risk factors such as smoking, alcohol consumption, fruit and vegetable consumption, and physical activity.

The 2022–23 NATSIHS is the fifth health survey of Aboriginal and Torres Strait Islander people conducted by the ABS. See National Aboriginal and Torres Strait Islander Health Survey, 2022–23 for summary results, methodology and other information.

This product provides information about the microdata releases from the most recent NATSIHS cycles, 2022–23 and 2018–19, including details about the data files and how to use the different microdata products. Data Item Lists, information about the survey methodology, and links to microdata for previous NATSIHS releases are also provided.

Most data in the 2022–23 NATSIHS are considered comparable with the 2018–19 NATSIHS. The Data Item List contains information on the availability and comparability of data items between surveys.

Available products

TableBuilder – an online tool for creating tables and graphs. This product is available for the 2018–19 NATSIHS and the 2012–13 NATSIHS.
Detailed microdata – approved users can access DataLab for in-depth and interactive data analysis using a range of statistical software packages. This product is available for all iterations of the NATSIHS.
Basic microdata – approved users can download and analyse unit record data in their own environment. This product is available for the 2012–13 NATSIHS Nutrition and Physical activity data only.

Compare access options to see what's right for you or Apply for access.

File structure

Datasets from the NATSIHS are hierarchical in nature. A hierarchical data file is a way of presenting information which describes one-to-many, or many-to-many, relationships. For example, a person may report multiple days on which alcohol was consumed and also multiple types of alcoholic beverages consumed on each of these days.

Most data are contained as individual characteristics on person records. Estimates are also available at the household level. The data items and related output categories are described in Excel spreadsheets from the Data Item Lists section.

2022–23 NATSIHS file structure

2022–23 NATSIHS level names and information contained on each level of the microdata
Level name	Information contained on level
Household level	Geographic classifications, household size and structure, dwelling characteristics and household income details.
Selected person level	Demographic and socioeconomic characteristics of survey respondents, as well as health, health risk factors and related information.
Alcohol day level	Total daily alcohol consumption of the 3 most recent days of alcohol consumption in the last week.
Alcohol type level	Types of alcoholic beverages consumed, including quantities, on the 3 most recent days of alcohol consumption in the last week.
Alcohol single occasion level	Types of alcoholic beverages consumed and quantity of alcohol consumed on the day of highest alcohol intake in the last 2 weeks.
Conditions level	Health conditions and status.
Medications level	Medications or supplements that have been taken in the last 2 weeks.
Physical activity (non-remote) day level	Types of daily physical activity and duration undertaken in the last week.
Barriers level	Barriers experienced when trying to access service providers.

2022–23 NATSIHS hierarchical file structure and the relationship between each level
Level 1	Level 2	Level 3	Level 4	Relationship type
Household				One record per in-scope household
	Selected Persons			One record for each person selected for the survey. Non-remote: up to 4 selected person records per household (2 adults and 2 children). Remote: up to 2 selected person records per household (1 adult and 1 child).
		Alcohol day		Up to 3 records – one for each day that alcohol was consumed – per selected person aged 15 years and older
			Alcohol type	Up to 17 records – one for each type of alcoholic beverage consumed – per alcohol day record
		Alcohol single occasion type		Up to 17 records – one for each type of alcoholic beverage consumed on the highest alcohol intake day within the last two weeks – per selected person aged 15 years and older
		Conditions		Up to 119 records – one for each reported condition – per selected person
		Medications		Up to 51 records – one for each medication or supplement taken in the last 2 weeks – per selected person in non-remote areas
		Physical activity (non-remote) day		Seven records per selected person in non-remote areas aged 15 years and over
		Barriers		Up to 3 records – one for each service provider that the respondent has experienced a barrier accessing – per selected person

2018–19 NATSIHS file structure

2018–19 NATSIHS level names and information contained on each level of the microdata
Level name	Information contained on level
Household	Geographic classifications, household size and structure, dwelling characteristics and household income details
Selected person	Demographic and socio-economic characteristics of survey respondents, and most of the health, health risk factors and related information they provided
Alcohol – day consumed	Alcohol consumption on the 3 most recent days on which respondents reported consuming alcohol and the order of consumption
Alcohol – type consumed	Order of consumption, and the broad alcohol types and quantities for each type consumed on those days
Binge – type consumed	The broad alcohol types and quantities for each type consumed on the day the respondent consumed the most alcohol in the last 2 weeks
Conditions	Information about health conditions reported by respondents
Medications	Information on medications reported by respondents

Counts and weights

Number of records by level, 2022–23 NATSIHS microdata
Level	Record counts (unweighted)	Weighted counts (if applicable)
Household	4,878	479,752
Selected persons	7,768	993,967
Alcohol day	6,802	n/a
Alcohol type	7,189	n/a
Alcohol single occasion type	5,590	n/a
Conditions	27,728	n/a
Medications	11,972	n/a
Physical activity	24,052	n/a
Barriers	6,673	n/a

Number of records by level, 2018–19 NATSIHS microdata
Level	Record counts (unweighted)	Weighted counts (if applicable)
Household	6,388	352,169
Selected person	10,579	814,244
Alcohol – day consumed	8,992	n/a
Alcohol – type consumed	9,467	n/a
Binge – type consumed	7,433	n/a
Conditions	38,197	n/a
Medications	18,102	n/a
Hearing	10,579	814,244

Weight variables

For the 2018–19 NATSIHS and the 2022–23 NATSIHS there are two main weight variables on the file:

Household weight (FINHHWT) – Household level – benchmarked to produce Aboriginal and Torres Strait Islander household estimates.
Person weight (FINPERWT) – Selected person level – benchmarked to the total Aboriginal and Torres Strait Islander population.

For the 2018–19 NATSIHS, there is an additional weight variable on the file for the Hearing level:

Hearing test weight (FINHEAWT) – must be used whenever an item from the hearing level is used. The weighted counts for the 2018–19 selected person level and the hearing level are the same because a maximum of one hearing test was counted for each person.

There is no weight associated with the other levels for the NATSIHS. This is because the records are repeated for each person. If, for example, FINPERWT is merged onto the Conditions level, it will be attached to each condition record and therefore be repeated for each person where they have more than one condition. This should be considered when producing tables.
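The effect of repeating the person weight across condition records can be sketched as follows. This is a minimal illustration rather than ABS code: the records are made up, and the field names PERSON_ID and CONDITION are hypothetical placeholders (the real identifier names are in the Data Item List). Only ABSHIDD and FINPERWT are taken from the text above.

```python
# Made-up person weights keyed by (ABSHIDD, PERSON_ID).
# PERSON_ID is a hypothetical name for the person-level identifier.
person_weight = {
    ("H1", 1): 120.0,
    ("H1", 2): 80.0,
    ("H2", 1): 200.0,
}

# Made-up Conditions level: one record per reported condition.
conditions = [
    {"ABSHIDD": "H1", "PERSON_ID": 1, "CONDITION": "asthma"},
    {"ABSHIDD": "H1", "PERSON_ID": 1, "CONDITION": "diabetes"},
    {"ABSHIDD": "H1", "PERSON_ID": 2, "CONDITION": "asthma"},
    {"ABSHIDD": "H2", "PERSON_ID": 1, "CONDITION": "arthritis"},
]

# Merge FINPERWT onto each condition record. The weight of person
# ("H1", 1) now appears twice, once per condition record.
merged = [
    dict(rec, FINPERWT=person_weight[(rec["ABSHIDD"], rec["PERSON_ID"])])
    for rec in conditions
]

# Summing the merged weights counts condition records, not persons.
naive_total = sum(rec["FINPERWT"] for rec in merged)

# To estimate persons with at least one condition, count each person once.
persons_seen = {(rec["ABSHIDD"], rec["PERSON_ID"]) for rec in merged}
persons_with_condition = sum(person_weight[key] for key in persons_seen)

# To estimate persons with a specific condition, again deduplicate persons.
asthma_persons = {
    (rec["ABSHIDD"], rec["PERSON_ID"])
    for rec in merged
    if rec["CONDITION"] == "asthma"
}
asthma_estimate = sum(person_weight[key] for key in asthma_persons)
```

In this toy data the naive sum is 520, while the number of persons with at least one condition is 400, because one person's weight of 120 is repeated on two condition records.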

Using weights

The NATSIHS is a sample survey, so to produce estimates for the in-scope population you must use weight fields in your calculations. When analysing a Household level item at the household level, you will need to use the household weight. For example, use the household weight to estimate the number of households in a state, rather than the number of persons living in that state.

Caution should be used when applying the Household weight to items from other levels. For example, if the household weight is applied to a selected person level demographic item, such as ‘Sex’, your table will show the number of households with one or more selected persons of that sex. Since up to four people can be selected per household in non-remote areas, some households may be counted up to four times (twice for the selected adults and twice for the selected children) if all selected persons are the same sex.
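The difference between the two uses of the household weight can be sketched as follows. This is an illustrative sketch with made-up records. The field names STATE and SEX are hypothetical placeholders for the corresponding data items, and only ABSHIDD and FINHHWT come from the text above.

```python
from collections import defaultdict

# Made-up household level records.
households = [
    {"ABSHIDD": "H1", "STATE": "NSW", "FINHHWT": 50.0},
    {"ABSHIDD": "H2", "STATE": "NSW", "FINHHWT": 70.0},
    {"ABSHIDD": "H3", "STATE": "QLD", "FINHHWT": 60.0},
]

# Made-up selected person records (household H1 has two selected persons).
persons = [
    {"ABSHIDD": "H1", "SEX": "F"},
    {"ABSHIDD": "H1", "SEX": "F"},
    {"ABSHIDD": "H2", "SEX": "M"},
    {"ABSHIDD": "H3", "SEX": "F"},
]


def weighted_total(records, group_field, weight_field):
    """Sum the weight field within each category of the grouping field."""
    totals = defaultdict(float)
    for rec in records:
        totals[rec[group_field]] += rec[weight_field]
    return dict(totals)


# Household item at the household level: each household is counted once.
households_by_state = weighted_total(households, "STATE", "FINHHWT")

# Household weight attached to person records: H1's weight is repeated
# for each of its selected persons, so H1 contributes twice to "F".
hh_weight = {h["ABSHIDD"]: h["FINHHWT"] for h in households}
persons_with_hh_weight = [dict(p, FINHHWT=hh_weight[p["ABSHIDD"]]) for p in persons]
repeated_by_sex = weighted_total(persons_with_hh_weight, "SEX", "FINHHWT")
```

Here `households_by_state` gives 120 for NSW and 60 for QLD, whereas `repeated_by_sex["F"]` is 160 because household H1 (weight 50) is added twice.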

File content

Available data items

Data items for the 2022–23 NATSIHS include:

demographics – age, sex, language, social marital status
household details – type, size, household composition, tenure, Socio-Economic Indexes for Areas (SEIFA), geography
food security and financial stress
cultural identification
labour force status
educational attainment
personal and household income
personal use of the internet
self-assessed health status
self-reported height and weight
long-term health conditions such as arthritis, asthma, cancer, diabetes, hypertension, kidney disease, mental and behavioural conditions
health risk factors such as smoking, alcohol consumption, fruit and vegetable consumption, physical activity
social and emotional wellbeing
medications
health actions such as use of health services, barriers to accessing health services, and cultural safety when using some of these services
physical measurements – blood pressure, height, weight, and waist circumference.

The Data Item Lists section is the definitive source of available data items and categories. The detailed index within the 2022–23 NATSIHS Data Item List contains information on the comparability of data items to the 2018–19 NATSIHS. Additionally, each data item within the Data Item List has information about any changes between NATSIHS cycles.

Identifiers

Every record on each level of the file is uniquely identified. See Data Item Lists for details on which ID equates to which level.

Each household has a unique random identifier, ABSHIDD. This identifier appears on the household level and is repeated on every record, at every level, pertaining to that household. A combination of identifiers for a particular level and all levels above it in the hierarchical structure uniquely identifies a record at that level. For example, each record on the conditions level is uniquely identified by a combination of the Household, Person and Conditions level identifiers.

The Household record identifier, ABSHIDD, assists with linking people from the same household, and with linking household characteristics such as geography (located on the household level) to the Person records. When merging data with a level above, only those identifiers relevant to the level above are required.
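The merging rule described above can be sketched as follows. This is a minimal illustration on made-up records. ABSHIDD is the only identifier named in the text, so PERSON_ID, CONDITION_ID, REMOTENESS and AGE are hypothetical field names used here for illustration.

```python
# Made-up household level, keyed by ABSHIDD.
households = {
    "H1": {"REMOTENESS": "Non-remote"},
    "H2": {"REMOTENESS": "Remote"},
}

# Made-up selected person level.
persons = [
    {"ABSHIDD": "H1", "PERSON_ID": 1, "AGE": 34},
    {"ABSHIDD": "H1", "PERSON_ID": 2, "AGE": 9},
    {"ABSHIDD": "H2", "PERSON_ID": 1, "AGE": 51},
]

# Made-up conditions level.
conditions = [
    {"ABSHIDD": "H1", "PERSON_ID": 1, "CONDITION_ID": 1},
    {"ABSHIDD": "H1", "PERSON_ID": 1, "CONDITION_ID": 2},
    {"ABSHIDD": "H2", "PERSON_ID": 1, "CONDITION_ID": 1},
]

# Merging household items onto person records needs only the
# household identifier, because that is the key of the level above.
persons_with_geo = [dict(p, **households[p["ABSHIDD"]]) for p in persons]

# Merging person items onto condition records needs the household and
# person identifiers, which together identify a person record.
person_index = {(p["ABSHIDD"], p["PERSON_ID"]): p for p in persons_with_geo}
conditions_with_person = [
    dict(person_index[(c["ABSHIDD"], c["PERSON_ID"])], **c) for c in conditions
]

# A condition record is uniquely identified by all three identifiers.
condition_keys = [
    (c["ABSHIDD"], c["PERSON_ID"], c["CONDITION_ID"]) for c in conditions
]
assert len(condition_keys) == len(set(condition_keys))
```

Each merge uses only the identifiers of the level being merged from, which keeps the join one-to-many rather than many-to-many.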

Multi-response items

Several questions in the survey allowed respondents to provide one or more responses. Each response category for these multi-response data items is treated as a separate data item. In the detailed microdata, these data items share the same identifier (SAS name) prefix but are each separately suffixed with a letter – A for the first response, B for the second response, C for the third response and so on.

For example, the multi-response data item 'All types of physical activity undertaken in last week' (PATYPEW) has seven response categories. There are seven data items named PATYPEWA, PATYPEWB, PATYPEWC, ..., PATYPEWG. Each data item in the series will have either a positive response code or a null response code, with the exception of the first item in the series, PATYPEWA.

PATYPEWA has four potential response codes:

code 1 – 'Walking for exercise, recreation and sport' – positive response
code 0 – null response
code 8 – 'No physical activity in last week'
code 9 – 'Not applicable'.

The remaining items PATYPEWB, PATYPEWC, ..., PATYPEWG have just two response codes each. The Data Item List identifies all multi-response items and lists the codes with their corresponding response categories.

Note that the sum of individual multi-response categories will be greater than the population applicable to a particular data item as respondents can select more than one response.
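The following sketch illustrates why category counts for a multi-response item add to more than the applicable population. The records are made up and the counts are unweighted for simplicity. The codes for PATYPEWA follow the list above. For PATYPEWB to PATYPEWG the sketch assumes a null response is coded 0 and uses 1 as a stand-in positive code, so the actual codes should be taken from the Data Item List.

```python
# The seven data items PATYPEWA ... PATYPEWG.
ITEMS = ["PATYPEW" + letter for letter in "ABCDEFG"]


def make_record(first_code, positive_letters=""):
    """Build a made-up respondent record.

    first_code is the PATYPEWA code (1, 0, 8 or 9). positive_letters lists
    the other items answered positively, coded 1 here as an assumption.
    """
    rec = {item: 0 for item in ITEMS}
    rec["PATYPEWA"] = first_code
    for letter in positive_letters:
        rec["PATYPEW" + letter] = 1
    return rec


respondents = [
    make_record(1, "BC"),  # walking plus two other activity types
    make_record(0, "D"),   # one activity type, not walking
    make_record(8),        # no physical activity in last week
    make_record(9),        # not applicable
    make_record(1),        # walking only
]


def is_positive(item, code):
    """Return True if the code is a positive response for the item."""
    if item == "PATYPEWA":
        return code == 1  # 0 is null, 8 and 9 are special codes
    return code != 0      # assumption: 0 is the null code for B to G


# Population applicable to the item: everyone except 'Not applicable'.
applicable = [r for r in respondents if r["PATYPEWA"] != 9]

category_counts = {
    item: sum(is_positive(item, r[item]) for r in applicable) for item in ITEMS
}
total_responses = sum(category_counts.values())
```

In this toy data there are 4 applicable respondents but 5 positive responses, because one respondent reported three activity types.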

Continuous items

Some continuous data items are allocated special codes for certain responses (e.g. 9999 = 'Not applicable'). When ranges are created for such continuous items in TableBuilder, these special codes are NOT included in the ranges. The total shown therefore represents only 'valid responses' for that continuous data item rather than all responses (including special codes). Any special codes for continuous (summation) data items are listed in the Data Item List and can be found in the categorical version of the continuous item. Note, however, that a label on a value of 0 in the Data Item List (for example, 0 identified as 'Did not visit' or 'Did not do') does not necessarily mean that value is excluded from the ranges, as it may still be important in some calculations. Refer to the categorical version of the item to identify which codes are specifically excluded.
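When working with unit records outside TableBuilder, the same treatment of special codes needs to be applied by hand. The sketch below is illustrative only: the values and range boundaries are made up, and 9999 is used as the special code because it is the example given above. The actual special codes for each item are listed in the Data Item List.

```python
from collections import Counter

# Special codes to exclude (9999 = 'Not applicable' in the example above).
SPECIAL_CODES = {9999}

# Made-up values of a continuous item, e.g. a number of visits.
values = [0, 2, 5, 9999, 1, 9999]

# Keep valid responses only. A value of 0 is a valid response here.
valid = [v for v in values if v not in SPECIAL_CODES]

# The mean over valid responses, compared with the distorted mean that
# results from treating the special code as a real value.
mean_valid = sum(valid) / len(valid)
mean_naive = sum(values) / len(values)


def to_range(value):
    """Assign a valid value to a made-up range category."""
    if value == 0:
        return "0"
    if value <= 2:
        return "1-2"
    return "3 or more"


# Range totals cover valid responses only, not all records.
range_counts = Counter(to_range(v) for v in valid)
```

The range totals add to 4 valid responses rather than 6 records, and including the special code would inflate the mean from 2.0 to over 3,000.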

Reliability of estimates

As the survey was conducted on a sample of private households in Australia, it is important to take account of the method of sample selection when deriving estimates from the detailed microdata. A screening process is used in non-community areas to identify in-scope households. For both the community and non-community sample, a person’s chance of selection varied depending on the state or territory in which the person lived. For more information on sampling see the Methodology section. If the chances of selection are not accounted for by use of appropriate weights, the results could be biased.

Each person or household record has a main weight (FINHHWT or FINPERWT). This weight indicates how many population units are represented by the sample unit. When producing estimates of sub-populations from the detailed microdata, it is essential that they are calculated by adding the weights of persons or households in each category and not just by counting the sample number in each category. If each person or household’s weight were to be ignored when analysing the data to draw inferences about the population, then no account would be taken of a person or household's chance of selection or of different response rates across population groups, with the result that the estimates produced could be biased. The application of weights ensures that estimates will conform to an independently estimated distribution of the population by age, by sex, etc. rather than to the distributions within the sample itself.
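The difference between counting sample records and adding weights can be sketched as follows. The records are made up, and SMOKER is a hypothetical flag used only for illustration. FINPERWT is the person weight described above.

```python
# Made-up selected person records.
records = [
    {"SMOKER": True, "FINPERWT": 50.0},
    {"SMOKER": True, "FINPERWT": 50.0},
    {"SMOKER": False, "FINPERWT": 300.0},
    {"SMOKER": False, "FINPERWT": 100.0},
]

# Unweighted: the proportion of sample records in the category.
sample_count = sum(1 for r in records if r["SMOKER"])
sample_proportion = sample_count / len(records)

# Weighted: add the weights of the persons in the category, then divide
# by the sum of the weights of all persons in the population of interest.
weighted_count = sum(r["FINPERWT"] for r in records if r["SMOKER"])
weighted_population = sum(r["FINPERWT"] for r in records)
weighted_proportion = weighted_count / weighted_population
```

In this toy data half of the sample records are in the category, but they represent only 100 of 500 people (20%), which shows how ignoring weights can bias an estimate.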

It is also important to calculate a measure of sampling error for each estimate. Sampling error occurs because only part of the population is surveyed to represent the whole population. Sampling error should be considered when interpreting estimates as this gives an indication of accuracy and reflects the importance that can be placed on interpretations using the estimate. Measures of sampling error include standard error (SE), relative standard error (RSE) and margin of error (MoE). These measures of sampling error can be estimated using the replicate weights. The replicate weight variables provided on the microdata are labelled WPM1XXX (person) and WHM1XXX (household), where XXX represents the number of the given replicate group. The exact number of replicates will vary depending on the survey. The NATSIHS generally uses 250 replicate groups for both household and person weights, so you will find, for example, 250 person replicate weight variables labelled WPM1001 to WPM1250.

Using replicate weights for estimating sampling error

ABS household surveys employ complex sample designs and weighting which require special methods for estimating the variance of survey statistics. Variance estimators for a simple random sample are not appropriate for this survey microdata.

A class of techniques called 'replication methods' provide a general process for estimating variance for the types of complex sample designs and weighting procedures employed in ABS household surveys. The ABS uses a method called the Group Jackknife Replication Method.

The basic idea behind the replication approach is to split the sample into G replicate groups. One replicate group is then dropped from the file and a new set of weights is produced for the remaining sample. This is repeated for all G replicate groups to provide G sets of replicate weights. For each set of replicate weights, the statistic of interest is recalculated and the variance of the full sample statistic is estimated using the variability among the replicate statistics.

The statistics calculated from these replicates are called replicate estimates. Replicate weights provided on the microdata file enable the variance of survey statistics, such as means and medians, to be calculated relatively simply. Further technical explanation can be found in Section 4 of Research Paper: Weighting and Standard Error Estimation for ABS Household Surveys (Methodology Advisory Committee).

How to use replicate weights

To calculate the standard error of any statistic derived from the survey data, the method is as follows:

1. Calculate the estimate of the statistic of interest using the main weight.
2. Repeat the calculation above for each replicate weight, substituting the replicate weight for the main weight and creating G replicate estimates. For example, where there are 250 replicate weights (as is generally the case for the NATSIHS), you will have 250 replicate estimates.
3. Use the outputs from steps 1 and 2 as inputs to the formula below to calculate the estimate of the Standard Error (SE) for the statistic of interest.

$\normalsize SE (y)=\sqrt{\frac{G-1}{G} \sum_{g=1}^{G}(y_{(g)}-y)^{2}}$

[Equation 1]

$G$ = Number of replicate groups
$g$ = the replicate group number
$y_{(g)}$ = Replicate estimate for group g, i.e. the estimate of y calculated using the replicate weight for g
$y$ = the weighted estimate of y from the sample

From the standard error you can then derive the following measures of sampling error: the relative standard error (RSE), expressed as a percentage of the estimate, and the margin of error (MoE) of the estimate.

$\text{Relative Standard Error (RSE)} = \frac{\text{SE}}{\text{Estimate}} \times 100\%$

[Equation 2]

$\text{Margin of Error (MoE)} = 1.96 \times \text{SE}$

[Equation 3]
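The three steps and Equations 1 to 3 can be sketched in code as follows. This is an illustrative sketch on made-up data, not an ABS implementation. It uses only 3 replicate weights to keep the example short (the NATSIHS files generally carry 250), and VALUE is a hypothetical data item. The replicate weight names follow the WPM1XXX pattern described above.

```python
import math


def weighted_mean(records, value_field, weight_field):
    """Weighted mean of value_field using the given weight field."""
    total_weight = sum(r[weight_field] for r in records)
    weighted_sum = sum(r[value_field] * r[weight_field] for r in records)
    return weighted_sum / total_weight


def jackknife_se(records, value_field, main_weight, replicate_weights):
    """Return (estimate, SE) using the formula in Equation 1."""
    # Step 1: the estimate using the main weight.
    estimate = weighted_mean(records, value_field, main_weight)
    # Step 2: one replicate estimate per replicate weight.
    replicate_estimates = [
        weighted_mean(records, value_field, w) for w in replicate_weights
    ]
    # Step 3: combine into the standard error.
    G = len(replicate_weights)
    variance = (G - 1) / G * sum((y_g - estimate) ** 2 for y_g in replicate_estimates)
    return estimate, math.sqrt(variance)


# Made-up records: each replicate weight drops one record (weight 0)
# and reweights the remaining records.
records = [
    {"VALUE": 10.0, "FINPERWT": 100.0, "WPM1001": 0.0, "WPM1002": 150.0, "WPM1003": 150.0},
    {"VALUE": 20.0, "FINPERWT": 100.0, "WPM1001": 150.0, "WPM1002": 0.0, "WPM1003": 150.0},
    {"VALUE": 30.0, "FINPERWT": 100.0, "WPM1001": 150.0, "WPM1002": 150.0, "WPM1003": 0.0},
]

# Replicate weight names WPM1001, WPM1002, WPM1003.
replicate_names = ["WPM1%03d" % g for g in range(1, 4)]

estimate, se = jackknife_se(records, "VALUE", "FINPERWT", replicate_names)
rse_percent = se / estimate * 100  # Equation 2, as a percentage
moe = 1.96 * se                    # Equation 3
```

The same pattern applies to other statistics, such as totals or proportions, by replacing `weighted_mean` with a function that computes the statistic of interest from a given weight.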

An example in calculating the SE for an estimate of the mean

Suppose you are calculating the mean value of earnings, y, in a sample. Using the main weight produces an estimate of $500.

You have 5 sets of Group Jackknife replicate weights and using these weights (instead of the main weight) you calculate 5 replicate estimates of $510, $490, $505, $503, $498 respectively.

To calculate the standard error of the estimate, substitute the following inputs into Equation 1:

$G$ = 5
$y$ = 500
$g$ = 1, $y_{(g)}$ = 510
$g$ = 2, $y_{(g)}$ = 490
…

$\normalsize SE (y)=\sqrt{\frac{5-1}{5} \sum_{g=1}^{5}(y_{(g)}-500)^{2}}$

$\normalsize SE (y)=\sqrt{\frac{4}{5} \left((510-500)^{2} + (490-500)^{2} + (505-500)^{2} + (503-500)^{2} + (498-500)^{2}\right)}$

$\normalsize SE (y)=\sqrt{\frac{4}{5} \times 238}$

$\normalsize SE (y)=13.8$

To calculate the RSE, divide the SE by the estimate of y ($500) and multiply by 100 to express it as a percentage.

$\normalsize RSE (y)=\frac{13.8}{500} \times 100$

$\normalsize RSE (y)=2.8\%~$

To calculate the margin of error you multiply the SE by 1.96

$\text {Margin of Error} (y)=13.8 \times 1.96$

$\text {Margin of Error} (y)=27.05$
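The arithmetic in this worked example can be checked with a few lines of code. This is a minimal sketch that uses only the figures given above, and the variable names are illustrative.

```python
import math

# Estimate using the main weight, and the 5 replicate estimates.
main_estimate = 500.0
replicate_estimates = [510.0, 490.0, 505.0, 503.0, 498.0]

G = len(replicate_estimates)

# Sum of squared differences between replicate estimates and the estimate.
sum_sq = sum((y_g - main_estimate) ** 2 for y_g in replicate_estimates)

# Equation 1: standard error.
se = math.sqrt((G - 1) / G * sum_sq)

# Equation 2: relative standard error as a percentage.
rse_percent = se / main_estimate * 100

# Equation 3: margin of error.
moe = 1.96 * se

print(round(se, 1), round(rse_percent, 1), round(moe, 2))  # prints 13.8 2.8 27.05
```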

Confidentiality

A confidentiality process called perturbation is applied to the data in TableBuilder to avoid releasing information that may lead to the identification of individuals, families, households, dwellings or businesses. See Confidentiality in the TableBuilder user guide.

Data Item Lists

2022–23 NATSIHS Detailed Microdata Data Item List (xlsx, 490.12 KB)
2018–19 NATSIHS Detailed Microdata Data Item List (xlsx, 811.29 KB)
2018–19 NATSIHS TableBuilder Data Item List (xlsx, 794.47 KB)

Previous releases

Details of microdata products available for previous NATSIHS releases
	TableBuilder data series	Microdata download	DataLab
National Aboriginal and Torres Strait Islander Health Survey, 2018–19	TableBuilder		Detailed microdata
National Aboriginal and Torres Strait Islander Health Survey, Core Content – Risk Factors and Selected Health Conditions, 2012–13	TableBuilder		Detailed microdata
National Aboriginal and Torres Strait Islander Health Survey, Detailed Conditions and Other Health Data, 2012–13	TableBuilder		Detailed microdata
National Aboriginal and Torres Strait Islander Health Survey, Nutrition and Physical Activity, 2012–13	TableBuilder	Basic microdata	Detailed microdata
National Aboriginal and Torres Strait Islander Health Survey, 2004–05			Detailed microdata
National Health Indigenous, 2001			Detailed microdata

Further information

See National Aboriginal and Torres Strait Islander Health Survey methodology, 2022–23 for further information about the 2022–23 NATSIHS.

See National Aboriginal and Torres Strait Islander Health Survey methodology, 2018–19 for further information about the 2018–19 NATSIHS.

See Australian Aboriginal and Torres Strait Islander Health Survey: Users' Guide, 2012–13 for further information about the 2012–13 NATSIHS.

2018–19 NATSIHS hierarchical file structure and the relationship between each level
Level 1	Level 2	Level 3	Level 4	Relationship type
Household				One record per in scope household
	Selected persons			Up to two selected person records per household for remote areas (1 adult and 1 child). Up to four selected person records per household for non-remote areas (2 adults and 2 children)
		Conditions		One Conditions record for each reported condition for each selected person record
		Medications		One Medications record for each reported medication/supplement for each selected person record
		Binge – type consumed		Up to 13 Binge – type records for each selected person aged 15 years and older
		Alcohol – day consumed		Up to three Alcohol – day consumed records per selected person aged 15 years and older
			Alcohol – type consumed	Up to 13 Alcohol – type records per Alcohol – day consumed record