Microdata and TableBuilder: Education and Work, Australia

Enables analysis of participation in current or recent study, educational attainment, and employment.

Accessing the data

The Survey of Education and Work (SEW) provides annual statistics about the educational activity and qualifications of the population. It is conducted in May each year throughout Australia as a supplement to the monthly Labour Force Survey (LFS). See Education and Work, Australia for summary results, methodology and other information.

The data can be accessed using the following microdata products: 

  • TableBuilder - produce your own tables and graphs. TableBuilder is available for all survey years from 2011 to 2024.
  • DataLab - detailed microdata is available in DataLab for the following survey years: 2016, 2017, 2022, 2023 and 2024.
  • Basic microdata - confidentialised unit record files provide basic microdata for the following survey years: 2001, 2003, 2005, 2007, 2009 and 2011.

Compare data services to see which is right for you, or see the Frequently asked questions.

Data and file structure

Data items include:  

  • Demographics, such as age, sex and country of birth
  • Geography
  • Labour force characteristics
  • Highest educational attainment
  • Education in current year
  • Education in previous year
  • Time left study
  • Apprenticeships/traineeships in current year
  • Unmet demand for apprenticeships/traineeships
  • Population data items

Refer to the data item lists for each microdata product for detailed information on the items available.

The SEW is structured as a single-level person file.

Using TableBuilder

Please refer to the relevant sections of the TableBuilder main page for information about how to create basic tables, custom groups, graphs and large tables.

Weights

When tabulating data in TableBuilder, person weights are automatically applied to the underlying sample counts. Weighting is the process of adjusting results from a sample survey to infer results for the total population. To do this, a 'weight' is allocated to each sample unit. The weight is the value that indicates how many population units are represented by the sample unit.

Not applicable categories

Most data items in the TableBuilder file include a 'Not applicable' category. The classification values of these 'Not applicable' categories, where relevant, are shown in the TableBuilder data item list. The 'Not applicable' category generally represents the number of people who were not asked a particular question, or the number of people excluded from the population for a data item when that data was derived (e.g. Skill level of current job is not applicable for people who are not employed).

Table populations

The population relevant to each data item is identified in the data item list and should be kept in mind when extracting and analysing data. The actual population estimate for each data item is equal to the total cumulative frequency minus the 'Not applicable' category.

In addition, the SEW TableBuilder includes 'Population data items' that can be used in a table to 'filter' for a specific population of interest. For example, the population data item 'P04. Persons with a non-school qualification' can be used to filter for this population.

The purpose of the population data item 'P16. Non-Indigenous flag' is to assist users in producing non-Indigenous data only. It should not be used to estimate the Indigenous population through differencing, as the scope of the SEW excludes persons living in Indigenous communities.

Continuous data items

The SEW TableBuilder includes several continuous variables: 

  • They can have a response value at any point along a continuum.
  • Some continuous data items are allocated special codes for certain responses (e.g. 000 = 'Not applicable').
  • When creating ranges in TableBuilder for such continuous items, special codes will automatically be excluded. Therefore the total will show only 'valid responses' rather than all responses (including special codes). These codes are shown in the data item list.
  • Continuous items with special codes have a corresponding categorical item on the Person level that provides the ability to display data for the special code. Refer to the data item list.

Confidentiality

A confidentiality process called perturbation is applied to the data in TableBuilder to avoid releasing information that may lead to the identification of individuals, families, households, dwellings or businesses. See Confidentiality and relative standard error.

Using DataLab

The DataLab environment allows real-time access to detailed microdata from the Survey of Education and Work.
The DataLab is an interactive data analysis solution that allows users to run advanced statistical analyses, for example, multiple regressions and structural equation modelling. The DataLab environment contains recent versions of analytical software, including R, SAS, Stata and Python. Controls have been put in place in the DataLab to protect against the identification of individuals and organisations. All output from DataLab sessions is cleared by an ABS officer before it is released.

For information about all of the data items available in the DataLab please see the DataLab microdata data item lists.

For more information, including prerequisites for DataLab access, please see the DataLab page.

Reliability of estimates

As the survey was conducted on a sample of households in Australia, it is important to take account of the method of sample selection when deriving estimates from the detailed microdata. This is important as a person's chance of selection in the survey varied depending on the state or territory in which the person lived. If these chances of selection are not accounted for by use of appropriate weights, the results could be biased. 
Each person record has a main weight (FINPRSWT). This weight indicates how many population units are represented by the sample unit. When producing estimates of sub-populations from the detailed microdata, it is essential that they are calculated by adding the weights of persons in each category and not just by counting the sample number in each category. If each person’s weight were to be ignored when analysing the data to draw inferences about the population, then no account would be taken of a person's chance of selection or of different response rates across population groups, with the result that the estimates produced could be biased. The application of weights ensures that estimates will conform to an independently estimated distribution of the population by age, by sex, etc. rather than to the distributions within the sample itself.
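
As an illustration only, the following minimal Python (pandas) sketch shows how weighted estimates of this kind might be produced from the detailed microdata. It assumes the person file has been loaded into a DataFrame; the file name and the data item column STATE are hypothetical, while FINPRSWT is the main weight described above.

  import pandas as pd

  # Hypothetical file name: load the SEW person-level microdata
  sew = pd.read_csv("sew_person_file.csv")

  # Estimate the number of persons in each category of a data item by
  # summing the main weights, not by counting sample records
  person_estimates = sew.groupby("STATE")["FINPRSWT"].sum()

  # Counting records instead would ignore each person's chance of selection
  # and differing response rates, and could give biased results
  sample_counts = sew.groupby("STATE").size()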

It is also important to calculate a measure of sampling error for each estimate. Sampling error occurs because only part of the population is surveyed to represent the whole population. Sampling error should be considered when interpreting estimates as it gives an indication of accuracy and reflects the importance that can be placed on interpretations using the estimate. Measures of sampling error include the standard error (SE), relative standard error (RSE) and margin of error (MoE). These measures of sampling error can be estimated using the replicate weights. The replicate weight variables provided on the microdata are labelled FSRWTXX, where XX represents the number of the given replicate group. The exact number of replicates will vary depending on the survey but will generally be 30, 60 or 200 replicate groups. As an example, for survey microdata with 30 replicate groups, you will find 30 person replicate weight variables labelled FSRWT01 to FSRWT30.
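
Continuing the hypothetical sketch above (and assuming 30 replicate groups, as in the example), replicate estimates are produced by repeating the same calculation with each replicate weight substituted for the main weight:

  # Names of the 30 replicate weight variables, FSRWT01 to FSRWT30
  replicate_cols = [f"FSRWT{g:02d}" for g in range(1, 31)]

  # Full-sample estimate of a total using the main weight
  main_estimate = sew["FINPRSWT"].sum()

  # One replicate estimate per replicate group, calculated in the same way
  # but with the replicate weight in place of the main weight
  replicate_estimates = [sew[col].sum() for col in replicate_cols]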

Using replicate weights for estimating sampling error

Overview of replication methods

ABS household surveys employ complex sample designs and weighting which require special methods for estimating the variance of survey statistics.  Variance estimators for a simple random sample are not appropriate for this survey microdata. 

A class of techniques called 'replication methods' provides a general process for estimating variance for the types of complex sample designs and weighting procedures employed in ABS household surveys. The ABS uses a method called the Group Jackknife Replication Method.

The basic idea behind the replication approach is to split the sample into G replicate groups. One replicate group is then dropped from the file and a new set of weights is produced for the remaining sample. This is repeated for all G replicate groups to provide G sets of replicate weights. For each set of replicate weights, the statistic of interest is recalculated, and the variance of the full-sample statistic is estimated using the variability among the replicate statistics. The statistics calculated from these replicates are called replicate estimates. Replicate weights provided on the microdata file enable the variance of survey statistics, such as means and medians, to be calculated relatively simply (further technical explanation can be found in Section 4 of the research paper Weighting and Standard Error Estimation for ABS Household Surveys (Methodology Advisory Committee)).

How to use replicate weights

To calculate the standard error of any statistic derived from the survey data, the method is as follows:
1. Calculate the estimate of the statistic of interest using the main weight.
2. Repeat the calculation above for each replicate weight, substituting the replicate weight for the main weight to create G replicate estimates. In the example where there are 30 replicate weights, you will have 30 replicate estimates.
3. Use the outputs from steps 1 and 2 as inputs to the formula below to calculate the estimate of the standard error (SE) for the statistic of interest.

\(\normalsize SE (y)=\sqrt{\frac{G-1}{G} \sum_{g=1}^{G}(y_{(g)}-y)^{2}}\)

[Equation 1]

  • G = Number of replicate groups
  • g = the replicate group number
  • \(y_{(g)}\) = Replicate estimate for group g, i.e. the estimate of y calculated using the replicate weight for g
  • \(y\) = the estimate of y calculated from the full sample using the main weight

From this estimate of the standard error you can then derive the following measures of sampling error: the relative standard error (RSE) and the margin of error (MoE) of the estimate.

\(Relative\;Standard\;Error\;\normalsize (RSE) = \frac{SE}{Estimate} \times 100\%\)

[Equation 2]

\(Margin\;of\;Error\; (MoE) = 1.96 \;\times\; SE\)

[Equation 3]
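
The calculations in equations [1] to [3] translate directly into code. The Python sketch below is a plain reading of those formulas; it assumes only a main-weight estimate and the list of replicate estimates produced as described above, and expresses the RSE as a percentage.

  import math

  def group_jackknife_se(estimate, replicate_estimates):
      # Equation [1]: standard error from G replicate estimates
      G = len(replicate_estimates)
      sum_sq = sum((y_g - estimate) ** 2 for y_g in replicate_estimates)
      return math.sqrt((G - 1) / G * sum_sq)

  def relative_standard_error(estimate, se):
      # Equation [2]: RSE expressed as a percentage of the estimate
      return se / estimate * 100

  def margin_of_error(se):
      # Equation [3]: MoE at the 95% level
      return 1.96 * se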

An example of calculating the SE for an estimate of the mean

Suppose you are calculating the mean value of earnings, y, in a sample.  Using the main weight produces an estimate of $500. 
You have 5 sets of Group Jackknife replicate weights and, using these weights (instead of the main weight), you calculate 5 replicate estimates of $510, $490, $505, $503 and $498 respectively.

To calculate the standard error of the estimate, substitute the following inputs into equation [1]:

G = 5

\(y\)= 500

g=1, \(y_{(g)}\) = 510

g=2, \(y_{(g)}\) = 490

g=3, \(y_{(g)}\) = 505

g=4, \(y_{(g)}\) = 503

g=5, \(y_{(g)}\) = 498

\(\normalsize SE (y)=\sqrt{\frac{5-1}{5} \sum_{g=1}^{5}(y_{(g)}-500)^{2}}\)

\(\normalsize SE (y)=\sqrt{\frac{4}{5} ((510-500)^{2} + (490-500)^{2} + (505-500)^{2} + (503-500)^{2} + (498-500)^{2})}\)

\(\normalsize SE (y)=\sqrt{\frac{4}{5} \times 238}\)

\(\normalsize SE (y)=13.8 \)

To calculate the RSE, divide the SE by the estimate of y ($500) and multiply by 100 to express it as a percentage:

  • \(\normalsize RSE(y) = {\frac{13.8}{500}\times100}\)
  • \(\normalsize RSE(y) = 2.8\%\)


To calculate the margin of error, multiply the SE by 1.96:

  • \(\normalsize Margin\;of \;Error (y)=13.8 \times 1.96\)
  • \(\normalsize Margin\;of\;Error (y)=27.05\)
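
Applying the sketch functions from the previous section to the figures in this example reproduces the same results (allowing for rounding):

  se = group_jackknife_se(500, [510, 490, 505, 503, 498])  # approx. 13.8
  rse = relative_standard_error(500, se)                   # approx. 2.8 (%)
  moe = margin_of_error(se)                                # approx. 27.05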

Data downloads

Data files

Previous releases

  • Education and Work, biennially 2001-2011 - Basic microdata

Methodology


Post release changes


Previous catalogue number

This release previously used catalogue number 6227.0.30.001.
