Personal Income in Australia methodology

Latest release
Reference period
2020-21 financial year

Personal Income in Australia provides regional data on the number of income earners, amounts received, and the distribution of income. 

The current release includes data for the five financial years between 2016-17 and 2020-21. It covers all people who interacted with the Australian taxation system during this period and for whom personal income is identified.  

How data are collected

This release is compiled from the Linked Employer-Employee Dataset (LEED), which is built using Australian Taxation Office (ATO) administrative data linked to ABS Business Longitudinal Analytical Data Environment (BLADE) data.

Scope and coverage of LEED

The LEED is a rich dataset that includes approximately 18 to 21 million job records each financial year since 2011-12 and contains over 190 million individual records over the period 2011-12 to 2020-21. 

The LEED covers all persons who either:

  • submitted an individual tax return (ITR); or
  • had an Income Statement (previously Pay As You Go (PAYG) payment summary) issued by an employer and remitted to the ATO.

Employees who did not submit a tax return and did not provide their Tax File Number to their employer will not appear in the LEED. Owner managers of unincorporated enterprises (OMUEs) who did not submit an ITR are also excluded.

Data sources

The LEED incorporates:

  • person level ITR data, job level Income Statement data and Client Register (CR) data supplied by the ATO to the ABS under the Taxation Administration Act 1953 - which requires that such data is only used for the purpose of administering the Census and Statistics Act 1905; and
  • employer level data that include the ABS’s BLADE data and the ABS Business Register data supplied by the Registrar of the Australian Business Register (ABR) to the ABS under A New Tax System (Australian Business Number) Act 1999 - which requires that such data is only used for the purpose of carrying out functions of the ABS.

The data limitations or weaknesses outlined here are in the context of using the data for statistical purposes, and not related to the ability of the data to support the ATO's core operational requirements.

The ABS acknowledges the continuing support of the ATO in compiling these statistics. 

Data on Migrants

The migrant data used in LEED are sourced from the Person Level Integrated Dataset (PLIDA), formerly known as the Multi-Agency Data Integration Project (MADIP).

The migrant data in PLIDA are a suite of administrative datasets from the Department of Home Affairs. They include the visa grants file, the permanent settlements database and temporary visa holders file. 

The scope of the migrant data in this release includes:

  • Permanent migrants with an arrival date between 1 January 2000 and 30 June 2021;
  • Permanent migrants with an unknown arrival date and a visa granted between 1 January 2000 and 30 June 2021; and 
  • Temporary visa holders with a visa granted between 1 January 2000 and 30 June 2021.

This includes permanent migrants who have become Australian citizens during this period.

Migrant data presented in this release are not comparable with those published in the previous release of Personal Income in Australia due to improved scoping and visa selection methods. A new visa selection method has been applied in this release, which selects the substantive permanent visa held prior to a resident return visa (subclasses 111, 151, 154-159, 834, R and K38). Where no previous substantive permanent visa (skilled, family or humanitarian) can be found, resident return visa holders are grouped into the other permanent migrants category.

How data are processed

Integration method

LEED links jobs to employers, so that employed persons are linked to employers via the jobs they hold.

Before the linkage takes place, an input job level file is created largely based on the income statement file. This file is also enhanced with job records derived using ITR information, to cover jobs without Income Statement information, such as OMUE jobs. Data quality is enhanced by using occupation information from ITR, and the best available age, sex, and geographic information between the Income Statement, ITR and Client Register (CR) data.

Jobs are integrated with the employer by one of two methods. The method is dependent on which part of the business population on the ABS Business Register the employer is grouped into.

  • Non-profiled population (businesses with a simple structure): a deterministic approach using the Australian Business Number (ABN).
  • Profiled population (businesses with a complex structure): a more detailed approach to linking is used, detailed below.

Where an employer is part of the profiled population, the relevant jobs are assigned to type of activity units (TAUs) based on a logistic regression model developed using Census data. The model references independent variables common to both Census and personal income tax data, including sex, age, occupation, and region of usual residence. These are used to predict the industry of employment, which conceptually aligns to a type of activity unit. 
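The source describes the model only as a logistic regression on variables common to Census and tax data. The following is a minimal sketch, assuming a multinomial (softmax) logistic form with one linear score per type of activity unit, and restricting the probabilities to the TAUs present in the employing enterprise group. The coefficients, feature names and TAU labels are hypothetical, not ABS values.

```python
import math

# Hypothetical coefficients: (intercept, weights) for each type of activity unit (TAU).
# Features are illustrative encodings of sex, age, occupation and region; not ABS values.
COEFFICIENTS = {
    "TAU_retail": (0.5, {"female": 0.4, "age": -0.01, "occ_sales": 1.2, "region_metro": 0.3}),
    "TAU_logistics": (0.2, {"female": -0.6, "age": 0.02, "occ_sales": -0.8, "region_metro": -0.1}),
    "TAU_head_office": (-0.3, {"female": 0.1, "age": 0.03, "occ_sales": -0.5, "region_metro": 0.9}),
}


def tau_probabilities(features, taus_in_group):
    """Softmax over the TAUs present in the employing enterprise group only."""
    scores = {}
    for tau in taus_in_group:
        intercept, weights = COEFFICIENTS[tau]
        scores[tau] = intercept + sum(w * features.get(name, 0.0) for name, w in weights.items())
    # Subtract the largest score before exponentiating, for numerical stability.
    top = max(scores.values())
    exps = {tau: math.exp(s - top) for tau, s in scores.items()}
    total = sum(exps.values())
    return {tau: e / total for tau, e in exps.items()}


job = {"female": 1, "age": 34, "occ_sales": 1, "region_metro": 1}
print(tau_probabilities(job, ["TAU_retail", "TAU_logistics", "TAU_head_office"]))
```

Because the probabilities are renormalised over the TAUs in the group, a job in an enterprise group with only two TAUs gets a two-way split that still sums to 1.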

Where an employee has multiple job relationships with the same reporting ABN in an enterprise group, each job relationship is assigned to the same type of activity unit.

Based on the model, each job record is assigned a probability of being in each of the type of activity units present in the employing enterprise group. Iterative random assignment is then undertaken using these probabilities until employment benchmarks are met. Benchmarks are based on Quarterly Business Indicators Survey (QBIS) data where a unit is in scope. Where QBIS data are not available, BLADE employment levels are substituted; where neither is available, no benchmarking is done.
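One simple way to realise benchmark-constrained random assignment is sketched below. It assumes each job already carries model probabilities over the TAUs in its enterprise group, and that the chosen benchmark table (QBIS, else BLADE) has a count for every one of those TAUs. Jobs are drawn one at a time in random order, and a TAU is removed from the draw once it reaches its benchmark, so the benchmarks are met exactly when the number of jobs equals their total. The function names are hypothetical, and the ABS procedure (including the rule that jobs sharing an employee and reporting ABN go to the same TAU) may differ.

```python
import random
from collections import Counter


def choose_benchmarks(qbis, blade):
    """Employment benchmarks per TAU: QBIS if available, else BLADE, else None."""
    if qbis is not None:
        return qbis
    if blade is not None:
        return blade
    return None  # no benchmarking


def assign_jobs(job_probs, benchmarks, seed=0):
    """Randomly assign jobs to TAUs using model probabilities.

    job_probs: {job_id: {tau: probability}}
    benchmarks: {tau: employment count} covering every TAU, or None.
    """
    rng = random.Random(seed)
    filled = {tau: 0 for tau in benchmarks} if benchmarks else {}
    assignment = {}
    job_ids = list(job_probs)
    rng.shuffle(job_ids)  # random order so no job is systematically favoured
    for job_id in job_ids:
        probs = job_probs[job_id]
        # Only TAUs that have not yet reached their benchmark stay in the draw.
        candidates = [t for t in probs if not benchmarks or filled[t] < benchmarks[t]]
        if not candidates:
            candidates = list(probs)  # benchmarks exhausted: fall back to all TAUs
        tau = rng.choices(candidates, weights=[probs[t] for t in candidates], k=1)[0]
        assignment[job_id] = tau
        if benchmarks:
            filled[tau] += 1
    return assignment


probs = {f"job{i}": {"A": 0.7, "B": 0.3} for i in range(10)}
result = assign_jobs(probs, choose_benchmarks({"A": 4, "B": 6}, None))
print(Counter(result.values()))  # Counter({'B': 6, 'A': 4})
```

Without benchmarks (`choose_benchmarks(None, None)` returns `None`), each job is simply drawn from its own probabilities.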

The above process is applied to link the different input datasets for each financial year. Records have not been integrated across years and therefore, the LEED is a cross-sectional database and is not longitudinal.

ABS data integration practices comply with the High-Level Principles for Data Integration Involving Commonwealth Data for Statistical and Research Purposes. For further information, see Keeping integrated data safe.

Integrating data for migrants

From 2022, migrant data were added to the LEED. Personal identifiers were used to first integrate the migrant data with the ATO's Client Register, and the linked data were then integrated into the LEED. This enables more detailed analysis of the labour market and fiscal contributions of migrants, allowing policy makers and researchers to better understand the migrant experience and migrants' economic contribution to Australia.

Component datasets

The LEED consists of three cross-sectional files: person, job and employer. The LEED is not longitudinal and each file is for a single financial year.

Person file

Jobs file

Employer file

Privacy and confidentiality

Legislative requirements ensuring the privacy and secrecy of these data have been adhered to. In accordance with the Census and Statistics Act 1905, results have been confidentialised to ensure they are not likely to enable identification of a particular person or organisation. All personal information is handled in accordance with the Australian Privacy Principles contained in the Privacy Act 1988.

All personal income tax statistics were analysed in de-identified form with no home address or date of birth included in LEED input files. Addresses were coded to the Australian Statistical Geography Standard and date of birth was converted to an age at 30 June of the reference year prior to data provision.
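As an illustration of the date of birth conversion, the sketch below computes age in completed years at 30 June. It assumes the reference date is 30 June of the year in which the financial year ends (for example, 30 June 2021 for 2020-21); the function name and the "2020-21" string format are hypothetical.

```python
from datetime import date


def age_at_30_june(date_of_birth: date, financial_year: str) -> int:
    """Age in completed years on 30 June at the end of a financial year such as '2020-21'."""
    start_year = int(financial_year.split("-")[0])
    ref = date(start_year + 1, 6, 30)  # assumed reference date
    age = ref.year - date_of_birth.year
    # Subtract one if the birthday falls after 30 June in the reference year.
    if (date_of_birth.month, date_of_birth.day) > (ref.month, ref.day):
        age -= 1
    return age


print(age_at_30_june(date(1990, 7, 1), "2020-21"))  # 30
```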

To minimise the risk of identifying individuals in aggregate statistics, perturbation has been applied. Perturbation involves small random adjustment of the statistics and is considered the most satisfactory technique for avoiding the release of identifiable statistics, while maximising the range of information that can be released. These adjustments have a negligible impact on the underlying pattern of the statistics. Some cells have also been suppressed due to low counts.
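The release does not describe its perturbation algorithm in detail. The sketch below is a generic illustration only: the suppression threshold and the adjustment range are hypothetical. Seeding each adjustment from a key for the cell is a design choice of this sketch, so that the same cell receives the same adjustment wherever it appears.

```python
import hashlib
import random

SUPPRESSION_THRESHOLD = 5  # hypothetical: counts below this are suppressed
MAX_ADJUSTMENT = 2         # hypothetical: adjustments lie in [-2, +2]


def perturb_count(cell_key: str, count: int):
    """Return a perturbed count, or None if the cell is suppressed for a low count."""
    if count < SUPPRESSION_THRESHOLD:
        return None
    # Deterministic seed from the cell key: same cell, same adjustment.
    seed = int(hashlib.sha256(cell_key.encode("utf-8")).hexdigest(), 16)
    rng = random.Random(seed)
    return max(count + rng.randint(-MAX_ADJUSTMENT, MAX_ADJUSTMENT), 0)


print(perturb_count("SA2:example|earners", 1200))
print(perturb_count("SA2:example|earners", 3))  # None
```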

How data are released

Personal Income in Australia is one of two LEED releases, along with Jobs in Australia, and provides aggregate data for Australia, states and territories and nearly 2,500 regions as classified in the Australian Statistical Geography Standard (ASGS), including at the Statistical Area 4 (SA4), Statistical Area 3 (SA3), Statistical Area 2 (SA2), Local Government Area (LGA) and Greater Capital City Statistical Area (GCCSA) levels.

The current release includes data for the five financial years between 2016-17 and 2020-21. Summary statistics for the full time series from 2011-12 are presented in Table 8 in the Data download tab.

Data from LEED are also available in the TableBuilder product Jobs and Income of Employed Persons. This product contains a broad range of data items covered in both Jobs in Australia and Personal Income in Australia, and is a rich source of information for data users who want to create customised analysis tables.

Differences between Jobs in Australia and Personal Income in Australia

Jobs in Australia (JIA) and Personal Income in Australia (PIA) present similar data on earners and income from the Linked Employer-Employee Dataset (LEED). However, there are a few small but important differences between JIA and PIA that should be taken into account when comparing them.

The number of earners differs between the two releases. In PIA, anyone who earns income, whether from employment, superannuation, investment or other sources, is counted as an earner. This includes individuals who only receive an employment termination payment without any regular income. In JIA, earners are restricted to those who receive payment from employment, either as an employee (including as an owner manager of an incorporated enterprise) or as an owner manager of an unincorporated enterprise. JIA does not include people who only receive an employment termination payment.
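The two earner definitions can be sketched as two predicates over a person's income. The record layout and field names are hypothetical, and any non-zero amount is treated as income received (so an OMUE reporting a loss still counts as an earner), which is an assumption of this sketch.

```python
from dataclasses import dataclass


@dataclass
class Person:
    # Hypothetical annual amounts by income type, in dollars.
    employee: float = 0.0             # includes owner managers of incorporated enterprises
    omue: float = 0.0                 # own unincorporated business income
    investment: float = 0.0
    superannuation: float = 0.0
    other: float = 0.0
    termination_payment: float = 0.0


def is_pia_earner(p: Person) -> bool:
    """PIA: income from any source, including a termination payment only."""
    amounts = (p.employee, p.omue, p.investment, p.superannuation, p.other, p.termination_payment)
    return any(v != 0 for v in amounts)


def is_jia_earner(p: Person) -> bool:
    """JIA: payment from employment, as an employee or as an OMUE."""
    return p.employee != 0 or p.omue != 0


people = [Person(employee=50000), Person(investment=1200),
          Person(termination_payment=8000), Person(omue=-3000)]
print(sum(map(is_pia_earner, people)), sum(map(is_jia_earner, people)))  # 4 2
```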

Median incomes in JIA are reported on both a 'per job' and a 'per employed person' basis. The two differ because people may work more than one job, either at the same time or over the course of the financial year. In PIA, income is reported on a 'per person' basis and includes all income types received in that financial year, not only employment income.
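The effect of the reporting basis on the median can be seen with a few hypothetical job records. The sketch sums jobs by person for the per person median and, for a PIA-style figure, adds hypothetical non-employment income; the data and function names are illustrative only.

```python
from collections import defaultdict
from statistics import median

# Hypothetical job records: (person_id, employment income from that job, in dollars).
jobs = [("p1", 60000), ("p2", 20000), ("p2", 25000), ("p3", 5000), ("p3", 70000)]


def per_job_median(jobs):
    return median(amount for _, amount in jobs)


def per_person_median(jobs, other_income=None):
    """Median of each person's summed income; other_income adds non-job income."""
    totals = defaultdict(float)
    for person, amount in jobs:
        totals[person] += amount
    for person, amount in (other_income or {}).items():
        totals[person] += amount
    return median(totals.values())


print(per_job_median(jobs))                                   # 25000
print(per_person_median(jobs))                                # 60000.0
print(per_person_median(jobs, {"p1": 3000, "p4": 12000}))     # 54000.0
```

In the last call, person p4 has no job at all but still enters the per person median, as an earner with only non-employment income would in PIA.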

Data limitations

Personal Income in Australia is subject to the following sources of error:

  • Conceptual misalignment. The Australian tax system is purpose-built and complex. In some cases it is difficult to determine how a particular income tax item should be used to describe income standards, and some items are only a partial conceptual match. While all care is taken, some income items are subject to this type of validity error. Coherence with other sources indicates that this has a low impact on aggregate series.
  • Measurement error. This is likely to be present in both person and employer information used. Most measurement error is unable to be determined or corrected; however, coherence with other similar statistics demonstrates that the error amount is small, and this has a low impact on aggregate series.
  • Incomplete information. Sometimes, Individual Tax Returns are not lodged, or not all items (e.g. occupation) are completed. The ABS advises caution when interpreting data subject to high rates of missing information.

Data Concepts

A summary of the major concepts presented in this release is as follows:

Personal income

All monetary values are presented as gross pre-tax dollars, wherever possible. This means they reflect income before deductions and losses, and before any taxation or levies (e.g. the Medicare levy or the temporary budget repair levy) are applied. The amounts shown are nominal and have not been adjusted for inflation.

Personal income is provided for the following five categories:

  • Employee income
  • Own unincorporated business income
  • Investment income
  • Superannuation income
  • Total income

Employee income

Own unincorporated business income

Investment income

Superannuation income

Other income

Total income

Counts of individuals

Individuals may receive income from several sources. Also, net income from a specific source may be positive or negative. For example, an individual may have positive income from Employee income yet negative net income from Investments. The number of individuals for each income source includes all persons with either positive or negative net income from that source.

The total number of individuals in receipt of income from at least one source cannot be calculated as the sum of the individuals in each income category, as people can have more than one source of income in any given year. For example, an individual could derive income from multiple sources such as Employee income, Investment income and income from their own unincorporated business and thus contribute to the regional person count in all three income categories.
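The counting rules above can be sketched as follows, using hypothetical records of net income by source. A person is counted for a source when their net income from it is positive or negative (nil is not counted), and the number of people with any income is a distinct count, so it is smaller than the sum of the per-source counts.

```python
# Hypothetical net income by source, in dollars (negative = net loss).
records = {
    "p1": {"employee": 55000, "investment": -800},
    "p2": {"employee": 30000, "investment": 1500, "own_business": 12000},
    "p3": {"superannuation": 24000},
    "p4": {"own_business": 0},
}


def count_by_source(records):
    counts = {}
    for incomes in records.values():
        for source, amount in incomes.items():
            if amount != 0:  # positive or negative net income both count
                counts[source] = counts.get(source, 0) + 1
    return counts


def count_with_any_income(records):
    return sum(1 for incomes in records.values() if any(a != 0 for a in incomes.values()))


counts = count_by_source(records)
print(counts)  # {'employee': 2, 'investment': 2, 'own_business': 1, 'superannuation': 1}
print(sum(counts.values()), count_with_any_income(records))  # 6 3
```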

Geography

All geographic variables are based on a person’s home address as reported on their ITR form. Addresses are coded to the Australian Statistical Geography Standard (ASGS).

In this release, the names and boundaries of all states and territories, GCCSAs, SA4s, SA3s, and SA2s are based on or concorded to the 2021 edition of ASGS: Main Structure and Greater Capital City Statistical Areas; those of all LGAs are based on or concorded to the 2020 edition of ASGS: Non ABS Structures.

If a geography variable is missing on the ITR, it is imputed where possible from the individual's most recent Income Statement.

Income earners whose region is unknown (not stated or indeterminate), or who lodge returns from overseas, are included in the totals shown. Persons living in Other territories are not published separately but are included in the national totals. Therefore, the totals in each table may not equal the sum of their components.

Gini coefficient

Simple measures of income distribution, such as the mean, median, percentile ratios and income shares, can indicate differences between the income distributions of two regions. However, none of these simple measures provides a single statistic that summarises the whole income distribution in a way that directly considers the individual incomes of all persons in a region. In this release, the Gini coefficient is used as a single statistic of inequality that summarises the distribution of income across the population in each region.

The Gini coefficient is provided here for Total income. This is a single statistic that usually lies between 0 and 1 and is a summary indicator of the degree of inequality in income between members of the tax form lodging population within a region. A value of 0 indicates that all earners reported the same amount of income in that region. Higher values represent relatively higher levels of income inequality. The income data reported in this release is market income and can be negative. This is mainly due to losses for Owner Managers of Unincorporated Enterprises (OMUEs). Therefore, for areas with large numbers of OMUEs reporting negative incomes, the Gini coefficients can exceed 1. 
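The release does not state the exact formula used. The sketch below assumes the standard definition G = ΣᵢΣⱼ |xᵢ − xⱼ| / (2n²μ), where μ is mean income, computed with the equivalent sorted-rank form. Because negative incomes widen the pairwise differences while shrinking the mean, the value can exceed 1, as described above.

```python
def gini(incomes):
    """Gini coefficient of a list of incomes, which may include negative values.

    Uses sum_i sum_j |x_i - x_j| = 2 * sum_i (2i - n - 1) * x_(i) over incomes
    sorted ascending (i = 1..n), giving G = 2 * sum_i i * x_(i) / (n * sum x) - (n + 1) / n.
    """
    xs = sorted(incomes)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total <= 0:
        raise ValueError("Gini is undefined for an empty region or non-positive total income")
    weighted = sum(i * x for i, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


print(gini([50000, 50000, 50000, 50000]))  # 0.0  (everyone reports the same income)
print(gini([0, 0, 0, 100]))                # 0.75
print(gini([-10000, 0, 10000, 20000]))     # 1.25 (a loss pushes the value above 1)
```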

The Gini coefficients shown in this release can be regarded as indicative but not definitive. They should not be directly compared with other ABS published Gini coefficients. The Gini coefficients presented in this release are calculated from gross personal income and not from equivalised disposable income as presented in Household Income and Wealth, Australia. There is also an acknowledged under-coverage of certain income groups in taxation data due to tax exemptions, and people being under the tax-free threshold. For instance, persons aged 60 years and over who are mostly dependent on superannuation income and those mostly reliant on government pensions and allowances may be missing from the tax data.

Main source of income

This is the income source from which a person derives most of their (positive) income. For a stated income type, this measure reflects the proportion of all persons in a region for whom that income type is their main source of income.

If a region is particularly reliant on one source, it may be susceptible to policy or economic changes that affect that income type.

As there are several types of income, the main source may account for less than 50% of total income. Where persons receive the same amount across multiple income types, they have been excluded from the derivation of this indicator. Persons with negative or nil total income have also been excluded.
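The derivation rules above can be sketched as follows. The income type names and records are hypothetical, and the choice to compute proportions over persons with a derivable main source (after the exclusions) is an assumption of this sketch.

```python
def main_source(incomes):
    """Income type giving the largest positive amount, or None if the person is excluded.

    Excluded: nil or negative total income, or a tie for the largest amount.
    """
    if sum(incomes.values()) <= 0:
        return None
    positive = {k: v for k, v in incomes.items() if v > 0}
    top = max(positive.values())
    leaders = [k for k, v in positive.items() if v == top]
    return leaders[0] if len(leaders) == 1 else None


def main_source_shares(people):
    """Proportion of (non-excluded) persons for whom each type is the main source."""
    sources = [s for s in map(main_source, people) if s is not None]
    return {s: sources.count(s) / len(sources) for s in sorted(set(sources))}


people = [
    {"employee": 60000, "investment": 2000},
    {"employee": 15000, "own_business": 18000, "investment": 12000},  # main source is 40% of total
    {"employee": 20000, "superannuation": 20000},                      # tie: excluded
    {"own_business": -5000, "investment": 1000},                       # negative total: excluded
]
print(main_source_shares(people))  # {'employee': 0.5, 'own_business': 0.5}
```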

Non-lodgers

Non-lodgers are individuals who do not lodge a tax return. However, this population may have income that is in scope of this release. It can include persons whose income is below certain levels, or who derive their income from certain Australian Government pension, benefit and allowance payments that are exempt from income tax. Their absence should be taken into consideration when interpreting these statistics.

Because the LEED contains information about jobs sourced from PAYG payment summaries (now Income Statements), the ABS is able to impute income information for non-lodgers who are employees.

Non-lodgers are included in the Employee income category, except where cross-classified with age and sex. Previously, age and sex were unavailable for non-lodgers. While this information is now available for non-lodgers from the LEED, they remain excluded from the following table to ensure consistency with previous results.

  • Table 4 Employee income by age and sex 2016-17 to 2020-21.

Non-lodgers are excluded from the Total income category in all instances.

Taxation and superannuation policy changes

Taxation and superannuation policy changes may affect both the scope of personal income covered by the LEED and the income amounts reported.

Taxation threshold change

In 2012-13, the tax-free threshold was increased from $6,000 to $18,200. This appears to have resulted in fewer people needing to lodge a tax return, and therefore fewer people with non-employee income being covered in the LEED.

First home super saver scheme (FHSS)

From 2017-18, people can make voluntary contributions into their super funds under the FHSS scheme. From 2018-19, the contributed amounts and associated earnings can be released to eligible applicants to help them purchase their first home. The scheme is expected to lead to higher reported employee income (through the ‘reportable employer superannuation contributions’ component) from 2017-18, and higher reported superannuation income (through the ‘FHSS released amount’ component) from 2018-19.

The ABS encourages users of the data to research policy changes that may affect the comparability of the data from year to year. For more information on taxation policy changes, see the ATO's Taxation Statistics publications.

Comparison with the Survey of Income and Housing

Statistics in this release are produced using administrative data sourced from the Australian Taxation Office. The ABS also produces household income and wealth estimates collected directly from households via the Survey of Income and Housing (SIH).

The SIH collects information on sources of income, amounts received and the characteristics of persons aged 15 years and over in private dwellings throughout Australia. Since 2003-04, the SIH has been conducted every two years, with the most recent relevant snapshots being the 2015-16, 2017-18 and 2019-20 income years.  For further information about the concepts, definitions, methodology and estimation procedures used in the SIH, please refer to Survey of Income and Housing, User Guide.

SIH employee income includes all payments received by individuals as a result of their current or former involvement in paid employment. In addition to the regular and recurring cash receipts captured by SIH, employee income also includes non-cash benefits, bonuses, termination payments and payments for irregular overtime. 

The table below presents a selection of reasonably comparable income data items, sourced from the ATO and the SIH, for 2015-16, 2017-18 and 2019-20.

Selected sources of income, PIA and SIH data, 2015-16, 2017-18 and 2019-20 ($ billion)

Income source | PIA 2015-16 | SIH 2015-16 | PIA 2017-18 | SIH 2017-18 | PIA 2019-20 | SIH 2019-20
Employee income | 724.9 | 729.0 | 787.3 | 781.6 | 865.0 | 879.6
Own unincorporated business income | 50.6 | 43.8 | 53.9 | 53.0 | 52.3 | 42.9
Investment income | 81.6 | 56.4 | 88.9 | 87.2 | 92.9 | 77.2
Superannuation income | 11.7 | 41.2 | 11.8 | 46.1 | 12.0 | 52.8
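A quick way to read the comparison is the ratio of the tax-based figure to the SIH figure for each item and year, where a ratio near 1 indicates close agreement. The sketch below copies the published $ billion figures from the comparison table; the variable and function names are illustrative.

```python
# (tax-based PIA, SIH) in $ billion, copied from the comparison table.
TABLE = {
    "Employee income": {"2015-16": (724.9, 729.0), "2017-18": (787.3, 781.6), "2019-20": (865.0, 879.6)},
    "Own unincorporated business income": {"2015-16": (50.6, 43.8), "2017-18": (53.9, 53.0), "2019-20": (52.3, 42.9)},
    "Investment income": {"2015-16": (81.6, 56.4), "2017-18": (88.9, 87.2), "2019-20": (92.9, 77.2)},
    "Superannuation income": {"2015-16": (11.7, 41.2), "2017-18": (11.8, 46.1), "2019-20": (12.0, 52.8)},
}


def pia_to_sih_ratios(table):
    """Ratio of the tax-based estimate to the SIH estimate, rounded to 2 decimals."""
    return {
        item: {year: round(pia / sih, 2) for year, (pia, sih) in years.items()}
        for item, years in table.items()
    }


ratios = pia_to_sih_ratios(TABLE)
print(ratios["Employee income"])        # {'2015-16': 0.99, '2017-18': 1.01, '2019-20': 0.98}
print(ratios["Superannuation income"])  # {'2015-16': 0.28, '2017-18': 0.26, '2019-20': 0.23}
```

Employee income agrees closely, while the superannuation ratios sit far below 1, consistent with the discussion of superannuation coverage in the tax data.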

Differences in collection methodologies, data collection and extraction periods, definitions, and scope and coverage can all contribute to variations between PIA and SIH income data. Also, as noted above, the SIH includes low-income households, whereas the PIA series may be missing some individuals with low incomes (for example, those earning under the $18,200 tax-free threshold) because they may not need to lodge tax returns.

Since changes were applied to the reporting of superannuation income in 2007, the SIH estimate is thought to provide a more accurate and complete indication of the level of income derived from superannuation. However, the SIH estimates only include superannuation pension streams and not superannuation lump sum payments.

Glossary


Abbreviations
