Personal Income in Australia methodology

Reference period
2014-15 to 2018-19
Released
17/12/2021

Personal Income in Australia provides regional data on the number of income earners, amounts received, and the distribution of income. 

The current release includes data for the five financial years between 2014-15 and 2018-19. It covers all persons who interacted with the Australian taxation system during this period and for whom personal income is identified. Information for earlier years back to 2011-12 can be found in previous releases of Personal Income in Australia. 

How data are collected

This release is compiled from the Linked Employer-Employee Dataset (LEED), which is built using Australian Taxation Office (ATO) administrative data linked to the ABS Business Longitudinal Analytical Data Environment (BLADE).

Scope and coverage of LEED

The LEED is a rich dataset that includes about 18 to 20 million job records each financial year since 2011-12 and contains over 150 million individual records over the period 2011-12 to 2018-19. 

The LEED covers all persons who either:

  • submitted an individual tax return (ITR); or
  • had a Pay As You Go (PAYG) payment summary issued by an employer and then remitted to the ATO.

Employees who did not submit a tax return and have not provided their Tax File Number to their employer will not appear in the LEED. Owner managers of unincorporated enterprises (OMUEs) who did not submit an ITR are also excluded.

Data sources

The LEED incorporates:

  • employer level data that include the ABS’s BLADE data and the ABS Business Register data supplied by the Registrar of the Australian Business Register (ABR) to the ABS under A New Tax System (Australian Business Number) Act 1999, which requires that such data be used only for the purpose of carrying out functions of the ABS; and
  • person level ITR data, job level PAYG payment summary data and Client Register (CR) data supplied by the ATO to the ABS under the Taxation Administration Act 1953, which requires that such data be used only for the purpose of administering the Census and Statistics Act 1905.

The data limitations or weaknesses outlined here are in the context of using the data for statistical purposes, and not related to the ability of the data to support the ATO's core operational requirements.

The ABS acknowledges the continuing support of the ATO in compiling these statistics. 

How data are processed

Integration method

LEED links jobs to employers; hence employed persons are linked to employers via the jobs they hold.

Before the linkage takes place, an input job level file is created largely based on the PAYG payment summary file. This file is also enhanced with job records derived using ITR information, to cover jobs without payment summary information, such as OMUE jobs. Data quality is enhanced by using occupation information from ITR, and the best available age, sex, and geographic information between the PAYG, ITR and CR data.      

Jobs are integrated with the employer by one of two methods. The method depends on which ABS Business Register population the employer is grouped into.

  • Non-profiled population (businesses with a simple structure): a deterministic approach using the Australian Business Number (ABN).
  • Profiled population (businesses with a complex structure): a more detailed linking approach, described below.
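
The deterministic step for the non-profiled population can be pictured as an exact-key join on ABN. The sketch below is illustrative only: the record layouts, field names and ABN values are hypothetical and do not reflect the actual structure of the LEED input files.

```python
# Hypothetical job and employer records; field names are illustrative only.
jobs = [
    {"job_id": 1, "abn": "11111111111"},
    {"job_id": 2, "abn": "22222222222"},
    {"job_id": 3, "abn": "99999999999"},  # no employer with this ABN
]
employers = {
    "11111111111": {"employer_id": "E1"},
    "22222222222": {"employer_id": "E2"},
}


def link_jobs_by_abn(jobs, employers):
    """Attach an employer to each job by exact ABN match (deterministic link)."""
    linked = []
    for job in jobs:
        employer = employers.get(job["abn"])
        employer_id = employer["employer_id"] if employer else None
        linked.append({**job, "employer_id": employer_id})
    return linked


linked_jobs = link_jobs_by_abn(jobs, employers)
```

A job whose ABN has no match is left unlinked (`None`) in this sketch; how unmatched records are treated in practice is not described in this release.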

Where an employer is part of the profiled population, the relevant jobs are assigned to the type of activity units based on a logistic regression model developed using 2016 Census data. The model references independent variables common to both Census and personal income tax data, including sex, age, occupation, and region of usual residence. These are used to predict the industry of employment, which conceptually aligns to a type of activity unit. 

Where an employee has multiple job relationships with the same reporting ABN in an enterprise group, each job relationship is assigned to the same type of activity unit.

Based on the model, each job record is assigned a probability of being in any of the type of activity units present in the employing enterprise group. Iterative random assignment is undertaken using these probabilities until employment benchmarks are met. Benchmarks are based on Quarterly Business Indicators Survey (QBIS) data where a unit is in scope. Where it is not, BLADE employment levels are substituted where possible; if neither source is available, no benchmarking is done.
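
The exact assignment algorithm is not spelled out here, so the following is only one simplified reading of "random assignment to benchmarks": jobs are drawn in random order, and each is allocated to a type of activity unit (TAU) in proportion to its modelled probabilities, restricted to TAUs that still have room under their benchmark. The function name, the input layouts, and the single-pass design are all assumptions made for illustration.

```python
import random


def assign_jobs(job_probs, benchmarks, seed=0):
    """Randomly assign each job to a type of activity unit (TAU).

    job_probs:  {job_id: {tau: probability}}  (hypothetical model output)
    benchmarks: {tau: target job count}       (hypothetical employment benchmarks)
    """
    rng = random.Random(seed)
    remaining = dict(benchmarks)
    assignment = {}
    job_ids = list(job_probs)
    rng.shuffle(job_ids)
    for job_id in job_ids:
        probs = job_probs[job_id]
        # Prefer TAUs that still have room under their benchmark.
        open_taus = [t for t in probs if remaining.get(t, 0) > 0 and probs[t] > 0]
        candidates = open_taus or list(probs)
        weights = [probs[t] for t in candidates]
        if sum(weights) <= 0:
            # Fall back to equal weights if the model gives no usable probability.
            weights = [1.0] * len(candidates)
        tau = rng.choices(candidates, weights=weights, k=1)[0]
        assignment[job_id] = tau
        remaining[tau] = remaining.get(tau, 0) - 1
    return assignment


# Four hypothetical jobs in an enterprise group with two TAUs.
job_probs = {
    "j1": {"retail": 0.7, "mining": 0.3},
    "j2": {"retail": 0.6, "mining": 0.4},
    "j3": {"retail": 0.2, "mining": 0.8},
    "j4": {"retail": 0.5, "mining": 0.5},
}
benchmarks = {"retail": 2, "mining": 2}
assignment = assign_jobs(job_probs, benchmarks)
```

In this toy case the benchmarks sum to the number of jobs, so each TAU ends up with exactly its benchmark count. Where no benchmark is available, the same draw could be made without the capacity restriction.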

The above process is applied to link the different input datasets for each financial year. Records have not been integrated across years; the LEED is therefore a cross-sectional database and is not longitudinal.

ABS data integration practices comply with the High-Level Principles for Data Integration Involving Commonwealth Data for Statistical and Research Purposes. For further information, see Keeping integrated data safe.

Component datasets

The LEED consists of three cross-sectional files: person, job and employer. The LEED is not longitudinal and each file is for a single financial year.

The person file

The business file

The jobs file

Privacy and confidentiality

Legislative requirements ensuring the privacy and secrecy of these data have been adhered to. In accordance with the Census and Statistics Act 1905, results have been confidentialised to ensure they are not likely to enable identification of a particular person or organisation. All personal information is handled in accordance with the Australian Privacy Principles contained in the Privacy Act 1988.

All personal income tax statistics were analysed in de-identified form with no home address or date of birth included in LEED input files. Addresses were coded to the Australian Statistical Geography Standard and date of birth was converted to an age at 30 June of the reference year prior to data provision.
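
As a small illustration of the age derivation, the sketch below converts a date of birth to age in completed years at 30 June. It assumes that "30 June of the reference year" means the last day of the financial year (for example, 30 June 2019 for 2018-19); the function name and signature are hypothetical.

```python
from datetime import date


def age_at_30_june(date_of_birth: date, financial_year_end: int) -> int:
    """Age in completed years at 30 June of the given calendar year."""
    reference = date(financial_year_end, 6, 30)
    # Has the person's birthday occurred on or before 30 June of that year?
    had_birthday = (reference.month, reference.day) >= (
        date_of_birth.month,
        date_of_birth.day,
    )
    return reference.year - date_of_birth.year - (0 if had_birthday else 1)


# Ages for the 2018-19 financial year (reference date 30 June 2019).
age_a = age_at_30_june(date(1990, 6, 30), 2019)
age_b = age_at_30_june(date(1990, 7, 1), 2019)
```

Only the derived age, not the date of birth, would then be carried into the analysis file.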

To minimise the risk of identifying individuals in aggregate statistics, perturbation has been applied. Perturbation involves small random adjustment of the statistics and is considered the most satisfactory technique for avoiding the release of identifiable statistics, while maximising the range of information that can be released. These adjustments have a negligible impact on the underlying pattern of the statistics. Some cells have also been suppressed due to low counts.
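
The perturbation parameters and suppression rules used for this release are not published here. The sketch below only illustrates the general idea of a small random adjustment combined with low-count suppression; the adjustment range, the minimum count and the function name are assumptions, not the ABS method.

```python
import random


def confidentialise(counts, max_adjust=2, min_count=5, seed=0):
    """Suppress low counts and randomly adjust the remaining cells.

    max_adjust and min_count are illustrative values only.
    """
    rng = random.Random(seed)
    released = {}
    for cell, n in counts.items():
        if n < min_count:
            released[cell] = None  # suppressed due to low count
        else:
            released[cell] = max(0, n + rng.randint(-max_adjust, max_adjust))
    return released


# Hypothetical counts of income earners for two regions.
released = confidentialise({"Region A": 1000, "Region B": 3})
```

Because the adjustment is small relative to most cell values, the overall pattern of the statistics is largely preserved.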

How data are released

Personal Income in Australia is one of two LEED releases and provides aggregate data for Australia, states and territories, and over 2,200 regions, as classified in the Australian Statistical Geography Standard (ASGS), including at the Statistical Area 4 (SA4), Statistical Area 3 (SA3), Statistical Area 2 (SA2), Local Government Area (LGA) and Greater Capital City Statistical Area (GCCSA) levels.

The current release includes data for the five financial years between 2014-15 and 2018-19, while information for earlier years back to 2011-12 can be found in previous releases.

The other LEED release is Jobs in Australia, which also has detailed tables in the TableBuilder format in Microdata: Jobs in Australia. The TableBuilder product contains a broad range of data items covered in both Personal Income in Australia and Jobs in Australia. It is a rich source of information for data users interested in making customised analysis tables.

Data limitations

Personal Income in Australia is subject to the following sources of error:

  • Conceptual misalignment. The Australian tax system is purpose-built and complex. It can be difficult to determine how a particular income tax item should be used to describe income standards, and some items are only a partial conceptual match. While all care is taken, some income items are subject to this type of validity error. Coherence with other sources indicates that this has a low impact on aggregate series.
  • Measurement error. This is likely to be present in both the person and employer information used. Most measurement error cannot be identified or corrected; however, coherence with other similar statistics indicates that the error is small and has a low impact on aggregate series.
  • Incomplete information. Sometimes, Individual Tax Returns are not lodged, or not all items (e.g. occupation) are completed. The ABS advises caution when interpreting data subject to high rates of missing information.

Data concepts

The major concepts presented in this release are summarised as follows:

Personal income

All monetary values are presented as gross pre-tax dollars, wherever possible. This means they reflect income before deductions and losses, and before any taxation or levies (e.g. the Medicare levy or the temporary budget repair levy) are applied. The amounts shown are nominal and have not been adjusted for inflation.

Personal income is provided for the following five categories:

  • Employee income
  • Own unincorporated business income
  • Investment income
  • Superannuation income
  • Total income

Employee income

Own unincorporated business income

Investment income

Superannuation income

Other income

Total income

Counts of individuals

Individuals may receive income from several sources. Also, net income from a specific source may be positive or negative. For example, an individual may have positive income from Employee income yet negative net income from Investments. The number of individuals for each income source includes all persons with either positive or negative net income from that source.

The total number of individuals in receipt of income from at least one source cannot be calculated as the sum of the individuals in each income category, as people can have more than one source of income in any given year. For example, an individual could derive income from multiple sources such as Employee income, Investment income and income from their own unincorporated business and thus contribute to the regional person count in all three income categories.
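
The difference between summing category counts and counting distinct individuals can be shown with a toy example. The person identifiers below are hypothetical.

```python
# Hypothetical person identifiers by income category for one region.
recipients = {
    "Employee income": {"p1", "p2", "p3"},
    "Investment income": {"p2", "p3", "p4"},
    "Own unincorporated business income": {"p3"},
}

# Adding the category counts double-counts people with several income sources.
sum_of_categories = sum(len(people) for people in recipients.values())

# The number of distinct earners is the size of the union of the categories.
distinct_earners = len(set().union(*recipients.values()))
```

Here person p3 is counted in all three categories, so the category counts sum to 7 although only 4 distinct individuals receive income.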

Geography

All geographic variables are based on a person’s home address as reported on their ITR form. Addresses are coded to the Australian Statistical Geography Standard (ASGS).

In this release, the names and boundaries of all states and territories, GCCSAs, SA4s, SA3s, and SA2s are based on or concorded to the 2016 edition of ASGS: Main Structure and Greater Capital City Statistical Areas; those of all LGAs are based on or concorded to the 2018 edition of ASGS: Non ABS Structures.

If a geography variable is missing on the ITR, it is imputed from the individual's most recent PAYG payment summary where possible.

Details of income earners from regions unknown (not stated or indeterminate) or who are lodging returns from overseas are included in the totals shown. Persons living in Other territories are not published separately but included in the national totals. Therefore, the totals in each table may not necessarily be the sum of their components.

Gini coefficient

Simple measures of income distribution such as mean, median, percentile ratios and income shares can provide an indication of differences in the income distributions of two separate regions. However, none of the simple measures comprise a single statistic that summarises the whole income distribution in a way that directly considers the individual incomes of all regions. In this release, the Gini coefficient is used to compile a single statistic of inequality by summarising the distribution of income across the population in each region.

The Gini coefficient is provided here for Total income. This is a single statistic that lies between 0 and 1 and is a summary indicator of the degree of inequality in income between members of the tax form lodging population. Values closer to 1 represent greater inequality.

The Gini coefficients shown in this release can be regarded as indicative but not definitive. They should not be directly compared with other ABS published Gini coefficients. The Gini coefficients presented in this release are calculated from gross personal income and not from equivalised disposable income as presented in Household Income and Wealth, Australia. There is also an acknowledged under-coverage of certain income groups in taxation data due to tax exemptions, and people being under the tax-free threshold. For instance, persons aged 60 years and over who are mostly dependent on superannuation income and those mostly reliant on government pensions and allowances may be missing from the tax data.

In addition, gross personal income can be reported by an individual as a negative amount. In a few regions, negative income is distorting the calculation of the Gini coefficients. For this reason, any Gini coefficients that have been calculated to be equal to or greater than 0.9 have been suppressed. 
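
The formula used to compile the Gini coefficients is not given in this release. As a hedged illustration, the sketch below uses a standard rank-based formula for the Gini coefficient of a list of incomes and applies the suppression rule described above. The function names are hypothetical, and returning `None` for a non-positive total is a choice made for this sketch only.

```python
def gini(incomes):
    """Gini coefficient via the rank-based formula on sorted incomes.

    Returns None when the coefficient is undefined (no incomes or a
    non-positive total), which is an assumption of this sketch.
    """
    xs = sorted(incomes)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total <= 0:
        return None
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


def published_gini(incomes, threshold=0.9):
    """Apply the suppression rule: values of 0.9 or more are not released."""
    g = gini(incomes)
    return None if g is None or g >= threshold else g


equal = gini([10, 10, 10, 10])                  # everyone has the same income
concentrated = gini([0, 0, 0, 100])             # one person holds all income
distorted = published_gini([-90, 10, 20, 100])  # negative income present
```

With all incomes non-negative, the coefficient stays between 0 and 1. The third example shows how a large negative income shrinks the total and pushes the computed value well above 1, which is the kind of distortion the suppression rule guards against.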

Main source of income

The income source from which a person derives most of their (positive) income. For a stated income type, this measure reflects the proportion of all persons in a region for whom the income type is their main source of income.

If a region is particularly reliant on one source, it may be susceptible to policy or economic changes that affect that income type.

As there are several types of income, the main source may account for less than 50% of total income. Where persons receive the same amount across multiple income types, they have been excluded from the derivation of this indicator. Persons with negative or nil total income have also been excluded.
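
The derivation rules above can be sketched for a single person as follows. This is an interpretation rather than the ABS implementation: it assumes the "same amount" exclusion applies to ties for the largest positive amount, and the function name and income labels are illustrative.

```python
def main_source(income_by_type):
    """Return a person's main source of income, or None if they are excluded."""
    total = sum(income_by_type.values())
    if total <= 0:
        return None  # negative or nil total income: excluded
    positive = {k: v for k, v in income_by_type.items() if v > 0}
    top = max(positive.values())
    leaders = [k for k, v in positive.items() if v == top]
    # Assumed reading: a tie for the largest amount means exclusion.
    return leaders[0] if len(leaders) == 1 else None


person = {
    "Employee income": 40_000,
    "Investment income": 35_000,
    "Superannuation income": 30_000,
}
source = main_source(person)
share = person[source] / sum(person.values())
```

In this example Employee income is the main source even though it makes up only about 38% of the person's total income, which illustrates why the main source may account for less than 50%.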

Non-lodgers

Non-lodgers are individuals who do not lodge a tax return. However, this population may have income that is in scope of this release. This can include persons who receive an income below certain levels or derive their income from some Commonwealth of Australia Government pension, benefit and allowance payments that are exempt from income tax. Their absence should be taken into consideration when interpreting these statistics.

Because the LEED contains information about jobs sourced from PAYG payment summaries, we are able to impute income information for non-lodgers who are employees.

Non-lodgers are included in the Employee income category, except where cross-classified with age and sex. Previously, age and sex were unavailable for non-lodgers. While this information is now available for non-lodgers from the LEED, they remain excluded from the following table to ensure consistency with previous results.

  • Table 4 Employee income by age and sex 2014-15 to 2018-19.

Non-lodgers are excluded from the Total income category in all instances.

Taxation and superannuation policy changes

Taxation and superannuation policy changes may affect both the scope of personal income covered by the LEED and the actual income amounts.

Taxation threshold change

In 2012-13, the tax-free threshold was increased from $6,000 to $18,200. This appears to have resulted in fewer people needing to lodge a tax return, and therefore fewer people with non-employee income being covered in the LEED.

First home super saver scheme (FHSS)

From 2017-18, people can make voluntary contributions into their super funds. The contributed amount and associated earnings can then be released from 2018-19 to eligible applicants to help with the purchase of their first homes. This scheme is expected to lead to higher employee income (through the ‘reportable employer superannuation contributions’ component) from 2017-18, and higher reported superannuation income (through the ‘FHSS released amount’ component) from 2018-19.

The ABS encourages users of the data to research policy changes that may affect the comparability of the data from year to year. For more information on taxation policy changes, see the ATO's Taxation Statistics publications.

Comparison with other ABS sources

Comparison with ABS income data from the Survey of Income and Housing

Statistics in this release are produced using administrative data sourced from the Australian Taxation Office. The ABS also produces household income and wealth estimates collected directly from households via the Survey of Income and Housing (SIH).

The SIH collects information on sources of income, amounts received and the characteristics of persons aged 15 years and over in private dwellings throughout Australia. Since 2003-04, the SIH has been conducted every two years, with the most recent relevant snapshots being the 2011-12, 2013-14, 2015-16 and 2017-18 income years. Additional SIH estimates of annual income have been produced for the survey gap years up until 2014-15 using previous financial year information collected in each survey. For further information about the concepts, definitions, methodology and estimation procedures used in the SIH, please refer to Survey of Income and Housing, User Guide.

SIH employee income includes all payments received by individuals as a result of their current or former involvement in paid employment. In addition to the regular and recurring cash receipts captured by SIH, employee income also includes non-cash benefits, bonuses, termination payments and payments for irregular overtime. Details of the composition of employee income derived from ATO sources are provided in 'income variables' below.

Table 2 below presents a selection of reasonably comparable income data items, sourced from ATO and the SIH, for 2013-14, 2015-16 and 2017-18.

Table 2 - Selected sources of income, PIiA and SIH data, 2013-14, 2015-16 and 2017-18

$ billion                                      2013-14           2015-16           2017-18
Source of income                         PIiA      SIH     PIiA      SIH     PIiA      SIH
Employee income                         648.8    679.4    724.9    729.0    787.3    781.6
Own unincorporated business income       45.3     47.7     50.6     43.8     53.9     53.0
Investment income                        79.5     73.2     81.6     56.4     88.9     87.2
Superannuation income                    10.7     31.4     11.7     41.2     11.8     46.1

Differences in collection methodologies, data collection/extraction periods, definitions, scope/coverage etc., can all contribute to variations between PIiA and SIH income data. Also, as noted above, SIH presents data for low income households, whereas the PIiA series may be missing some individuals with low incomes (for example, those earning under the $18,200 tax-free threshold) because they may not need to lodge tax returns.
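
One simple way to read Table 2 is to express each PIiA figure as a ratio of the corresponding SIH figure. The sketch below does this using the values published in Table 2; the data structure and function name are illustrative only, and the ratio is not an ABS-published measure.

```python
# Values copied from Table 2, in $ billion: year -> (PIiA, SIH).
TABLE_2 = {
    "Employee income": {
        "2013-14": (648.8, 679.4), "2015-16": (724.9, 729.0), "2017-18": (787.3, 781.6),
    },
    "Own unincorporated business income": {
        "2013-14": (45.3, 47.7), "2015-16": (50.6, 43.8), "2017-18": (53.9, 53.0),
    },
    "Investment income": {
        "2013-14": (79.5, 73.2), "2015-16": (81.6, 56.4), "2017-18": (88.9, 87.2),
    },
    "Superannuation income": {
        "2013-14": (10.7, 31.4), "2015-16": (11.7, 41.2), "2017-18": (11.8, 46.1),
    },
}


def piia_to_sih_ratio(source, year):
    """Ratio of the PIiA estimate to the SIH estimate for one cell of Table 2."""
    piia, sih = TABLE_2[source][year]
    return piia / sih


for source, by_year in TABLE_2.items():
    ratios = ", ".join(
        f"{year}: {piia_to_sih_ratio(source, year):.2f}" for year in by_year
    )
    print(f"{source}: {ratios}")
```

Ratios close to 1 (for example, Employee income) indicate broadly similar levels, while Superannuation income in PIiA is only about a quarter of the SIH estimate in 2017-18.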

Since changes were applied to the reporting of superannuation income in 2007, the SIH estimate is thought to provide a more accurate, complete indication of the level of income derived from Superannuation. However, the SIH estimates only include superannuation pension streams and not superannuation lump sum payments.

Differences to other labour statistics

Labour Account Australia provides quarterly and annual time series data, consisting of four quadrants: Jobs, Persons, Hours and Payments. The estimates are at the national level. Statistics in Labour Account Australia are sourced from business and household surveys, ABS business register information, defence force information, child workers information and estimates from the ABS Labour Force Survey for contributing family workers.

The Labour Account Payments quadrant accounts for the costs incurred by enterprises in employing labour and the incomes received by people from their labour provision. It can be described as the cost of labour, and reflects the interactions between jobs, people and labour volume (hours worked). The payments data are primarily sourced from the Australian National Accounts.

For more information on the range of different data sources, see ABS Labour Statistics: A broad range of information.

Glossary

Abbreviations

History of changes

This release was previously known as Estimates of Personal Income for Small Areas, and was later renamed and restructured as Personal Income in Australia.

Data from 2010-11 are not sourced from the LEED and are available in previous editions of Estimates of Personal Income for Small Areas. Information for the period between 2011-12 and 2013-14 can be found in previous releases of Personal Income in Australia.

Data for previous years have not been revised.
