Linked Employer-Employee Dataset (LEED)
The Linked Employer-Employee Dataset (LEED) is a cross-sectional database built from Australian Taxation Office (ATO) administrative data linked to the ABS Business Longitudinal Analysis Data Environment (BLADE).
The LEED enables simultaneous analysis of met supply and demand in the Australian labour market, through:
- providing supplementary labour statistics and facilitating labour market research at industry and regional levels;
- enabling analysis of the Australian labour market at macro and micro levels;
- enabling analysis of how specific events impact employees and employers;
- helping to understand structural changes in the labour market.
The LEED consists of three cross-sectional files:
- a person file;
- a jobs file; and
- an employer file.
The LEED associates information about a person with information about their employing business. This is done by establishing the existence of a job. An employed person can have one or more jobs throughout the year with one or more employers, some of which may be held concurrently with others. A job can be created either by an employing business or by the personal enterprise of the individual (an owner manager).
LEED overview
Scope
The LEED contains information for all persons who interacted with the Australian taxation system with reference to financial years after 2011-12. The LEED includes data for all persons who either:
- submitted an individual tax return (ITR); or
- had a Pay As You Go (PAYG) payment summary issued by an employer and then remitted to the ATO.
Employees who did not submit a tax return and have not provided their Tax File Number to their employer will not appear in the LEED. Owner managers of unincorporated enterprises (OMUEs) who did not submit an ITR are also excluded.
The LEED includes all employers present on the BLADE who have at least one employee linked to them. Some small businesses are excluded from the BLADE (e.g. those that do not meet the turnover threshold at which they must register for Goods and Services Tax) and so do not appear on the LEED. Synthetic records are created for these businesses where they are both unincorporated and owned by an Owner Manager of an Unincorporated Enterprise present on the LEED.
The LEED includes all sources of income, regardless of whether the income provider resides within Australia's economic territory.
Integration methodology
Initial data cleaning is undertaken to remove duplicate and erroneous records. In particular, job records are repaired to minimise the impact on output statistics of administrative noise, such as annual payment summaries issued in two separate parts.
Before the linkage takes place, an input job level file is created largely based on the PAYG payment summary file. This file is also enhanced with job records derived using ITR information, to cover jobs without payment summary information, such as OMUE jobs. Data quality is enhanced by using occupation information from the ITR, and the best available age, sex, and geographic information from the PAYG, ITR and Client Register (CR) data.
Jobs are integrated with the employer by one of two methods. The method depends on which ABS Business Register population the employer is grouped into:
- Non-profiled population (businesses with a simple structure): a deterministic approach using the Australian Business Number (ABN).
- Profiled population (businesses with a complex structure): a more detailed approach to linking is used, detailed below.
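For the non-profiled population, the deterministic match on ABN could be sketched as follows. This is a minimal illustration only: the field names (`job_id`, `abn`, `employer_id`) and the function `link_jobs_by_abn` are hypothetical and are not taken from the LEED itself.

```python
def link_jobs_by_abn(jobs, employers):
    """Exact-match each job to an employer on ABN.

    jobs:      list of dicts with hypothetical keys 'job_id' and 'abn'
    employers: list of dicts with hypothetical keys 'employer_id' and 'abn'
    Returns (linked, unlinked): linked jobs gain an 'employer_id' key.
    """
    # Index employers by ABN so each job needs a single lookup.
    employers_by_abn = {e["abn"]: e for e in employers}
    linked, unlinked = [], []
    for job in jobs:
        employer = employers_by_abn.get(job.get("abn"))
        if employer is None:
            # Missing or unmatched ABN: the job stays unlinked.
            unlinked.append(job)
        else:
            linked.append({**job, "employer_id": employer["employer_id"]})
    return linked, unlinked
```

Jobs with a missing or misreported ABN fall into the unlinked list rather than being forced onto an employer.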
Where an employer is part of the profiled population, the relevant jobs are assigned to the type of activity units based on a logistic regression model developed using 2016 Census data. The model references independent variables common to both Census and personal income tax data, including sex, age, occupation, and region of usual residence. These are used to predict the industry of employment, which conceptually aligns to a type of activity unit.
Where an employee has multiple job relationships with the same reporting ABN in an enterprise group, each job relationship is assigned to the same type of activity unit.
Based on the model, each job record is assigned a probability of being in any of the type of activity units present in the employing enterprise group. Iterative random assignment is undertaken using these probabilities until employment benchmarks are met. Benchmarks are based on Quarterly Business Indicators Survey (QBIS) data where a unit is in scope. Where a unit is out of scope, BLADE employment levels are substituted where possible; if neither source is available, no benchmarking is done.
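The exact assignment procedure is not specified here, but one possible reading of benchmark-constrained random assignment is sketched below. It assumes the model probabilities are already available per job, that each benchmark is a target job count per type of activity unit (TAU), and that the benchmarks sum to at least the number of jobs. The function `assign_to_taus` and its inputs are hypothetical, and the rule that a person's multiple jobs with the same ABN share one TAU is not handled.

```python
import random

def assign_to_taus(job_probs, benchmarks, seed=0):
    """Randomly assign jobs to TAUs without exceeding any benchmark.

    job_probs:  {job_id: {tau_id: probability}} (assumed model output)
    benchmarks: {tau_id: target number of jobs} (assumed benchmark counts)
    """
    rng = random.Random(seed)
    remaining = dict(benchmarks)      # places still open in each TAU
    assignment = {}
    job_ids = list(job_probs)
    rng.shuffle(job_ids)              # process jobs in random order
    for job_id in job_ids:
        # Only TAUs that have not yet reached their benchmark are eligible.
        open_taus = [tau for tau, places in remaining.items() if places > 0]
        if not open_taus:
            raise ValueError("benchmarks are smaller than the number of jobs")
        weights = [job_probs[job_id].get(tau, 0.0) for tau in open_taus]
        if sum(weights) == 0:
            # No probability mass on any open TAU: fall back to a uniform draw.
            weights = [1.0] * len(open_taus)
        tau = rng.choices(open_taus, weights=weights, k=1)[0]
        assignment[job_id] = tau
        remaining[tau] -= 1
    return assignment
```

When the benchmarks sum exactly to the number of jobs, every TAU ends up with its benchmark count, while individual jobs still tend to land in the units the model considers most likely.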
The above process is applied to link the different input datasets for each financial year. Records have not been integrated across years; the LEED is therefore a cross-sectional database, not a longitudinal one.
Legislative environment
The LEED incorporates:
- person level ITR data, job level PAYG payment summary data and Client Register data supplied by the ATO to the ABS under the Taxation Administration Act 1953 - which requires that such data is only used for the purpose of administering the Census and Statistics Act 1905; and
- employer level data that include the ABS's BLADE data and the ABS Business Register data supplied by the Registrar of Australian Business Register (ABR) to the ABS under A New Tax System (Australian Business Number) Act 1999 - which requires that such data is only used for the purpose of carrying out the functions of the ABS.
The data limitations or weaknesses outlined here are in the context of using the data for statistical purposes, and are not related to the ability of the data to support the ATO's or ABR's core operational requirements.
Legislative requirements to ensure privacy and secrecy of these data have been followed. In accordance with the Census and Statistics Act 1905, results have been confidentialised to ensure they are not likely to enable identification of a particular person or organisation. All personal information is handled in accordance with the Australian Privacy Principles contained in the Privacy Act 1988.
ABS data integration practices comply with the High Level Principles for Data Integration Involving Commonwealth Data for Statistical and Research Purposes.
Figure: The LEED comprises a person file, a jobs file and an employer file
Person file
Each person file contains data for all persons who either submitted an Individual Tax Return (ITR) or who were identifiable on a payment summary in the reference year. Each record includes de-identified demographic and geographic data, and aggregate income information.
Employed persons may be either employees (including Owner Managers of Incorporated Enterprises, or OMIEs), Owner Managers of Unincorporated Enterprises (OMUEs), or both. Employees are identified by the presence of aggregate employee income and at least one linked employee job.
Employees who have not submitted an ITR but who have provided their Tax File Number to their employer are imputed from Pay As You Go payment summary data.
OMUEs are identified by the presence of any of the own unincorporated business income types and a linked OMUE job.
Tax lodgers who are not employees or owner managers (such as persons with only investment incomes) are included on the person file to support statistical analysis that requires a more complete view of the tax lodger population.
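The identification rules above can be illustrated with a short sketch. The record layout and the function `employment_status` are hypothetical; in particular, treating "presence" of income as "not missing" (rather than "non-zero") is an assumption.

```python
def employment_status(person):
    """Return the set of employment statuses for one person record.

    `person` is a hypothetical dict with these assumed fields:
      employee_income     - aggregate employee income, or None if absent
      employee_jobs       - number of linked employee jobs
      own_business_income - list of own unincorporated business income
                            amounts (empty if none; amounts may be negative)
      omue_jobs           - number of linked OMUE jobs
    """
    statuses = set()
    # Employee: aggregate employee income AND at least one linked employee job.
    if person.get("employee_income") is not None and person.get("employee_jobs", 0) >= 1:
        statuses.add("employee")
    # OMUE: any own unincorporated business income type AND a linked OMUE job.
    if person.get("own_business_income") and person.get("omue_jobs", 0) >= 1:
        statuses.add("OMUE")
    # An empty set means a tax lodger who is neither (e.g. investment income only).
    return statuses
```

A person can satisfy both rules at once, in which case both statuses are returned.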
Jobs file
The jobs file is a complete list of the job relationships held at any time during the reference year. It is constructed primarily from Pay As You Go (PAYG) payment summary data. PAYG payment summaries describe the payments made to an individual by an employer within a financial year. Conceptually, payment summary data should include most employee/employer job relationships. OMUE jobs are derived from ITR data and are added to the jobs file. Some of these link to businesses in the Business Longitudinal Analysis Data Environment (BLADE).
The LEED jobs file does not capture voluntary jobs and unpaid contributing family worker jobs.
In some cases a synthetic employee job record has been created based on information in the person file. This has occurred when a person has recorded wage or salary information that cannot be identified in payment summary data. Sometimes, an employee job may not be able to be linked to an employing organisation due to reporting errors or missing information.
A person can hold several jobs during the year, either concurrently (as a multiple job-holder) or consecutively. For a person who is an employee of several employers, each relationship is listed as a separate job. Due to data limitations, only one self-employment job can be recorded for any OMUE, even if a person owns and manages more than one enterprise. An OMUE can hold other jobs as an employee.
Data on multiple job holders can also be found in the Labour Account Australia; however, there are a number of differences between the two sources.
PAYG payment summary start and end dates are used to:
- determine the start and end of a job relationship,
- identify concurrent job-holding, and
- determine the duration of the job.
These dates are known to have high measurement error rates, which are likely to inflate job and concurrent job counts. Some of this error may be due to misinterpretation and recording errors, but it is also expected that payroll system and report design have an influence.
Some treatments have been applied to address over counts of jobs or concurrent job-holding, including:
- In cases where a person has received several PAYG payment summaries from the same employer, and the time between the end of the first payment summary and the start of the next payment summary is 31 days or less, this is counted as a single job.
- In cases where a person has received several PAYG payment summaries from different employers, they are only considered to be concurrent if they overlap by more than 31 days.
- In cases where a person has more than 10 jobs, those within the same industry sub-division (2-digit ANZSIC industry) are counted as a single job in the 2011-12 to 2016-17 data. From the 2017-18 reference year, jobs are instead collapsed at a lower level of the industry classification: those within the same industry class (4-digit ANZSIC industry) are counted as a single job. This change has improved data quality and has led to a negligible increase in the total number of jobs reported compared with the old approach.
These treatments are aimed at minimising the impact of administrative errors while also reflecting a reasonably accurate view of differing job structures.
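The first two treatments (merging payment summaries from the same employer, and testing for concurrency across employers) might be sketched as below. This is an illustration under stated assumptions, not the production logic: the function names are hypothetical, each payment summary is reduced to a `(start, end)` pair of dates, and counting overlap days inclusively is an assumption.

```python
from datetime import date

THRESHOLD_DAYS = 31  # the 31-day threshold used by both treatments

def merge_same_employer(summaries):
    """Merge one person's payment summaries from a single employer.

    summaries: list of (start, end) date pairs. Summaries separated by a
    gap of 31 days or less (or overlapping) are treated as one job.
    """
    jobs = []
    for start, end in sorted(summaries):
        if jobs and (start - jobs[-1][1]).days <= THRESHOLD_DAYS:
            # Small gap or overlap: extend the previous job.
            jobs[-1] = (jobs[-1][0], max(jobs[-1][1], end))
        else:
            jobs.append((start, end))
    return jobs

def are_concurrent(job_a, job_b):
    """Jobs with different employers are concurrent only if they overlap
    by more than 31 days (overlap counted inclusively, an assumption)."""
    overlap_start = max(job_a[0], job_b[0])
    overlap_end = min(job_a[1], job_b[1])
    overlap_days = (overlap_end - overlap_start).days + 1
    return overlap_days > THRESHOLD_DAYS
```

For example, two summaries from one employer ending 31 December and starting 20 January are merged into a single job, while two jobs with different employers that overlap for only 17 days are not treated as concurrent.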
Employer file
In the LEED, an employer is identified when a job has been linked to any legal entity in the non-profiled population or any type of activity unit in the profiled population.
The employer file contains business units present in BLADE that could be linked to a job, as well as unincorporated entities. Some unincorporated entities are identified in personal income tax data and are not otherwise included in BLADE or cannot be identified in BLADE. Industry and several other employer variables are not available for these unincorporated entities.
LEED outputs
Key outputs
The LEED provides cross-sectional information relating to employees and owner managers of unincorporated enterprises.
Key data/series include:
- Employed persons and their jobs (employees and owner managers of unincorporated enterprises)
- Multiple job holders
- Income at job and person levels
- Regional spotlights of jobs and employed persons
Other data includes (but is not limited to):
For people with income:
- Income types: Total, Employee, Investment, Own unincorporated business, Superannuation
- Counts of earners
- Distributional information: mean, median, quartiles, percentile ratios, Gini coefficient, income share
- Geography - region of residence (at State and Territory, Local Government Area, Statistical Area 4, 3, and 2 levels)
- Demographic information: age, sex
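As an illustration of one of the distributional measures listed above, the Gini coefficient of a set of incomes can be computed with the standard rank-based formula. This sketch assumes non-negative incomes with a positive total; how the published statistics treat negative or zero incomes is not stated here, and the function name `gini` is illustrative only.

```python
def gini(incomes):
    """Gini coefficient of non-negative incomes (0 = perfect equality)."""
    values = sorted(incomes)
    n = len(values)
    if n == 0:
        raise ValueError("need at least one income")
    if values[0] < 0:
        raise ValueError("this sketch assumes non-negative incomes")
    total = sum(values)
    if total <= 0:
        raise ValueError("total income must be positive")
    # Rank-weighted sum over incomes sorted in ascending order (ranks 1..n).
    weighted = sum(rank * x for rank, x in enumerate(values, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n
```

With four earners, identical incomes give a coefficient of 0, and a single person holding all the income gives 0.75, the maximum possible for a group of four.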
In addition, for persons with jobs:
- Counts: Employed persons, Jobs, Employees, Owner-Managers of Unincorporated Enterprises, Multiple job holders
- Status in employment: Employee, Owner-manager of Unincorporated Enterprise
- Income: Employment, Employee, Own Unincorporated Business, Duration adjusted income per job (annualised)
- Detailed occupation and skill levels of persons
- Detailed industry of job
- Sector (public/private)
- Number of jobs held (employee jobs and owner manager of unincorporated enterprise jobs)
- Duration of jobs
- Concurrent and non-concurrent jobs
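The formula behind duration adjusted (annualised) income is not given here. One plausible reading, shown purely as an assumption, scales the income earned in a job by the ratio of days in the financial year to days the job was held. The function name and the inclusive day counting are hypothetical choices.

```python
from datetime import date

def annualised_income(income, job_start, job_end, fy_start, fy_end):
    """Scale job income to a full-year equivalent (assumed formula).

    All dates are inclusive; job dates are assumed to fall within the
    financial year given by fy_start and fy_end.
    """
    job_days = (job_end - job_start).days + 1
    year_days = (fy_end - fy_start).days + 1
    if job_days <= 0:
        raise ValueError("job_end must not be before job_start")
    return income * year_days / job_days
```

Under this assumption a job held for the whole financial year keeps its reported income, while a job held for half the year has its income roughly doubled.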
Information relating to employers:
- employment size
- detailed industry of business activity
- type of legal organisation (TOLO)
- institutional sector (SISCA)
Statistical releases
LEED data is disseminated through the publications listed below. Additional data is available through Customised Data Requests.
Jobs in Australia
Frequency: Annual, from 2011-12
Jobs in Australia (JIA) provides aggregate statistics from the Linked Employer-Employee Dataset. It provides information about filled jobs in Australia, the people who hold them, and their employers. JIA provides data across 2,288 Statistical Areas as well as Local Government Areas.
Personal Income in Australia
Frequency: Annual, from 2011-12
Formerly Estimates of Personal Income for Small Areas, Personal Income in Australia (PIiA) provides a comprehensive range of income indicators across small geographic areas. PIiA is now based on the LEED, ensuring better consistency with Jobs in Australia.
TableBuilder: Jobs in Australia
Frequency: Annual, from 2011-12
Jobs in Australia data is also released through TableBuilder. This enables users to build their own customised tables from the Linked Employer-Employee Dataset microdata, including for State and Commonwealth Electoral Divisions.