2021 Census overcount and undercount methodology

Latest release
Reference period: 2021
Released: 28/06/2022
Next release: Unknown
First release

Purpose of the Census Post Enumeration Survey

The Census Post Enumeration Survey (PES) is run shortly after each Census to independently measure Census coverage. The PES results are used to determine how many people should have been counted in the Census, how many people were missed (undercount) and how many were counted more than once or in error (overcount).

The ABS uses PES estimates of net undercount, along with Census counts and administrative data, to derive the estimated resident population (ERP) for 30 June of the Census year. PES results are also used to help identify improvements for future Censuses.

Overview of Census coverage

The Census is the most comprehensive snapshot of the country and tells the story of how our society changes over time. It includes around 10 million households and over 25 million people. The Census aims to accurately count the number of people in Australia on Census night.

The Census includes:

  • Visitors to Australia (regardless of how long they have been or plan to be in the country)
  • People in the six states, the Northern Territory, the Australian Capital Territory, Jervis Bay Territory, and the Territories of Christmas Island and Cocos (Keeling) Islands, Antarctica and Norfolk Island.

The Census excludes:

  • Foreign diplomats and their families
  • Australian residents out of the country on Census night
  • Australian external territories (minor islands such as Heard Island and McDonald Island).

Due to its size and complexity, it is inevitable that some people are missed or counted more than once. 

Some reasons why people may have been missed in the Census (i.e. undercounted) include:

  • they were travelling
  • they thought they were counted elsewhere
  • there wasn’t enough space on the Census form in the household where they were staying and they did not get additional forms
  • the person completing the form thought that certain people should not be included
  • they were reluctant to be included
  • their dwelling was missed.

Some reasons why people may have been counted more than once or in error (i.e. overcounted) include:

  • they were included on the Census form at the dwelling where they usually live, even though they stayed and were counted elsewhere on Census night
  • they have multiple usual residences
  • they moved during the Census period and completed forms at both their previous and new address
  • they were overseas on Census night and so should not have been counted at all but were included on a Census form.

Independence between Census and PES

The ABS designs the Post Enumeration Survey to be an independent measure of Census coverage. To achieve this, statistical independence between the PES and the Census must be effectively managed. There are two aspects to statistical independence: population independence and operational independence.

Population independence: There should be no subgroups of the population where being missed in the Census indicates that a person or dwelling is also more likely to be missed by the PES.

Operational independence: Census operations do not influence the PES, and vice versa.

Steps taken to manage independence in 2021 included:

  • Independently canvassing the sample frame to minimise dwellings missed in both collections
  • Using separate staff in PES and Census
  • Ensuring PES interviewers were not previously employed as Census Field Officers in the same area
  • Maintaining the confidentiality of the PES sample during Census collection
  • Using separate and secure IT infrastructure for processing PES data
  • Ensuring PES did not start until Census had finished
  • Excluding Census forms received after PES collection started (i.e. late returns) from PES estimation.

Changes between 2016 and 2021

The 2021 Post Enumeration Survey used a new public facing name: the Post Census Review. The name was introduced to better reflect the survey and its purpose in plain language and was selected after qualitative testing via cognitive interviewing.

For the first time an Address Register based frame was used, consistent with the move of all ABS household surveys to this frame. As the Address Register was also used by the Census, independent quality assurance of the frame was undertaken through desktop address canvassing.

A telephone-first method of data collection was implemented to improve cost and field interviewer resource efficiency and optimise response across the sample.

How the data are collected

Scope and coverage of the Post Enumeration Survey

Sample design and selections

Desktop address canvassing

Sample size and response rates

Collection method

Impacts of COVID-19

Questionnaire

How the data are processed

A linking exercise was undertaken to determine whether each person in the Post Enumeration Survey was counted in the Census (and how many times), whether they were counted in error, or whether they were missed entirely. Linking PES persons to their Census form involved a range of automated and manual processes, focused on finding matches between approximately 117,000 PES person records and over 25 million Census person records.

Linking

In preparation for linking, PES and Census data were repaired and standardised to convert them into a format that enabled them to be directly compared.

Data were then linked over three distinct stages:

  • Address matching – matching dwellings in Census and PES through Address Register identifiers or address text strings.
  • Automated Data Linking (ADL) – A probabilistic linking method that used personal and address information to evaluate the chance that a PES record and a Census record were for the same person. The method generated large numbers of candidate links, which were filtered down to likely genuine matches only. Each person and dwelling link pair was given a rating based on the quality of that link. All PES dwellings and persons with lower link ratings were clerically reviewed. A small percentage of the high-quality links were also clerically reviewed for quality assurance.
  • Clerical Linking – A team of coders manually confirmed or rejected candidate links provided by ADL using responses to name, sex, date of birth, age, marital status, Indigenous status and country of birth on both the PES and Census forms. In addition, they searched for people on Census forms at alternative addresses provided by the PES respondent, or in surrounding areas.
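The rating step in Automated Data Linking can be illustrated with a minimal sketch. It assumes a Fellegi-Sunter style scoring scheme, which is a common approach to probabilistic linking but is not confirmed here as the method the ABS used. The field names, the m- and u-probabilities and the thresholds are all hypothetical.

```python
import math

# Hypothetical probabilities for each comparison field (assumptions, not ABS values):
# m = P(field agrees | both records are the same person)
# u = P(field agrees | records are different people)
FIELD_PROBS = {
    "name": (0.95, 0.01),
    "sex": (0.98, 0.50),
    "date_of_birth": (0.97, 0.003),
    "country_of_birth": (0.96, 0.30),
}


def link_score(pes_record, census_record):
    """Sum log2 agreement or disagreement weights over the comparison fields."""
    score = 0.0
    for field, (m, u) in FIELD_PROBS.items():
        if pes_record[field] == census_record[field]:
            score += math.log2(m / u)              # agreement adds evidence
        else:
            score += math.log2((1 - m) / (1 - u))  # disagreement removes it
    return score


def rate_link(score, accept_at=10.0, review_at=0.0):
    """Map a score to a link rating using hypothetical thresholds."""
    if score >= accept_at:
        return "accept"
    if score >= review_at:
        return "clerical review"
    return "reject"


pes = {"name": "ALEX SMITH", "sex": "F", "date_of_birth": "1980-02-01",
       "country_of_birth": "Australia"}
same = dict(pes)
partial = dict(pes, date_of_birth="1980-01-02")
different = {"name": "SAM JONES", "sex": "M", "date_of_birth": "1975-07-15",
             "country_of_birth": "New Zealand"}

for candidate in (same, partial, different):
    print(rate_link(link_score(pes, candidate)))
```

In this sketch a candidate pair that agrees on every field is accepted, a pair that disagrees only on date of birth falls into the clerical review band, and a pair that disagrees on every field is rejected.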

Dwelling match rates, by State/Territory of enumeration

                             NSW     Vic     Qld      SA      WA     Tas      NT     ACT       Aus
Matched                   11,853   9,058   8,520   4,684   5,879   5,102   4,970   3,739    53,805
Not matched                  712     443     627     392     328     426     465     110     3,503
Dwelling match rate (%)     94.3    95.3    93.2    92.3    94.7    92.3    91.4    97.1      93.9
Total dwellings           12,565   9,501   9,147   5,076   6,207   5,528   5,435   3,849    57,308

Person match rates, by State/Territory of enumeration

                             NSW     Vic     Qld      SA      WA     Tas      NT     ACT       Aus
Linked (one or more times) 25,337  19,610  17,442   9,342  11,917   9,747   9,085   8,013   110,493
Not linked                 1,244     921   1,245     396   1,010     568   1,257     248     6,889
Person link rate (%)        95.3    95.5    93.3    95.9    92.2    94.5    87.8    97.0      94.1
Total persons             26,581  20,531  18,687   9,738  12,927  10,315  10,342   8,261   117,382
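
The rates in the tables above appear to be the matched (or linked) count divided by the total, expressed as a percentage. A minimal sketch that reproduces the Australia-level rates from the published counts, with a hypothetical function name:

```python
def match_rate(matched, not_matched):
    """Percentage of records that were matched, rounded to one decimal place."""
    total = matched + not_matched
    return round(100 * matched / total, 1)


# Australia totals taken from the two tables
dwelling_rate = match_rate(53_805, 3_503)
person_rate = match_rate(110_493, 6_889)

print(dwelling_rate, person_rate)
```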

Weighting and estimation

The PES weighting and estimation process involved assigning a weight to each selected PES dwelling and then to each person in the PES. The weights attached to PES persons allowed the PES estimates to represent the whole population of interest: all usual residents in Australia on Census night, including people in non-private dwellings (e.g. hotels, hospitals and jails), which were not covered by the PES dwelling sample.

PES weighting was done in two stages:

  • Dwelling weighting – using a dual system estimation technique to adjust the PES selection weight to add up to the Census private dwelling count within categories based on geography and dwelling characteristics. A weight adjustment was also applied for PES dwellings that were missed in the Census, and a non-response adjustment was done so that the responding PES dwellings represented other dwellings from which no response was obtained by the Census.
  • Person weighting – using a Prediction Regression (PREG) estimator [1] to adjust the dwelling weights to ensure PES estimates of people counted in private dwellings in a set of benchmark categories matched the actual Census counts for these categories. PREG ensured the weight adjustment applied to a person did not depend on whether they responded in the Census, but only on characteristics of the person as reported in the PES. Person weights were then adjusted so that the PES estimates represent people in non-private dwellings.
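
The idea of adjusting weights so they add up to a known Census count within each category can be sketched with a simple ratio adjustment. This is a simplified illustration only: it does not implement dual system estimation, the missed-dwelling adjustment, the non-response adjustment or PREG, and the category labels, weights and counts are hypothetical.

```python
from collections import defaultdict


def calibrate_weights(dwellings, census_counts):
    """Scale selection weights so each category sums to its Census count."""
    weight_sums = defaultdict(float)
    for dwelling in dwellings:
        weight_sums[dwelling["category"]] += dwelling["selection_weight"]

    calibrated = []
    for dwelling in dwellings:
        category = dwelling["category"]
        factor = census_counts[category] / weight_sums[category]
        calibrated.append(dwelling["selection_weight"] * factor)
    return calibrated


# Hypothetical sample of three PES dwellings in two categories
sample = [
    {"category": "A", "selection_weight": 100.0},
    {"category": "A", "selection_weight": 150.0},
    {"category": "B", "selection_weight": 200.0},
]
# Hypothetical Census private dwelling counts for those categories
census_dwellings = {"A": 300, "B": 220}

weights = calibrate_weights(sample, census_dwellings)
print(weights)
```

After adjustment, the weights in category A sum to 300 and the weight in category B equals 220, matching the hypothetical Census counts.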

To calculate the PES population estimate, PREG took into account the number of people who should have been counted in a given category, as well as the number of links to people in that category (regardless of how they reported in PES) and the number of people actually counted in that category in the Census. The latter two parts were in place to ensure the 'should' estimate was adjusted appropriately to represent the full population across all characteristics of interest. This was required because not all benchmarks were used across both steps in person weighting.

For example, to calculate the estimate of Aboriginal and Torres Strait Islander people, PREG took the weighted sum of all persons who should have been counted as Aboriginal and Torres Strait Islander (968,308 persons), then subtracted the weighted sum of all PES persons linked to an Aboriginal and Torres Strait Islander person in the Contact sector (790,969 persons) and then added in the total number of Aboriginal and Torres Strait Islander people counted in the Contact sector in the Census (805,918 persons). This equalled a population estimate for Aboriginal and Torres Strait Islander people of 983,257 persons.
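
The arithmetic in this example can be checked with a short sketch. The function and parameter names are hypothetical; the figures are those quoted above.

```python
def preg_population_estimate(should_count, linked_to_category, census_count):
    """'Should' estimate, minus weighted links to the category, plus the Census count."""
    return should_count - linked_to_category + census_count


estimate = preg_population_estimate(
    should_count=968_308,        # weighted persons who should have been counted
    linked_to_category=790_969,  # weighted PES persons linked in the Contact sector
    census_count=805_918,        # Census count in the Contact sector
)
print(estimate)
```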

Net undercount for any category of person was then calculated as the difference between the final PES population estimate for that category and the actual Census count (including imputed persons in non-responding dwellings).

Footnote

1 Chipperfield J, Brown J and Bell P 2016. ‘Estimating the Count Error in the Australian Census’, Journal of Official Statistics, vol. 33, pp. 1–17.

Components of net undercount

Net undercount is the difference between the estimate of how many people should have been counted in the Census as determined through PES and the actual Census count (including imputed persons).

The net undercount comprises both overcount and undercount in different sectors of the Census. For the purposes of PES estimation, persons were categorised into either the contact sector or the non-contact sector.

Contact sector

The contact sector includes:

  • persons in dwellings for which a Census form was received before the start of PES enumeration (includes persons overcounted and persons missed from these forms)
  • persons from occupied dwellings that were missed by the Census
  • persons missed by Census because their dwellings were mistakenly deemed unoccupied on Census night.

Net undercount in the contact sector for a given category of person can be disaggregated into four sub-components: gross undercount, gross overcount, net difference in classification and Census category not stated.

Gross undercount

Gross overcount

Net difference in classification

Census category not stated

Calculating the net undercount for the contact sector

Non-contact sector

The non-contact sector includes:

  • Persons imputed into occupied non-responding Census dwellings
  • Late returns
  • Persons with insufficient information on their Census form.

While PES traditionally measures a net undercount of persons in the contact sector, the non-contact sector is typically characterised by a net overcount of persons. This is essentially a measure of over-imputation for non-responding dwellings deemed occupied in the Census.

Persons imputed into occupied non-responding Census dwellings

Late returns

Persons with insufficient information on their Census form

Calculating net undercount for the non-contact sector

Calculating total net undercount from the components

While total net undercount is simply the difference between the PES population estimate and the Census count for a given category, it can also be calculated from the components. Using the component method, net undercount equals the sum of gross undercount, net difference in classification, Census category not stated, and the net undercount for the non-contact sector, minus the gross overcount.
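
The component method can be written as a small function. This is a sketch with hypothetical names and made-up input values chosen only to show the signs of each term. A net overcount in the non-contact sector enters as a negative net undercount.

```python
def net_undercount_from_components(
    gross_undercount,
    gross_overcount,
    net_difference_in_classification,
    census_category_not_stated,
    non_contact_net_undercount,
):
    """Sum the contributing components, then subtract gross overcount."""
    return (
        gross_undercount
        + net_difference_in_classification
        + census_category_not_stated
        + non_contact_net_undercount
        - gross_overcount
    )


# Hypothetical values, not ABS estimates
total = net_undercount_from_components(
    gross_undercount=1_000,
    gross_overcount=300,
    net_difference_in_classification=50,
    census_category_not_stated=20,
    non_contact_net_undercount=-400,  # net overcount in the non-contact sector
)
print(total)
```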

Gross coverage error

Net undercount provides a measure of the net coverage error of the Census (the net result after combining the various types of undercount and overcount). However, it can mask the elements that help us understand the effectiveness of the Census in getting a response from all people.

Gross coverage error provides a view of how well the population was captured without the added layers of overcount and the adjustment from imputing people into occupied non-responding dwellings that otherwise offset the level of undercount. It is an additional measure describing the quality of the Census.

To calculate gross coverage error, we first need to calculate gross coverage. Gross coverage is an estimate of the number of unique real people that were included on Census forms. It is the total Census count minus the number of imputed persons and minus the estimated contact sector gross overcount.

Gross coverage error is the difference between the PES population estimate and the gross coverage estimate. It is divided by the PES population estimate to get the gross coverage error rate.
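
These two definitions can be sketched as follows. The function names and the input values are hypothetical and are not ABS estimates.

```python
def gross_coverage(census_count, imputed_persons, contact_gross_overcount):
    """Estimated number of unique real people included on Census forms."""
    return census_count - imputed_persons - contact_gross_overcount


def gross_coverage_error_rate(pes_estimate, census_count, imputed_persons,
                              contact_gross_overcount):
    """Gross coverage error as a proportion of the PES population estimate."""
    coverage = gross_coverage(census_count, imputed_persons, contact_gross_overcount)
    error = pes_estimate - coverage
    return error / pes_estimate


# Hypothetical values, not ABS estimates
rate = gross_coverage_error_rate(
    pes_estimate=10_000,
    census_count=9_800,
    imputed_persons=500,
    contact_gross_overcount=300,
)
print(f"{rate:.1%}")
```

With these made-up inputs, the net undercount is only 200 persons (2.0%), but the gross coverage error is 1,000 persons (10.0%), which shows how imputation and overcount can offset undercount in the net measure.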

Estimates for the Northern Territory (NT) provide a good example of how the two measures differ. The net undercount rate for the NT increased from 5.0% to 6.0% between 2016 and 2021, whereas the gross coverage error rate decreased from 15.3% to 14.3%. Changes to both measures were due to decreases in the amount of gross overcount and in the overcount associated with imputed persons. Specifically, gross coverage error decreased because the 2021 Census received unique responses from a higher proportion of the NT population, while net undercount increased because there was less overcount to offset the amount of undercount.

Undercount adjustment factor

While estimates of net undercount are important for an effective understanding of the completeness of Census counts, undercount adjustment factors provide an indication of how much the Census count for a given category would need to be adjusted to reflect the PES population estimate for that category.

The undercount adjustment factor is the ratio of the PES population estimate to the actual Census count. This factor can be applied to the Census count for any category to indicate how many people should have been counted in that category. For example, the Census count of 25,417,978 persons in Australia multiplied by the adjustment factor of 1.007 (or unrounded: 1.00747676) indicates that 25,608,022 persons should have been counted in the Census.
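
The Australia-level figures can be reproduced with a short sketch; the variable names are hypothetical. The unrounded factor is needed to recover the PES population estimate, since the three-decimal factor of 1.007 gives a noticeably different result.

```python
census_count = 25_417_978
pes_estimate = 25_608_022

# Undercount adjustment factor: PES population estimate over Census count
factor = pes_estimate / census_count
print(round(factor, 3), round(factor, 8))

# Applying the unrounded factor recovers the PES population estimate
should_have_been_counted = round(census_count * factor)

# Net undercount is the difference between the two
net_undercount = pes_estimate - census_count
print(should_have_been_counted, net_undercount)
```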

The undercount adjustment factor alone is not used, and should not be used, to derive an alternative measure of the Estimated Resident Population (ERP). Official population estimates include additional data and adjustments for usual residents of Australia, such as for those who were temporarily overseas on Census night. For more information see "Methodology used in rebased population estimates".

Understanding net undercount for Indigenous status and Country of birth

Asking a person’s Indigenous status or Country of birth may be considered personal and sensitive. As a result, some people choose not to answer these questions in the Census. If no answer is provided to these questions, the Census does not impute a value for the missing response. This is also true for people imputed into non-responding dwellings deemed occupied on Census night.

The not stated responses for Indigenous status and Country of birth in the Census contributed to the higher net undercount we see in these categories, compared with the national total. This is because people with a not stated response were not counted in the Census in any category of that characteristic but were still counted in the Australia total.

For example, there were 6,525 people who identified as Aboriginal or Torres Strait Islander in the contact sector in the 2021 PES but for whom Indigenous status was ‘not stated’ in the Census. These people all contributed to the Aboriginal and Torres Strait Islander undercount because their Indigenous status in the Census was not known.

The contribution of the not stated responses for Indigenous status and Country of birth also explains why net undercount for the individual categories within these two characteristics does not add up to the Australia total net undercount.

For Indigenous status, the total net undercount for Australia (190,044) can only be matched by adding the net undercount for Aboriginal or Torres Strait Islander people (170,752) plus net undercount for non-Indigenous people (1,252,787) and subtracting the total Census not stated count (1,233,495).
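
This reconciliation can be verified directly from the figures quoted above; the variable names are hypothetical.

```python
net_undercount_indigenous = 170_752        # Aboriginal or Torres Strait Islander people
net_undercount_non_indigenous = 1_252_787  # non-Indigenous people
census_not_stated = 1_233_495              # Census records with Indigenous status not stated

# Category undercounts overstate the total by the not stated count
australia_net_undercount = (
    net_undercount_indigenous
    + net_undercount_non_indigenous
    - census_not_stated
)
print(australia_net_undercount)
```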

Accuracy

Sampling error

Non-sampling error – Link error

Non-sampling error – Correlation bias

Non-sampling error – Non-contact sector

Glossary

Abbreviations
