The ABS is researching how administrative data can be used to improve the Australian Census. One area of research has been whether it can help to improve the Census count of people in Australia on Census night.
What is administrative data?
Administrative data refers to information maintained by governments and other entities that is made available to the ABS for statistical purposes. It includes data used for registrations, transactions and record keeping, usually during the delivery of a service.
Why are we doing this?
The Census is the authoritative source on counts of people living in Australia. These are critical to informing the planning and delivery of Government and community services, business decisions and academic research. Often the Census is the only source of counts for smaller regions and particular groups within the population.
The 2016 Census Independent Assurance Panel (CIAP) recommended a couple of areas where Census counts might be improved through the assistance of administrative data. This research responds to these recommendations.
For more information on why and how we are conducting this research, please refer to our page on Administrative data research for the 2021 Census.
Your privacy is important to us. Any plans to use administrative data such as those researched in this article would be addressed as part of the 2021 Census Privacy Impact Assessment.
Improving the Census count with administrative data
Our research has shown two ways that administrative data could be used to improve our Census count of Australians on Census night.
The first is by improving the way we provide counts of people for houses that we thought were occupied on Census night but where no Census forms were received. The CIAP noted that the counts provided for these houses tended to over-represent older Australians.
The second is by improving our decision on whether a house was actually occupied or not, so that we avoid providing counts for houses that weren’t occupied. The CIAP noted that we over-estimated the number of occupied houses, and therefore counted extra people where we shouldn’t have.
Improving counts for houses where no form is received
The Census has always provided a count of persons with their age, sex and marital status for occupied houses where no Census form was received. This is done by ‘borrowing’ Census counts from a similar, nearby house where forms were received. We call these ‘donor’ houses. Doing this ensures the Census dataset is more representative of the true number of people in the country on Census night, both nationally and for local regions.
A problem with this approach is that the donor houses tend to over-represent older Australians, who are generally more likely to have responded to the Census. Administrative data, such as counts of people from de-identified Medicare and Centrelink data, shows that people in the houses where forms aren’t provided tend to have a younger age profile on average.
Our research is showing that this administrative data can help us choose donor houses from the Census that more truly reflect the ages of people in houses that don’t provide forms.
The graph below shows the improvement from using this de-identified administrative data to choose better donor houses. The blue line shows the true age distribution of people in houses where no forms were provided. This ‘truth’ is measured by a survey we run after the Census called the Post Enumeration Survey (PES). It has a shaded margin of error on either side because it is based only on a sample of houses.
The green line shows the age distribution of person counts generated from the usual approach to choosing donor houses. You can see that older people are over-represented. The red line shows the new age distribution if we had used administrative data in our models to guide the choice of donor houses. It is much closer to the true age distribution we measured from the PES.
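The donor approach described in this section can be illustrated with a small sketch. It is not the ABS method: the `House` fields, the `admin_mean_age` summary and the nearest-match rule are all assumptions made for illustration, and the real models are more sophisticated. The sketch only shows the general idea of preferring a nearby responding house whose administrative age profile resembles that of the house with no form.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class House:
    house_id: str
    area: str
    # Hypothetical summary from de-identified administrative data.
    admin_mean_age: float
    # Ages reported on the Census form; None when no form was received.
    census_ages: Optional[List[int]] = None


def choose_donor(recipient: House, houses: List[House],
                 use_admin: bool = True) -> Optional[House]:
    """Pick a responding house in the same area to act as donor."""
    responders = [h for h in houses
                  if h.census_ages is not None and h.area == recipient.area]
    if not responders:
        return None
    if not use_admin:
        # Stand-in for the usual approach: take the first nearby responder.
        return responders[0]
    # Assumed rule: closest administrative mean age to the recipient.
    return min(responders,
               key=lambda h: abs(h.admin_mean_age - recipient.admin_mean_age))


def impute_ages(recipient: House, houses: List[House],
                use_admin: bool = True) -> List[int]:
    """Borrow the donor's Census ages for the house with no form."""
    donor = choose_donor(recipient, houses, use_admin)
    return list(donor.census_ages) if donor else []


houses = [
    House("A", "X", 68.0, [70, 66]),
    House("B", "X", 29.0, [31, 27, 2]),
    House("C", "Y", 30.0, [30]),
]
no_form = House("R", "X", 31.0)
ages_with_admin = impute_ages(no_form, houses, use_admin=True)
ages_without_admin = impute_ages(no_form, houses, use_admin=False)
```

In this made-up example the administrative data steers the choice towards the younger household "B" rather than the older household "A", which mirrors the shift in age profile described above.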
Improving our decision on whether a house is unoccupied
To provide counts for occupied houses where forms weren’t provided, we first need to decide whether the house was actually occupied. This is becoming more difficult because our population is now more mobile, and more people are living in places which are harder to access, such as secure apartment blocks. When we continue to be unsure whether a house is occupied after repeated visits to collect forms, we err on the side of assuming that it is occupied.
The 2016 PES showed that, compared to the 2011 Census, there was a large increase in the number of empty houses that were incorrectly determined as occupied. In fact, of all houses where no Census forms were provided, almost every second house we judged to be occupied was actually empty. This meant we generated higher counts of people for the Census than there should have been.
This hasn’t affected our national population estimates because we’ve adjusted for this over-counting using information from the PES. And for most local areas, it makes very little difference to Census counts. But for a small number of areas, particularly those with more secure apartment blocks, the difference starts to become noticeable (the graph below provides more information).
Our research shows that administrative data can be used along with the observations we make when we visit houses in person to improve our final decision on whether houses were occupied.
We can model a ‘signs of life’ indicator for houses using administrative sources such as electricity connections, rentals data and updates to address information for tax payments. To do this, we only have access to address and de-identified activity data around Census time. The arrangements we have in place are in line with the Privacy Act 1988 and include very high levels of data security.
Our analysis shows that by using this indicator we can reliably reduce the number of houses we incorrectly judge to be occupied. It also shows larger reductions in the number of secure apartments we determined as occupied, where this issue was more pronounced.
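As a rough illustration of how a ‘signs of life’ indicator might sit alongside field observations, here is a minimal sketch. It is an assumption-laden example, not the ABS model: the `AdminSignals` fields, the simple count of signals and the `threshold` parameter are all hypothetical, and a real indicator would be modelled statistically rather than by a fixed rule.

```python
from dataclasses import dataclass


@dataclass
class AdminSignals:
    # Hypothetical yes/no activity flags around Census time.
    electricity_connected: bool
    rental_activity: bool
    address_updated: bool


def signs_of_life(signals: AdminSignals) -> int:
    """Count how many administrative sources show recent activity."""
    return sum([signals.electricity_connected,
                signals.rental_activity,
                signals.address_updated])


def judge_occupied(field_observation: str, signals: AdminSignals,
                   threshold: int = 1) -> bool:
    """Combine the field officer's observation with the indicator.

    field_observation is one of "occupied", "unoccupied" or "unsure".
    The threshold is an illustrative assumption.
    """
    if field_observation == "occupied":
        return True
    if field_observation == "unoccupied":
        return False
    # When still unsure after repeated visits, use the indicator instead
    # of always assuming the house is occupied.
    return signs_of_life(signals) >= threshold


quiet = AdminSignals(False, False, False)
active = AdminSignals(True, False, True)
unsure_quiet = judge_occupied("unsure", quiet)
unsure_active = judge_occupied("unsure", active)
```

The design choice here is that the indicator only changes the outcome for houses where field staff remain unsure, which is the case where the text above says the default was to assume occupancy.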
The graph below shows how the counts for small regions (areas with about 1,000 people on average) would have been reduced if we’d used this new approach. Most areas have very small reductions. Some areas, however, have larger reductions where this use of administrative data would have had a particularly noticeable benefit. Often these are areas with a large number of secure apartments.