Methodological News, March Quarter 2024

Features important work and developments in ABS methodologies

Released
20/03/2024

This issue contains four articles:

  • Adopting the Composite Regression Estimator for the Quarterly Business Indicator Survey
  • Improving Estimates of Aboriginal and Torres Strait Islander Life Expectancies
  • Shaping the Future of ABS Household Surveys - Building Better Resilience and Reducing Costs
  • A Systematic Evaluation of Large Language Models for Enhanced Enterprise Search

Adopting the Composite Regression Estimator for the Quarterly Business Indicator Survey

The latest release of the Quarterly Business Indicator Survey (QBIS) results has been compiled using an improved estimation method, known as the composite regression estimator (CRE). This method, which was described in a previous issue of Methodological News, aligns with the ABS’ broader priorities of reducing provider burden and making greater use of administrative data sources.

The composite regression estimation technique reduces sampling error through two primary objectives: (1) using administrative tax data to reduce the sampling error of level estimates; and (2) exploiting the overlap in consecutive samples to reduce the sampling error of movement estimates.

  • Objective 1

The Quarterly Business Indicator Survey (QBIS) aims to estimate private sector sales, wages, profits, and inventories. To achieve this, the ABS surveys a sample of businesses and assigns a weight to each responding business to produce estimates of the total population values. The key data items that QBIS publishes are strongly correlated with Business Activity Statement (BAS) Turnover and Wages data from the ATO, which we access for all businesses in scope of QBIS. By benchmarking the sampled data against the administrative records, we gain insights into the sample’s characteristics. An optimisation algorithm is used to derive weights such that the weighted sample estimates of the administrative data are consistent with the known population totals of the administrative data. This adjusts for any lack of representativeness in the sample and enables us to produce more precise level estimates.
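The weighting idea described above can be sketched as follows. This is a minimal illustration only, assuming linear (chi-square distance) calibration on two auxiliary variables; the article does not specify which optimisation algorithm the ABS uses, and the function name, data values and population totals are all hypothetical.

```python
def calibrate(design_weights, turnover, wages, total_turnover, total_wages):
    """Linear calibration on two auxiliary variables (illustrative only).

    Returns weights w_i = d_i * (1 + lam1 * turnover_i + lam2 * wages_i) such
    that the weighted sample totals of turnover and wages equal the known
    population totals.
    """
    d, x1, x2 = design_weights, turnover, wages
    # Gap between known population totals and design-weighted sample totals.
    g1 = total_turnover - sum(di * a for di, a in zip(d, x1))
    g2 = total_wages - sum(di * b for di, b in zip(d, x2))
    # Entries of the 2x2 matrix M = sum_i d_i * x_i * x_i'.
    m11 = sum(di * a * a for di, a in zip(d, x1))
    m12 = sum(di * a * b for di, a, b in zip(d, x1, x2))
    m22 = sum(di * b * b for di, b in zip(d, x2))
    # Solve M * lam = g by Cramer's rule.
    det = m11 * m22 - m12 * m12
    lam1 = (g1 * m22 - g2 * m12) / det
    lam2 = (g2 * m11 - g1 * m12) / det
    return [di * (1.0 + lam1 * a + lam2 * b) for di, a, b in zip(d, x1, x2)]


# Hypothetical sample of five businesses, each with a design weight of 20.
design_weights = [20.0] * 5
bas_turnover = [120.0, 300.0, 80.0, 500.0, 210.0]
bas_wages = [40.0, 90.0, 35.0, 150.0, 60.0]
reported_sales = [115.0, 310.0, 70.0, 480.0, 220.0]

# Hypothetical known population totals from the administrative data.
weights = calibrate(design_weights, bas_turnover, bas_wages, 25000.0, 7800.0)

calibrated_turnover = sum(w * x for w, x in zip(weights, bas_turnover))
calibrated_wages = sum(w * x for w, x in zip(weights, bas_wages))
sales_estimate = sum(w * y for w, y in zip(weights, reported_sales))
```

In this sketch the calibrated weights reproduce the administrative totals, and the same weights are then applied to the survey variable (here, sales) to produce the level estimate.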

  • Objective 2 

The ABS manages the trade-off between provider burden and statistical precision through a rotating selection process. Each iteration the ABS includes new businesses in the sample while retaining some existing ones. The composite regression estimator leverages the connection between the previous and current sample to enhance stability and reduce variability in movement estimates. 
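To see why sample overlap helps movement estimates, consider the following simplified sketch. It is not the composite regression estimator itself: it assumes simple random samples of equal size, ignores finite population corrections, and uses a hypothetical function name and made-up variances. Under those assumptions, the covariance between two consecutive estimates scales with the proportion of common businesses and the period-to-period correlation of their responses.

```python
import math


def movement_variance(var_prev, var_curr, unit_corr, overlap_fraction):
    """Variance of (current estimate - previous estimate).

    Assumes equal-sized simple random samples, so the covariance between the
    two estimates is overlap_fraction * unit_corr * sqrt(var_prev * var_curr).
    """
    cov = overlap_fraction * unit_corr * math.sqrt(var_prev * var_curr)
    return var_prev + var_curr - 2.0 * cov


# Hypothetical values: each level estimate has variance 4.0 and responses
# from the same business are highly correlated between periods.
no_overlap = movement_variance(4.0, 4.0, unit_corr=0.9, overlap_fraction=0.0)
high_overlap = movement_variance(4.0, 4.0, unit_corr=0.9, overlap_fraction=0.8)
```

With no common businesses the movement variance is simply the sum of the two level variances; with substantial overlap it is much smaller, which is the connection between samples that the estimator draws on.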

By applying this estimation technique to the QBIS we can achieve level and movement estimates for the key data items that have lower sampling error.

Reducing provider burden:

When designing a survey there is a balance between data quality and sample size. Larger samples can yield higher quality results but come with higher costs and increased burden. Currently, the Business Indicator publication involves surveying around 16,250 businesses per quarter. By introducing this new estimator, we could have achieved estimates with lower sampling error using the same sample size. Instead, we chose to trade the improvement in sampling error for a smaller sample size, in line with the ABS’ strategic priorities to reduce provider burden. This is achieved via a sample redesign, which will be implemented next quarter.

As a result, in the March 2024 Business Indicator release, comparable quality estimates will be produced with a sample of 12,750 businesses. This is a reduction of around 3,500 businesses, which represents around 20% of the QBIS sample size.

Implementation of the composite regression estimator:

Additionally, we have taken the opportunity to make a related change to the imputation methodology for large businesses that do not respond to the survey. In QBIS we account for non-response by applying explicit imputes for each business that does not respond. In the past, for businesses with no historical reporting information available, we have used mean imputation. In line with the implementation of the CRE, we have moved to using businesses’ BAS wages and turnover as auxiliary data to inform the imputes. This results in more accurate imputes for the large businesses, and hence contributes to improving the quality of the resulting estimates. The ABS will investigate the effectiveness of auxiliary imputation for smaller businesses and look to introduce this for future publications.
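The difference between mean imputation and imputation informed by auxiliary data can be illustrated with a small sketch. The article does not state which imputation model is used; ratio imputation is assumed here purely as one common way of using an auxiliary variable, and all function names and values are hypothetical.

```python
def mean_impute(reported_values):
    """Impute the average reported value of responding businesses."""
    return sum(reported_values) / len(reported_values)


def ratio_impute(reported_values, aux_respondents, aux_nonrespondent):
    """Scale the non-respondent's auxiliary value by the respondents'
    ratio of reported value to auxiliary value."""
    ratio = sum(reported_values) / sum(aux_respondents)
    return ratio * aux_nonrespondent


# Hypothetical respondents: reported sales and their BAS turnover.
reported_sales = [100.0, 220.0, 310.0]
respondent_bas_turnover = [110.0, 200.0, 320.0]

# Hypothetical large non-respondent with BAS turnover well above the others.
nonrespondent_bas_turnover = 900.0

mean_value = mean_impute(reported_sales)
ratio_value = ratio_impute(
    reported_sales, respondent_bas_turnover, nonrespondent_bas_turnover
)
```

In this made-up example the mean impute ignores the size of the non-responding business, whereas the ratio impute follows its BAS turnover, which is why auxiliary data matters most for large businesses.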

The calibration process produces a different set of survey weights to the previous estimation technique. Consequently, these techniques yield slightly different estimates based on the same reported data. Investigations have shown that there are no systematic differences in the estimates using the two methods, so revisions to historical publications are not necessary. 

In summary, the composite regression estimator has allowed the ABS to use administrative tax data to continue to produce high quality quarterly Business Indicator estimates while alleviating the burden on businesses.

For further information, please contact Eleanor Day and Jack Steel.

Improving Estimates of Aboriginal and Torres Strait Islander Life Expectancies

The ABS continues to focus on improving the complex methodology used to estimate Aboriginal and Torres Strait Islander life expectancies (see Aboriginal and Torres Strait Islander life expectancy methodology, 2020–2022) in response to an independent review in 2020 (see Independent review of the ABS' Aboriginal and Torres Strait Islander life expectancy estimates).

Key inputs to calculating life expectancy are accurate data on the number of deaths in a specified period and the estimated size of the population. Accuracy is challenging for Aboriginal and Torres Strait Islander deaths, which are under-represented in data provided to the ABS by each state and territory Registry of Births, Deaths, and Marriages. To improve representation, two adjustments are applied to deaths based on data linkages between the Census and death registrations, and between the Census and the Post Enumeration Survey (PES). For the calculation of 2020–2022 Aboriginal and Torres Strait Islander life expectancies, the ABS enhanced these two adjustments to reduce bias related to:

  • death records that were unable to be linked to Census data
  • age groups with a small sample in the Post Enumeration Survey.

The first adjustment is based on person record links between Census data and death registrations in the year following Census night. These links provide information on the propensity that a death reported with one of the three categories of Indigenous status (Aboriginal and/or Torres Strait Islander, non-Indigenous, not stated) is recorded with one of these status categories in the Census. These propensities are used to adjust for under-representation of Aboriginal and/or Torres Strait Islander status in death records.
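One way to picture how such propensities translate into an adjustment is sketched below. This is an illustrative reading of the paragraph above, not the ABS's published formula: the counts are invented, the function names are hypothetical, and the adjustment factor is assumed to be the expected number of deaths with Aboriginal and/or Torres Strait Islander Census status divided by the number registered with that status.

```python
STATUSES = ("indigenous", "non_indigenous", "not_stated")


def propensities(linked_counts):
    """Row-normalise linked counts to give P(Census status | death status)."""
    result = {}
    for death_status, row in linked_counts.items():
        total = sum(row.values())
        result[death_status] = {census: n / total for census, n in row.items()}
    return result


def adjustment_factor(linked_counts, registered_deaths):
    """Expected deaths with Indigenous Census status per registered
    Indigenous death (assumed form of the adjustment)."""
    props = propensities(linked_counts)
    expected = sum(
        registered_deaths[status] * props[status]["indigenous"]
        for status in STATUSES
    )
    return expected / registered_deaths["indigenous"]


# Hypothetical linked records: outer key is status on the death record,
# inner key is status recorded in the Census.
linked_counts = {
    "indigenous": {"indigenous": 90, "non_indigenous": 8, "not_stated": 2},
    "non_indigenous": {"indigenous": 20, "non_indigenous": 970, "not_stated": 10},
    "not_stated": {"indigenous": 5, "non_indigenous": 40, "not_stated": 5},
}
# Hypothetical registered deaths by status on the death record.
registered_deaths = {"indigenous": 100, "non_indigenous": 1000, "not_stated": 50}

factor = adjustment_factor(linked_counts, registered_deaths)
```

A factor above 1 in this sketch indicates that registered deaths under-represent Aboriginal and/or Torres Strait Islander status relative to the Census.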

Investigation of the linkage of 2022 registered deaths to 2021 Census data showed that while overall linkage rates had declined from the previous cycle (2015–2017), the relative difference in linkage rates between Aboriginal and/or Torres Strait Islander status records and non-Indigenous status records had increased. Investigations also showed that adjustment factors are sensitive to changes in the relative size of these linkage rates, and so the methodology was enhanced to use both linked and unlinked death records in calculating the revised adjustment factors. This reduced bias associated with the gap in linkage rates.

The second adjustment uses propensities of identification based on linked person records between Census and PES. We know that propensity to identify differs with age and so in this adjustment propensities were calculated for three age groups: 0–14 years, 15–59 years and 60 years and older. However, where propensities and adjustment factors are calculated using a small PES sample, they have a higher degree of uncertainty. For the 2020–2022 life expectancy estimates, this uncertainty was reduced by combining age groups where the PES sample was too small. In effect, this meant all age groups in the Census statuses of non-Indigenous and not stated were combined, while all three age groups remained distinct in the Census status Aboriginal and/or Torres Strait Islander category. This is referred to as partial age-adjustment.
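The pooling logic can be sketched as follows. The minimum sample threshold, function name and counts are assumptions made for illustration; the article does not state what rule was used to decide that a PES sample was too small.

```python
def age_propensities(counts, min_sample=30):
    """Return a propensity for each age group.

    counts maps age group -> (number identifying, PES sample size).
    If any age group has fewer than min_sample records, all age groups are
    pooled and share one propensity; otherwise each keeps its own.
    """
    if all(n >= min_sample for _, n in counts.values()):
        return {age: k / n for age, (k, n) in counts.items()}
    pooled_k = sum(k for k, _ in counts.values())
    pooled_n = sum(n for _, n in counts.values())
    return {age: pooled_k / pooled_n for age in counts}


# Hypothetical Census status with ample PES sample in every age group.
well_sampled = {"0-14": (180, 200), "15-59": (400, 500), "60+": (90, 100)}
# Hypothetical Census status with a small PES sample in some age groups.
sparsely_sampled = {"0-14": (2, 10), "15-59": (6, 60), "60+": (1, 20)}

distinct = age_propensities(well_sampled)
combined = age_propensities(sparsely_sampled)
```

Here the well-sampled status keeps three distinct age propensities, while the sparsely sampled status falls back to a single pooled value, mirroring the mix of distinct and combined age groups described above.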

The revised method of partial age-adjustment was consistently used to derive adjustment factors for Aboriginal and Torres Strait Islander deaths at the national level, for States and Territories, for Remoteness Areas and for the SEIFA Index of Relative Socio-Economic Disadvantage. See Updated method for 2020–2022 Aboriginal and Torres Strait Islander life expectancy estimates for further details.

For more information, please contact Michele Haynes or Frances Algert.

Shaping the Future of ABS Household Surveys - Building Better Resilience and Reducing Costs

National statistical institutes around the world face declining response rates and increasing data collection costs when conducting household surveys. These challenges are compounded by disruptions to survey operations resulting from the aftermath of the pandemic, natural disasters, and workforce issues. During both times of crisis and normal times, governments, businesses, and communities need reliable and timely statistics to measure Australia’s economy and the wellbeing of its people. Prudent use of auxiliary data can help reduce production costs, improve the resilience of production processes, and expedite the delivery of statistical outputs for household surveys. 

Auxiliary data may provide benefits at almost every step of the household statistical production cycle, such as:

  • more efficient survey sample designs
  • reducing field enumeration costs
  • better tailoring of response modes (web form, telephone, or personal interview) to respondents
  • ensuring sample representativeness and coverage
  • improved targeting of non-response follow-up 
  • improved editing and imputation for missing data 
  • improved estimation accuracy to account for non-response.

Importantly, better information will help the ABS to design more efficient household surveys, thereby reducing provider load and survey costs with no reduction in statistical accuracy.

The ABS has recently established the SHAPE (Statistics for Households using Auxiliary data for Production Efficiency) project to develop an internal roadmap for utilising auxiliary data more effectively. The roadmap will identify which aspects of the household survey production cycle and which data sources are most likely to deliver the biggest cost reductions and efficiencies for the lowest initial development cost. This will form the basis for prioritising and coordinating various initiatives over the next several years. In addition, milestones for delivering these initiatives will be set out in the roadmap.

Any use of auxiliary or administrative data about people and households will require prior approval and must comply with relevant privacy policies, namely the ABS Privacy Policy for Statistical Information and ABS Privacy Policy for Managing and Operating our Business.

For more information, please contact Daniel Elazar.

A Systematic Evaluation of Large Language Models for Enhanced Enterprise Search

Large language models (LLMs) use artificial intelligence algorithms to complete natural language processing tasks. Recently, popular LLMs (such as ChatGPT) have shown impressive performance on a range of tasks, such as text summarisation and question-answering. Yet LLMs are prone to hallucination and can produce irrelevant text.

The Retrieval Augmented Generation (RAG) architecture, which combines a traditional search tool (retriever) with an LLM (generator), has recently become a popular approach to addressing these shortfalls. In a typical RAG question-answering pipeline, relevant documents are first retrieved in response to a query; these documents then provide context for the LLM to synthesise an answer to that query.
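The retrieve-then-generate flow can be sketched in a few lines. This is a toy illustration and not the StatsChat implementation: the retriever ranks documents by simple word overlap rather than semantic similarity, the generator is a stand-in function rather than a real LLM call, and every name and document string below is hypothetical.

```python
def tokenize(text):
    """Lower-case a string and split it into a set of words."""
    return set(text.lower().split())


def retrieve(query, documents, top_k=2):
    """Rank documents by the number of words shared with the query."""
    query_tokens = tokenize(query)
    ranked = sorted(
        documents,
        key=lambda doc: len(query_tokens & tokenize(doc)),
        reverse=True,
    )
    return ranked[:top_k]


def build_prompt(query, context_docs):
    """Combine the retrieved passages and the query into one prompt."""
    context = "\n".join(f"- {doc}" for doc in context_docs)
    return f"Answer using only this context:\n{context}\nQuestion: {query}"


def answer(query, documents, generate):
    """Retrieve context for the query, then hand the prompt to a generator."""
    return generate(build_prompt(query, retrieve(query, documents)))


def first_context_line(prompt):
    """Stand-in for an LLM call: returns the top-ranked context passage."""
    return prompt.splitlines()[1][2:]


documents = [
    "QBIS estimates private sector sales and wages",
    "Life expectancy estimates use death registrations",
    "Household surveys face declining response rates",
]
query = "What does QBIS estimate about sales and wages"

top_documents = retrieve(query, documents)
response = answer(query, documents, first_context_line)
```

Because the generator only sees the retrieved passages, the quality of the final answer depends on both stages, which is why retrieval and generation are evaluated separately.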

The ABS has recently been exploring the suitability of RAG architectures to facilitate user querying of the ABS website. This work also includes investigating existing evaluation frameworks and metrics capable of measuring aspects of both retrieval and generation performance, as well as the emerging discipline of prompt engineering, an important concern in fine-tuning these models.

More specifically, this evaluation approach is examined using the StatsChat LLM application. StatsChat is an experimental search pipeline and front-end web application developed by the UK Office for National Statistics (ONS). The application, written in Python, is built entirely from open-source components and is available on GitHub. Users submit natural language queries via the graphical user interface of the application, and it interprets the semantic meaning of queries and documents and returns answers from shortlisted documents. The use of specified website content for information retrieval and synthesis minimises the risk of the LLM providing incorrect and biased answers. 

The ABS’s systematic evaluation methodology includes the following steps:

  • developing a web scraping process to extract the content of ABS webpages
  • developing a module to generate an evaluation dataset in the form of a set of query-answer pairs from the scraped information
  • generating predicted answers for queries by running a RAG pipeline over ABS webpages
  • automating the evaluation of the predicted answers using various metrics via the LLM-as-judge approach
  • testing the LLM-as-judge evaluator through involving human assessment to ensure that the metrics align with human judgement.
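The last three steps above can be sketched as a small evaluation loop. Everything here is a simplified stand-in: the judge is a word-overlap rule rather than an LLM, the predicted answers come from a lookup table rather than a RAG pipeline, and the query-answer pairs, human verdicts and function names are hypothetical.

```python
def token_overlap_judge(predicted, reference, threshold=0.5):
    """Stand-in for an LLM judge: pass if enough reference words appear
    in the predicted answer."""
    reference_tokens = set(reference.lower().split())
    predicted_tokens = set(predicted.lower().split())
    share = len(reference_tokens & predicted_tokens) / len(reference_tokens)
    return share >= threshold


def evaluate(pairs, predict, judge):
    """Judge the predicted answer for every (query, reference answer) pair."""
    return [judge(predict(query), reference) for query, reference in pairs]


def agreement_rate(judge_verdicts, human_verdicts):
    """Share of items where the automated judge matches the human verdict."""
    matches = sum(j == h for j, h in zip(judge_verdicts, human_verdicts))
    return matches / len(human_verdicts)


# Hypothetical evaluation dataset of query-answer pairs.
pairs = [
    ("How did the sample change?", "sample size fell"),
    ("How many articles are in the issue?", "four articles"),
]
# Hypothetical pipeline output, keyed by query.
predictions = {
    "How did the sample change?": "the sample size fell",
    "How many articles are in the issue?": "unknown",
}
# Hypothetical human assessments of the same predicted answers.
human_verdicts = [True, True]

judge_verdicts = evaluate(pairs, predictions.get, token_overlap_judge)
agreement = agreement_rate(judge_verdicts, human_verdicts)
```

A low agreement rate in a check like this would suggest that the automated metric needs revising before it is relied on in place of human assessment.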

This evaluation approach is still underway.  

For further information, please contact Hamid Khataee or Ilana Lichtenstein.

Contact us

Please email methodology@abs.gov.au to:

  • contact authors for further information
  • provide comments or feedback
  • be added to or removed from our electronic mailing list.

Alternatively, you can post to:

Methodological News Editor
Methodology Division
Australian Bureau of Statistics
Locked Bag No. 10
Belconnen ACT 2617

The ABS Privacy Policy outlines how the ABS will handle any personal information

Previous releases

Releases from June 2021 onwards can be accessed under research.

Releases up to March 2021 can be accessed under past releases.
