Realising the potential of data in government

Dr David Gruen AO
Australian Statistician
Wednesday 15 June 2022
Institute of Public Administration Australia (IPAA) ACT*

Introduction

I begin by acknowledging the Ngunnawal people, the Traditional Custodians of this land. I pay my respects to their Elders past, present and emerging, and extend that respect to Aboriginal and Torres Strait Islanders present here today.

Thank you, IPAA, for the opportunity to speak about realising the potential of data in Government. I last spoke about data at an IPAA event in March 2020, just before social distancing became a thing. It feels like a long time ago!

In that speech, I talked about the promise of data in Government, and I focused on the enormous potential of data to help Government and the Australian Public Service meet big public-policy challenges. I suggested that data – especially integrated data – would have an increasing role underpinning evidence-based public-policy responses. I noted the importance of trust in the public sector’s use of data, and I expressed support for the Data Availability and Transparency Bill that was soon to be brought before the Parliament.

Well, a lot has happened since March 2020. And my proposition that data would play an increasing role in policy development has turned out to be a better forecast than many of the ones I made as a macroeconomic forecaster in years gone by!

Today I will describe the transition from the potential of data in government to realising that potential.

In the time available, I will limit myself to three broad topics. First, I will talk about some of the new data sources that have become available and how they are being used to generate new statistics. Second, I will talk about the growth in integrated data assets across many dimensions. These dimensions include improved timeliness, the number of integrated data assets, the increasing variety of the subject matter they cover, and the strong growth in usage, both by government agencies and researchers. Third, I will reflect on the opportunities presented by the passage of the Data Availability and Transparency Act.

When I addressed IPAA in March 2020, the COVID-19 pandemic was in its infancy, but it was already looking like data, both health data and data about the economy, would play a pivotal role in helping guide the Government, policymakers and the community through the coming challenges.

Elsewhere, I have spoken about the early ABS responses to the pandemic, introducing new small rapid business and household surveys and publishing a range of preliminary and provisional data.[1] I will not go over this ground again here.

New Data Sources

I do want to talk about the new data sources we are accessing to provide statistical information that was previously unavailable. Data from these new sources are all by-products of the digital revolution – a consequence of so many aspects of our lives now being intermediated through digital platforms. And another general point: all these new datasets are examples of ‘big data’ – large, and usually more complex, datasets from new sources.

Let me begin with Single Touch Payroll (STP). The Australian Taxation Office (ATO) receives payroll information from employers with STP-enabled payroll software each time the employer runs their payroll. Given the extensive coverage of the STP system, these data cover more than 10 million employees. That is not quite every employee in the country, but it is not far from it. The pandemic made access to this rich vein of near real-time information an urgent priority. The ATO expedited access, and the ABS began receiving these data in early April 2020.[2]

Since then, the ATO has provided job and wage data from the STP system to the ABS each week, and we use these data to produce a new publication: Weekly Payroll Jobs and Wages.

In many ways, access to Single Touch Payroll data taught us new ways of doing things. Given the scale and complexity of these data, it made sense to ingest and analyse them using cloud computing services rather than using our existing computer systems. And that is the new model for accessing public and private sector big data assets to generate new statistical insights. Let me describe a few of them.

In October 2021, we began releasing a new monthly indicator of business turnover, based on Business Activity Statements (BASs) submitted to the ATO. Again, to give you a sense of scale, there are about 130,000 BAS remitters from whom we gather information for this new monthly indicator. By comparison, our equivalent survey, Quarterly Business Indicators, is based on a sample of 16,000 businesses.

In February this year, we released a second monthly indicator which provides a measure of household consumption. This indicator is based on about 800 million bank transactions by households each month (with these data provided by Australia’s major banks in aggregated, de-identified form). Household consumption accounts for about half of GDP, so there is considerable value in having an accurate measure of it. The existing monthly measure of household consumption comes from the Retail Trade Survey, based on a sample of around 3,400 businesses.

The Retail Trade Survey covers about 30% of household consumption, whereas the new measure, based on banks’ transactions data, covers 68% of household consumption, so that is a substantial step up. Many items of household consumption are captured in the new transactions-based measure but are missing from the Retail Trade Survey. To give a few examples: purchases of petrol, car servicing and maintenance, train and bus tickets, Uber rides, airline tickets, hotels, theme parks, haircuts, dentists and allied health costs. None of these are in the Retail Trade Survey and all are captured in our new indicator.

We are also developing a partial monthly indicator of CPI inflation, which is particularly relevant given the current inflationary environment. This has been aided by our access to digital data sources, including scanner data from supermarkets and web-scraped prices data. We plan to release an information paper within the next couple of months and begin publication of the partial monthly indicator later in the year.

We will also be releasing a new monthly indicator of individual earnings in 2023, using STP data.

A significant benefit of using existing data, collected for other purposes, to generate these new indicators is that there is no need to put a new survey in the field, which would place an unavoidable burden on respondents.[3]

The digital revolution also offers new ways to reduce the burden on our existing survey respondents.

We are working with businesses, accountants, bookkeepers, and accounting software companies to co-design a new reporting application that links with the accounting software that businesses currently use. In the future, a business will have the option to extract and pre-fill their financial data directly into an ABS web application from their accounting software package. This removes the need for businesses to manually collate the information and key it into our surveys. Once this is up and running, we estimate that small and medium businesses will spend 16,000 fewer hours per annum, or 70% less time, completing ABS surveys. As part of the new initiative, we will also provide tailored reports back to businesses to help them understand their performance relative to similar businesses.

Integrated data assets

When I spoke at the IPAA event in March 2020, I described the growing number of integrated data assets being used across the public sector to enable research, policy development and analysis. I focused on BLADE (Business Longitudinal Analysis Data Environment) and MADIP (Multi-Agency Data Integration Project), the business and person integrated data assets, which have been developed and enhanced over many years by the collaborative efforts of many people across many Commonwealth agencies and departments.

In the little over two years since that talk, there has been much progress moving from the promise of integrated data assets to realising the benefits they can provide in the service of better public policy. Let me describe some of this progress.

Our earlier standard practice was to update the underlying data in both BLADE and MADIP once a year. But as these data assets have matured, processes have been streamlined and key enabling infrastructure (the ABS DataLab) has been moved to the cloud. This enhances security and makes possible more sophisticated data analysis. It also means both BLADE and MADIP can now be updated much more frequently.

There have been many additions to these integrated data assets. Let me describe a few of them.

We have introduced a new Business Locations dataset to BLADE, updated quarterly, enabling detailed geospatial economic analysis. This addition to BLADE, with quarterly updates to BAS data, recently allowed the ABS to provide the National Recovery and Resilience Agency (NRRA) with detailed geographic business counts and economic information for the flood-devastated areas of New South Wales and Queensland.

Single Touch Payroll data are being integrated into the core of the BLADE data asset and will be available to approved researchers within weeks.

The core data in MADIP and BLADE have now been linked together and, as a result, researchers have been able to use these two assets to undertake research and analysis of linked employer and employee longitudinal information to determine the impact of COVID-19 on businesses and people, and to examine economic recovery and employment and unemployment patterns.

Data from the Australian Immunisation Register are being linked to MADIP each week. Provisional Death Registrations data are being linked and updated monthly. These data are being used by the Department of Health to generate insights for the Australian COVID-19 Vaccine and Treatment Strategy, and by state health departments and primary health networks. Here are a few specific examples:

The National COVID-19 Vaccine Taskforce identified groups across Australia with low-vaccine uptake who spoke languages other than English. In response to this information, culturally appropriate communication campaigns, digital translations, and community outreach activities were implemented to lift vaccine rates for these groups.
Jurisdictions have deployed multilingual GPs and healthcare workers to better support multicultural communities.
Provisional Deaths Registrations data are being used alongside a range of other socio-demographic information to understand risks to vulnerable groups within the community, as well as to support winter-preparedness strategies.
These data have also been used by peak technical advisory groups like the Australian Technical Advisory Group on Immunisation and the Australian Health Protection Principal Committee to inform their decision-making.

To support Treasury analysis, the Labour Market Tracker Project integrated job-related data, including STP, JobKeeper and JobSeeker data, into both BLADE and MADIP. Datasets are updated fortnightly, monthly and quarterly as they become available, to enable up-to-date monitoring of the labour market and the economy.

A further key data integration project is the National Disability Data Asset (NDDA). The NDDA is under development and will include a collection of linked, de-identified datasets from across multiple Commonwealth, State and Territory agencies to better understand the lives of people with disability and their pathways through services. The NDDA will be underpinned by a new national data integration infrastructure, known as the Australian National Data Integration Infrastructure (ANDII). The ABS, Australian Institute of Health and Welfare, and the Department of Social Services are partnering to deliver this initiative. The NDDA and ANDII will be co-designed and co‑governed with state and territory partners, as well as the disability sector. ANDII is being built in such a way that it can be re-used in public policy domains beyond disability.

And my final example gives a sense of the breadth of subject-matter areas in which data integration projects are being developed. It goes by the name the ‘Justice Spine’. It is a longitudinal national data asset linking police-recorded criminal offenders in Australia’s criminal courts with adult prisoners in the corrective services systems. The dataset will show how people move and interact within and across the justice system nationally, something that is currently not possible. The dataset will have the potential to be linked to other Commonwealth, state and territory datasets for deeper analysis of the characteristics of criminal offenders. It will be available to approved policymakers and researchers in late 2023 and will enable analysis of patterns of offending and policies to reduce recidivism.

The ABS DataLab

There is little point in sharing and integrating data if they cannot be accessed. Increased data integration capability is complemented by a data access service: the ABS DataLab. The ABS DataLab enables sophisticated analysis of detailed micro data in a secure, controlled environment. Use of the ABS DataLab is currently growing at about 30 per cent every year. To give a sense of this growth, there were about 50 DataLab users in 2016, almost 900 by 2019, and around 4,000 now. There are about 400 active projects across governments, both Commonwealth and State, and Australia’s research sector.[4]

The DataLab is also being made available as a platform for data sharing. The Department of Finance and the ATO will use the ABS DataLab to enable secure sharing and sophisticated analysis of their data.

The richness of datasets now available in the ABS DataLab has significant value for academic research. With appropriate safeguards in place, we are now piloting access for international academic researchers by partnering, in the first instance, with the OECD and with Professor Greg Kaplan at the University of Chicago.

By providing international access to what are now high-quality data assets (with appropriate safeguards), there is the prospect that more international researchers will be attracted to working on Australian policy issues using Australian data. This can only help in generating new insights on Australia’s policy challenges.[5]

Data Availability and Transparency Act (DATA) 2022

On 31 March 2017, the Productivity Commission (PC) sent its Data Availability and Use Inquiry report to the Treasurer, the Hon Scott Morrison MP. The Commissioners on the Inquiry were the Chair of the PC, Peter Harris, and Melinda Cilento, now the Chief Executive of the Committee for Economic Development of Australia (CEDA).

Five years later, to the day, on 31 March 2022, the Data Availability and Transparency Act (the DAT Act) received Royal Assent.

The Act establishes a new, best practice scheme for sharing Australian Government data, underpinned by strong safeguards.

The DATA Scheme is focused on increasing the use of Australian Government data to help deliver government services, inform government policies and programs and support research. It will provide strong support for better government data use and collaboration. Importantly, the DAT Act enables the sharing of Commonwealth data with the states and territories, as well as academics.

It is worth explaining the relevance of the DAT Act for the ABS. The ABS’s legislation supports making aggregate data publicly available, and the ABS DataLab enables authorised access to detailed micro data while protecting the privacy of individuals. Increased data sharing under the DAT Act will increase the need for secure sharing and access infrastructure. There are opportunities to expand the use of the ABS DataLab as a service so that agencies, like the ATO, can use the DAT Act to share data safely with a range of users. There are also opportunities for the ABS to use the DAT Act to streamline our data sharing with other agencies, including by making it easier to share de-identified data with trusted partners such as the AIHW. And agencies across the Commonwealth will now have a streamlined legal pathway to share the data they hold with the ABS – like the Department of Education, Skills and Employment sharing apprentices and traineeships data for data integration projects.

I would like to take this opportunity to thank some of the key people who turned the promise of the 2017 Data Availability and Use Inquiry into the reality of the 2022 Data Availability and Transparency Act. Gemma Van Halderen guided the development of the Government’s response to the PC Inquiry, working from the Department of the Prime Minister and Cabinet. Deb Anton, the Interim National Data Commissioner, set up the Office of the National Data Commissioner, which worked up the details of the scheme and drafted the Bill which became the DAT Act. Gayle Milnes, the inaugural National Data Commissioner, will administer the scheme. And finally, thanks to the members of the 46th Parliament of Australia who passed the Bill into law less than a fortnight before the Parliament was prorogued.

I look forward to working closely with Gayle and her office in coming years.

Conclusion

Let me wrap up.

It is instructive to look back at the recommendations on the use of data from the 2019 Independent Review of the Australian Public Service, the Thodey Review. The Review made a strong case for enhancing the use of data to support public policy formulation and better service delivery. Many of the things I have talked about today were foreshadowed in the Review as important directions for the future of the APS. Among these are accessing new data sources for public-policy purposes, wider use of integrated data assets to rigorously develop and improve policies, and legislation and infrastructure to enable data to flow securely between agencies. The Thodey Review also recommended the APS launch linked Data and Digital Professions to build data and digital expertise – which of course has also come to pass.

There is always more to do and we can’t rest on our laurels. But equally there has been impressive progress over the past few years in helping to realise the potential of data in government.

Thank you.

Footnotes

* I am grateful to Lucy Jones, Celia Moss and Marcel van Kints for their help preparing these remarks.

[1] See Measuring the impacts of COVID-19: Briefing to the Australian Business Economists; and Innovation and data: Address to the Australian Public Sector Innovation Show.

[2] We are extremely grateful to the ATO for this access, particularly given how busy they were at the time delivering the JobKeeper package amongst other activities. Data on 10 million employees from STP allow us to produce detailed geospatial analysis (or to disaggregate across other dimensions), which is not possible using the 50,000 or so individuals from whom we collect data in the monthly Labour Force Survey. This coverage and detail are benefits of administrative ‘big’ data sources.

[3] On the other hand, a drawback of big data is that it may not be representative of the whole population, in contrast to a well-designed survey. For example, STP-enabled businesses are unlikely to be representative of all businesses.

[4] About 60 per cent of current users are from government departments and agencies at both the Commonwealth and State and Territory levels, with the remaining 40 per cent from universities and research agencies. On active projects, the Australian Institute of Health and Welfare and the Australian Treasury have the most projects within the Commonwealth, while the Australian National University has the most within the academic sector. NSW State Government agencies are the most active users among the states and while, historically, state and territory governments have had fewer projects, we expect strong growth in the future.

[5] In the academic discipline I know best – economics – it is hard for academics to get research on Australian economic issues published in the top international journals. Making Australian data available to international researchers should generate more interest in Australian policy issues that can be tackled using these data. In turn, this should make a modest contribution to improving international recognition of academic work conducted using Australian data.