Melbourne Business Analytics Conference 2022: Opening Address
Learning and Unlearning: Thriving Businesses in Times of Rapid Change
Dr David Gruen AO
Australian Statistician
Tuesday 11 October 2022
Abstract
The speech outlines the ABS’ approach to data security in light of the recent Optus data breach. It discusses the ABS approach to paying for private sector data. It also touches on the new data sources that have become available to the ABS and how they are being used to generate new statistical insights as well as the growth in integrated data assets.
Introduction
Thank you, Ian, for inviting me today to open the 2022 Melbourne Business Analytics Conference. It’s great we can get together in person again – to have analytics academics, executives and practitioners sharing insights into business, technology and mathematics.
In opening the conference, I will talk about some of the new data sources that have become available to the Australian Bureau of Statistics and how we are using them to generate new statistical insights. I will also talk about the growth in integrated data assets.
But before doing that, in light of the Optus data breach, I want to spend some time explaining the ABS’ approach to data security. You would expect an organisation whose primary focus is data and statistics to take data security very seriously – and we do. Let me elaborate.
As many would remember, the 2016 Census was subject to a series of distributed denial-of-service (DDoS) attacks. There was no data breach, but the Census digital service was taken offline for over 40 hours as a precaution, which inconvenienced those trying to complete their Census online. In the aftermath of these DDoS attacks, several inquiries made a multitude of recommendations for how to do things better, all of which were implemented for the 2021 Census.
One of the key recommendations was to work closely with the Australian Government’s Australian Cyber Security Centre (ACSC). This included having ACSC involved in all relevant procurement decisions for the 2021 Census digital service. In the lead-up to the Census, ACSC also worked with us and our delivery partners, PwC and Amazon Web Services, to simulate several DDoS attacks, to enable ethical hackers to try to compromise our systems, and to run scenarios that played out in detail how we would respond if something went wrong – so we had well-developed protocols and were fully prepared.
In the event, everything ran smoothly even though there were slightly fewer than one billion cyber attacks on our Census digital system on Census day, 10 August 2021 (‘billion’ is not a misprint).[1]
On people’s names and addresses collected in the Census, we committed more than a year before the Census to retaining names for no more than 18 months and addresses for no more than 36 months, after which they will be destroyed.[2]
Of course, the five-yearly Census is just one of the things we do. But the detailed, careful approach to data security we apply to the Census is a fundamental element in all our areas of responsibility. We have a publicly available Data Strategy prepared by our Chief Data Officer, updated annually, with a prominent role for security and privacy protections for the data we hold.
We haven’t eliminated all cyber risks – unfortunately that’s an impossible goal. But we take data security very seriously and we have a multi-faceted approach that requires the continual attention of senior management. Data security also requires monitoring and updating as the passage of time and the march of technology make new things possible but also reveal new cyber risks.
New data sources
Let me turn now to some of the new data sources the ABS is accessing to provide statistical information that was previously unavailable. Data from these new sources are all by-products of the digital revolution. And all these new datasets are examples of ‘big data’ – large, and usually complex, datasets from new sources.
Let me begin with Single Touch Payroll (STP). The Australian Taxation Office (ATO) receives payroll information from employers with STP-enabled payroll software each time the employer runs their payroll. Given the extensive coverage of the STP system, these data cover more than 10 million employees. The arrival of the pandemic in early 2020 made access to this rich vein of near real-time information an urgent priority. The ATO expedited access, and the ABS began receiving these data in early April 2020.[3]
Since then, the ATO has provided job and wage data from the STP system to the ABS each week, from which we produce a new publication: Weekly Payroll Jobs and Wages.
In many ways, access to Single Touch Payroll data taught us new ways of doing things. Given the scale and complexity of these data, it made sense to ingest and analyse them using cloud computing services rather than our existing computer systems. And that is the new model for accessing public and private sector big data assets to generate new statistical insights. Let me describe a couple of them.
In October 2021, we began releasing a new monthly indicator of business turnover, based on Business Activity Statements (BASs) submitted to the ATO. Again, to give you a sense of scale, there are about 130,000 BAS remitters from whom we gather information for this new monthly indicator. By comparison, our equivalent survey, Quarterly Business Indicators, is based on a sample of 16,000 businesses.
In February this year, we released a second monthly indicator which provides a measure of household consumption. This indicator is based on about 800 million bank transactions by households each month (with these data provided by Australia’s major banks in aggregated, de-identified form). Household consumption accounts for about half of GDP, so there is considerable value in having an accurate measure of it. The existing monthly measure of household consumption comes from the Retail Trade Survey, based on a sample of around 3,400 businesses.
The Retail Trade Survey covers less than 30% of household consumption, whereas the new measure, based on banks’ transactions data, covers 68% of household consumption, so that is a substantial step up.
Discussion of this new monthly measure of household consumption provides an opportunity to explain the ABS’ approach to seeking access to datasets from private-sector providers. As already discussed, in seeking such access, we are mindful of the critical need to keep the data we access secure and to preserve privacy – meaning data about individuals or individual firms cannot be identified. But what about paying for these data?
Our position is that we will pay extraction costs for private-sector datasets, but not for the datasets themselves. Our aim is not to compete with the private sector, but rather to generate public value from private (and public) sector datasets. We can generate substantial public value by creating statistics from private-sector datasets and publishing these statistics free for anyone to access via our website.
To my mind, it is part of the social responsibility of firms to provide their data free of charge to support the public sector to improve public policy and service delivery – subject to appropriate security and privacy constraints and reimbursement for extraction costs.
I do acknowledge an important exception to the principle I have just outlined. The exception applies to private-sector firms whose business is wholly or primarily concerned with collecting or aggregating data and on-selling it. For these firms, providing their data freely to the public sector could put their business models at risk.
Returning to our new products, last month we began publishing a monthly indicator of CPI inflation, particularly relevant given the current inflationary environment. This was made possible because of our access to digital data sources, including scanner data from supermarkets and web-scraped prices data.
A significant benefit of using existing data, collected for other purposes, to generate new statistical indicators is that there is no need to put a new survey in the field, which would place an unavoidable burden on respondents.[4]
The digital revolution also offers new ways to reduce the burden on our existing survey respondents.
We are working with businesses, accountants, bookkeepers, and accounting software companies to co-design a new reporting application that links with the accounting software that businesses currently use. In the future, a business will have the option to extract and pre-fill their financial data directly into an ABS web application from their accounting software package (such as Xero or MYOB). By removing the need for businesses to manually collate information and key it into our survey, we aim to reduce the typical time for a small business to complete their survey from an hour to about 5 minutes. As part of this new initiative, we will provide tailored reports back to businesses to help them understand their performance relative to similar businesses.
This new approach will be implemented from March 2023 for eligible businesses completing their Quarterly Business Indicators Survey. Other surveys will be added during 2023 and 2024.
Integrated data assets
The other area I want to address briefly is integrated data assets. We have a growing number of integrated data assets being used across the public sector to support research, policy development and analysis. The ABS hosts a large business-centred integrated data asset called BLADE (Business Longitudinal Analysis Data Environment) and a large person-centred integrated data asset called MADIP (Multi-Agency Data Integration Project). These data assets have been developed and enhanced over many years by the collaborative efforts of many people across many Commonwealth agencies and departments.
The pandemic supercharged progress in the development of these integrated data assets and the realisation of the benefits they can provide in the service of better public policy. Let me describe some of this progress.
Our earlier standard practice was to update the underlying data in both BLADE and MADIP once a year. But as these data assets have matured, processes have been streamlined and key enabling infrastructure (particularly the ABS DataLab) has been moved to the cloud. This enhances security and makes possible more sophisticated data analysis. It also means both BLADE and MADIP can now be updated much more frequently.
There have been many additions to these integrated data assets. Let me describe a few of them.
We have introduced a new quarterly-updated Business Locations dataset to BLADE, enabling detailed geospatial economic analysis. Along with quarterly updates to BAS data, this allowed the ABS to provide the National Recovery and Resilience Agency (NRRA) with detailed geographic business counts and economic information for the flood-devastated areas of New South Wales and Queensland.
To help Treasury track economic recovery from the pandemic, the Labour Market Tracker Project integrated job-related data, including STP, JobKeeper and JobSeeker data, into both BLADE and MADIP. Datasets are updated fortnightly, monthly and quarterly as they become available, to enable up-to-date monitoring of the labour market and the economy.
And for my final example, data from the Australian Immunisation Register are being linked to MADIP each week. Provisional Death Registrations data are being linked and updated monthly. These data are being used by the Department of Health to generate insights for the Australian COVID-19 Vaccine and Treatment Strategy, including which groups in the community have lower vaccine uptake – and hence where to focus effort to raise that uptake. They are also being used by state health departments and primary health networks.
In conclusion, the digital revolution has opened many opportunities to enhance the role that data can play in supporting better public policy and more efficient service delivery. We at the ABS are excited by these opportunities and determined to make the most of them.
Footnotes
[1] Correction: In total, there were nearly 1 billion attempted cyber attacks during the Census digital service’s period of operation, from 23 July to 1 October 2021.
[2] See the ABS response to the Census Privacy Impact Assessment published on our website on 21 July 2020. For those interested in the results from the Census, rather than the data protection aspects, the first release of Census data has been available since late June and the second release, focussing on employment, occupations and work patterns, will be available on our website tomorrow.
[3] We are extremely grateful to the ATO for this access, particularly given how busy they were at the time delivering the JobKeeper package amongst other activities. Data on 10 million employees from STP allow us to produce detailed geospatial analysis (or to disaggregate across other dimensions) which is not possible using the 50,000 or so individuals from whom we collect data in the monthly Labour Force Survey. This coverage and detail are benefits of administrative ‘big’ data sources.
[4] On the other hand, a drawback of big data is that they are often not representative of the whole population, in contrast to a well-designed survey. For example, STP-enabled businesses are unlikely to be representative of all businesses.