Innovation and data
Address to the Australian Public Sector Innovation Show
Dr David Gruen1
Australian Statistician
By Videoconference – 18 June 2020
INTRODUCTION – INNOVATING TO INFORM DECISIONS
I begin by acknowledging the Ngunnawal people, the Traditional Custodians of the land in Canberra from which I’m joining you today.
I pay my respects to their Elders past and present and extend that respect to all Aboriginal and Torres Strait Islanders present in this videoconference.
And I thank the Public Sector Network for inviting me to speak today.
The arrival of COVID-19 has seen a flowering of innovation right across the public sector.
While it’s taken different forms as departments and agencies deal with major disruption to business as usual, all of us have found ourselves looking for new, creative and effective ways to respond.
I’ve no doubt this has been the case for my former colleagues at Treasury and PM&C, and our friends at the ATO who’ve collaborated with us on one of our new statistical products.
At the ABS, the spread of COVID-19 has meant keeping our key data flowing, while adapting to satisfy the hunger for up-to-date information during the pandemic.
Maintaining the continuity of key ABS series such as the quarterly Consumer Price Index (CPI), the National Accounts and the monthly Labour Force Survey is crucial at any time, but especially in a crisis.
These series show us how Australia is faring in good times and bad – and even in a pandemic. The last few months have seen some extraordinary developments:
· the largest ever drop in visitor arrivals in March;
· a record-breaking rise in food retailing, household goods, and other retailing in March, followed by an even bigger fall in retail in April;
· an unprecedented drop in employment of nearly 600,000 people between March and April, and a fall in hours worked of over 9 per cent; and, just a fortnight ago,
· GDP contracting by 0.3 per cent in the March quarter, with growth slowing to 1.4 per cent through the year, the slowest rate since Australia was in the midst of the Global Financial Crisis.2
With this level of disruption, even maintaining data continuity requires innovation.
Since late February we’ve also developed and released a range of new products to support decision-making around COVID-19. These include:
· additional information and analysis around our releases, such as short-term visitor arrivals to Australia and ‘Spotlights’ on key dimensions of our Labour Force data;
· new interactive employment distribution maps with age and industry dimensions;
· interactive maps showing small area modelling of chronic health conditions; and
· preliminary estimates of retail and international merchandise trade.
And to meet the demands for near ‘real time’ information to measure the social and economic effects of the pandemic, we have introduced two new rapid-response surveys – one of businesses and the other of households.
Today, I’d like to take you through some of the recent ABS innovations that put us in a strong position to respond so quickly to COVID-19.
I’ll discuss the key factors in their success, and take a look at emerging areas where we hope to see the fruits of further innovation over coming years.
INNOVATION IN A CHANGING DATA LANDSCAPE
Long before we swapped the traditional Easter gathering for family Zoom catch-ups, the digital revolution had already transformed many aspects of our lives.
There are few sectors where the pace of innovation or change has been greater than data and analytics. The following statistics emphasise this point:
· Based on the latest available information, Australians downloaded over 5.3 million terabytes (TB) of data on fixed services over three months, a figure that’s more than doubled in the last four years.3
· Indeed, thanks to our internet searches, messages, uploads and the internet of things, by 2025 it’s been estimated that the ‘global datasphere’ will grow to ‘175 zettabytes’ of data (that’s 1.75 × 10^23 bytes of information – certainly worthy of the term ‘Big Data’).4
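The arithmetic behind these two figures can be checked with a minimal Python sketch. It assumes decimal (SI) prefixes, so one zettabyte is 10^21 bytes, and it takes the two download volumes from footnote 3; the variable names are illustrative only.

```python
# Assumption: SI decimal prefixes (1 zettabyte = 10**21 bytes).
ZETTABYTE = 10**21

# Forecast size of the 'global datasphere' in 2025, as quoted in the text.
datasphere_bytes = 175 * ZETTABYTE
print(f"{datasphere_bytes:.2e} bytes")  # scientific notation

# Fixed-service downloads in terabytes, from footnote 3.
tb_june_2019 = 5_300_000   # "over 5.3 million TB", so a rounded lower bound
tb_june_2016 = 2_049_553

growth = tb_june_2019 / tb_june_2016
print(f"Downloads grew by a factor of about {growth:.2f}")
```

Because the 2019 figure is rounded, the ratio is indicative rather than exact, but it is comfortably above two.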
And while much of this new data represents a tidal wave of consumer information for our private sector counterparts, it also drives expectations about the accessibility, timeliness and utility of public sector data.
Analysts and researchers (myself having been one of them for most of my life before I came to the ABS late last year!) are always keen to have more access to more data more quickly.
In seeking to inform Australia’s important decisions, a changing information landscape translates into a need for innovation to be embedded in everything we do.
I’ll look first at two of our main economic indicators: Labour Force statistics and the CPI.
NEW INSIGHTS - LABOUR FORCE STATISTICS
The Labour Force Survey
The ABS has run the Australian Labour Force Survey since 1960.5 Most recently, the ABS has been using behavioural insights to raise the number of respondents registering to use our online forms. Using randomised controlled trials, we’ve tested the impact of:
· simplifying our introductory letters,
· sending additional reminders, and
· changing how we address letters to households.
Chart 1 shows that in our trials, the first two measures added more than 13 percentage points to the share of respondents who opted in to our web-based form. Subsequently, a separate trial found an additional 6 percentage points could be added, by changing the addressee printed on the envelope to make it more personalised.
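For readers less familiar with how a trial result of this kind is expressed, the following is a minimal Python sketch of a percentage-point uplift between a control group and a treated group, with an approximate 95 per cent confidence interval. The counts are hypothetical and are not ABS trial data; the function name and the normal-approximation interval are assumptions made for illustration, not a description of the ABS analysis.

```python
from math import sqrt

def uplift_percentage_points(control_yes, control_n, treated_yes, treated_n, z=1.96):
    """Treated-minus-control difference in opt-in rates, in percentage points,
    with an approximate confidence interval (normal approximation)."""
    p_control = control_yes / control_n
    p_treated = treated_yes / treated_n
    diff = p_treated - p_control
    # Standard error of a difference between two independent proportions.
    se = sqrt(p_control * (1 - p_control) / control_n
              + p_treated * (1 - p_treated) / treated_n)
    return 100 * diff, 100 * (diff - z * se), 100 * (diff + z * se)

# Hypothetical counts: 300 of 1,000 control households opt in online,
# against 430 of 1,000 households sent the revised letter and reminders.
diff, low, high = uplift_percentage_points(300, 1000, 430, 1000)
print(f"Uplift: {diff:.1f} percentage points (95% CI {low:.1f} to {high:.1f})")
```

Random assignment is what allows the difference to be read as the effect of the change itself rather than of differences between the households.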
These small, simple changes deliver significant gains, and are translating into faster and more efficient data collection, and a more convenient experience for our respondents – which makes a material difference when the Labour Force Survey collects data from 25,000 households (approximately 50,000 people) each month.
But it takes time to conduct and analyse the results from such a survey; faster publication of more frequent data on the labour market was clearly desirable to shed light on the evolving disruption of COVID-19.
New COVID-19 labour market products
For some time, the ABS and the ATO have been discussing the opportunity to extract higher-frequency information from the Single Touch Payroll, or STP, system. This is a system used by employing businesses to report to the ATO information about wages, superannuation and tax payments for their employees.6
When I was first drafting these remarks, the idea of the ABS extracting added value from these data was already being explored. But the arrival of the coronavirus meant that providing public access to this rich vein of detailed near-real-time information became an urgent priority.
This joint ATO-ABS effort has evolved into the Weekly Payroll Jobs and Wages in Australia publication. These new data (Chart 2) show that between the week ending 14 March 2020 (when Australia recorded its 100th confirmed COVID-19 case) and the week ending 2 May 2020:
· payroll jobs decreased by 7.3%, and
· total wages paid decreased by 5.4%.7
Chart 3 shows changes in payroll jobs by industry from 14 March to 11 April, and for the following period from 11 April to 2 May.
The Accommodation and food services industry saw the largest initial falls in the number of payroll jobs (down by 33.3 per cent), followed by Arts and recreation services (down by 22.9 per cent). However, payroll jobs in these industries also showed the strongest recovery in the following three-week period. The Professional, scientific and technical services industry saw only a small initial fall in payroll jobs (down by 2.6 per cent in the first period), but then had the largest falls in the following period (down by 9.2 per cent).
Together with the Household and Business Impacts of COVID-19 surveys, Weekly Payroll Jobs and Wages in Australia (Cat. no. 6160.0.55.001) will continue to provide up-to-date labour market information.
It will allow analysts to see detailed evidence of labour market changes before they appear in the more comprehensive Labour Force Survey results.
The Labour Account
As well as innovating in the way we collect and report data, the ABS has been looking at ways to provide new insights into the labour market. One of these is the Labour Account – one of only four in the world, and the most comprehensive.
The Labour Account presents consistent information on labour input by industry in Australia, and includes labour market activity outside the scope of the surveys. It provides an opportunity to significantly improve key aggregate inputs into the National Accounts – particularly hours worked.
The ABS recently extended the Labour Account back 25 years, to enable the data to be used to improve the measurement of productivity. New estimates of public and private sector jobs were also included, to support improved sectoral analysis.
The next steps include developing options for state and territory level labour accounts.
Some of these examples of innovation in Labour Force statistics have been evolutionary, and some revolutionary. Innovative methods and approaches, coupled with a strong focus on collaboration with the ATO and other partners, have been critical to our success.
The ABS will keep making improvements, in as close to real-time as possible, to ensure our labour market statistics continue to keep pace with our changing economy and society.
NEW DATA SOURCES AND METHODS – THE CPI
Another key ABS output at the forefront of innovation is the Australian Consumer Price Index, or CPI.
The way consumers purchase goods and services has changed dramatically in recent years. To give a topical example, the amount spent on online shopping has risen 235 per cent over the past five years.
Not only have consumers changed their behaviour, but retailers have also changed the way they seek out and connect with their customers, and have significantly increased the array of consumer data they collect along the way.
These developments have provided fertile ground for innovation in the CPI, both in the collection of price data and in the methods used to compile the index.
In 2012 the ABS began obtaining transactions (scanner) data from a number of large retailers. These data represent around 25% of the weighted prices which make up the CPI.
By March 2014, the first big step forward came when these scanner data replaced their manually collected counterparts.
In 2017, the ABS published the CPI using new methods which allowed the use of a greater amount of the information within the scanner data to enhance the CPI’s accuracy.
At the same time, the ABS implemented annual reweighting of the CPI, to better reflect the purchasing patterns of Australian households and reduce the upwards bias which creeps into indexes which aren’t reweighted frequently.
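The upward bias from infrequent reweighting can be illustrated with a minimal Python sketch. It assumes a textbook fixed-basket (Laspeyres-type) index and entirely hypothetical prices and quantities for two goods; it is not the ABS compilation method. When one price rises and households substitute towards the other good, an index that keeps the original basket overstates the rise relative to one reweighted each year.

```python
def laspeyres(p_base, q_base, p_now):
    """Cost of the base-period basket at current prices, relative to base prices."""
    cost_now = sum(p * q for p, q in zip(p_now, q_base))
    cost_base = sum(p * q for p, q in zip(p_base, q_base))
    return cost_now / cost_base

# Hypothetical data for years 0, 1 and 2: [good A, good B].
# Good A becomes dearer each year and households buy less of it.
prices = [[1.0, 1.0], [1.5, 1.0], [2.0, 1.0]]
quantities = [[10, 10], [6, 14], [4, 16]]

# Fixed weights: the year-0 basket is used throughout.
fixed = laspeyres(prices[0], quantities[0], prices[2])

# Annual reweighting: each yearly link uses the previous year's basket,
# and the links are multiplied (chained) together.
chained = 1.0
for t in range(1, len(prices)):
    chained *= laspeyres(prices[t - 1], quantities[t - 1], prices[t])

print(f"Fixed-weight index:     {fixed:.3f}")
print(f"Annually chained index: {chained:.3f}")
```

With these made-up numbers the fixed-weight index shows a larger price rise than the chained one, which is the direction of bias described above.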
More recently, with the continued increase of online shopping, the ABS has been making use of web scraping techniques to collect prices. This innovation, along with maximising the use of scanner data, has allowed the ABS to increase the number of prices collected for the CPI each quarter to well over a million – a significant jump from the 100,000 items sampled in earlier times. The next innovation being explored is the use of machine learning to code automatically some elements of the CPI data.
Once again, collaboration was critical to the success of these innovations. The ABS collaborated with academic experts, and with other national statistical organisations.8
The CPI and Labour Force Survey are probably well-known to most of you. Now I’d like to turn to some innovations in areas of the ABS with which you may be less familiar.
NEW TECHNOLOGY – AUTOMATED IMAGE RECOGNITION IN THE ABS ADDRESS REGISTER
Underpinning the Census and our household collections is the ABS Address Register. Every quarter, around 500,000 ‘use of address’ changes become apparent by comparisons with administrative data such as Building Approvals, Residential Property Prices, and data from the Australian Electoral Commission and Australia Post.9
Ideally, all of these potential changes would be quality assured, but resource limitations meant that in the past, only around 34,000 addresses could be confirmed by desktop review each quarter. These traditional reviews use aerial imagery and other research tools to make assessments of address use and dwelling structure.
In collaboration with the CSIRO and Data61, the ABS Address Register Section has developed a machine learning model called Automated Image Recognition (AIR). This allows automatic checking of many more addresses based purely on aerial imagery.
Rather than the 34,000 address checks previously completed per quarter, the ABS can now quality assure five times that number using the same resources (Chart 4). The new approach results in a much better register, which means saving time and money during data collection.
This new technology has just gone into production and is an example of international best practice. The Address Register team are now working with the United Nations Economic Commission for Europe High Level Group for Modernising Official Statistics to share learnings from this project with other National Statistical Organisations.
NEW TECHNOLOGY – GEO-STATISTICS
As well as reducing costs where possible, one of the key challenges all data custodians face is how to make optimal use of existing datasets. There is great potential to use geospatial data and Geographical Information Systems (GIS) to investigate complex questions or issues. In some cases, this means adding new data items about the built environment to existing social or economic collections – at the ABS we call these data Geo-statistics.
The Health Research Institute at the University of Canberra was interested in exploring how access to services might affect the health and wellbeing of respondents to the National Health Survey (or NHS). Funded by the Commonwealth Department of Health, the ABS Geospatial Solutions and Health Sections partnered to add new geospatial information to the 2017-18 NHS data – another example where collaboration was the cornerstone of data innovation.
Using GIS technology, the ABS generated areas which represent a reasonable walking distance from a respondent’s home. Next, the locations of nearby supermarkets and fast food outlets were identified and used to produce basic counts of accessibility to these services. Public Open Space areas within 400 metres were also added into the picture. These layers of data are pictured in Chart 5.
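The counting step can be sketched in a few lines of Python. This is a simplified illustration, not the ABS method: it assumes straight-line (great-circle) distance rather than a walkable-network area, uses a 400 metre radius as a default, and the coordinates and function names are hypothetical.

```python
from math import radians, sin, cos, asin, sqrt

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two latitude/longitude points."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlambda = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))

def count_within(home, places, radius_m=400):
    """Number of places whose straight-line distance from home is within radius_m."""
    return sum(
        1 for lat, lon in places
        if haversine_m(home[0], home[1], lat, lon) <= radius_m
    )

# Hypothetical coordinates for one respondent and three fast food outlets.
home = (-35.2800, 149.1300)
outlets = [(-35.2810, 149.1310), (-35.2830, 149.1300), (-35.3000, 149.1300)]

nearby = count_within(home, outlets)
print(f"Outlets within 400 m: {nearby}")
```

A network-based walking area would generally be smaller than a circle of the same radius, so a straight-line count of this kind is best read as an upper bound.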
Finally, the data were aggregated in ways that preserved confidentiality but provided insights into connections between services available in the built environment and the socio-economic characteristics of the neighbourhood. The geo-statistics generated in this project were made available within the National Health Survey publication, and via the ABS DataLab.10
ACCESSING AND PROTECTING ABS DATA
This example brings us to an overarching point – the ABS is keen to innovate, but this can only happen with strict regard for the legislative, privacy, transparency and security settings in which we operate.
Any change we make must respect the individuals who provide their information to us, and the privacy of their data.
Having strong data confidentialisation controls has always been important to the ABS, but this is becoming more important as we integrate and provide access to greater amounts of administrative data.
All projects seeking to access microdata are assessed against stringent criteria – projects must be in the public interest and in accordance with the legislation of the relevant agencies. In addition, all users are legally obliged to use data responsibly for approved purposes, comply with the conditions of access, and maintain data confidentiality.11
FUTURE INNOVATION – GLIDE
The appetite for integrated datasets is growing rapidly as policy departments strive to access the types of data that assist in making decisions that have more complex and multifaceted impacts on society and the economy.
The data required to address these issues can be:
· of high volume,
· made up of multiple types of data,
· derived from multiple sources, and
· structured in a way that doesn’t readily facilitate analysis.
The links between the data at different levels may also be complex.
This set of circumstances has been under investigation in recent years at the ABS. Since 2014, the Machine Intelligence and Novel Data Sources (MINDS) section has conducted research and development work into ways of representing, integrating and analysing these kinds of complex data.
The feasibility and value of combining multisource data has been demonstrated through a prototype system which we’ve named GLIDE.
GLIDE stands for 'Graphically Linked Information Discovery Environment'. GLIDE stores information in a way that makes it simpler to deal with the many-to-many relationships that occur when integrating multiple data sources.
The underlying data can be manipulated or represented in different ways, as desired by the analyst. This means faster and easier interrogation of data is possible, enabling exploration to discover new insights.
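To show why a graph representation suits many-to-many relationships, here is a minimal Python sketch. It is a generic illustration under stated assumptions and does not describe how GLIDE is built: the class, its methods and the person and business identifiers are all hypothetical.

```python
from collections import defaultdict

class LinkGraph:
    """Undirected graph stored as a mapping from each node to its set of neighbours."""

    def __init__(self):
        self._neighbours = defaultdict(set)

    def link(self, a, b):
        """Record a relationship between two entities, in both directions."""
        self._neighbours[a].add(b)
        self._neighbours[b].add(a)

    def neighbours(self, node):
        """Entities directly linked to node (a copy, so callers cannot mutate the graph)."""
        return set(self._neighbours.get(node, set()))

    def two_step(self, node):
        """Entities reached through a shared neighbour, e.g. people sharing an employer."""
        reached = set()
        for middle in self.neighbours(node):
            reached |= self.neighbours(middle)
        reached.discard(node)
        return reached

# Hypothetical records: P1 holds two jobs, and P2 works at one of the same businesses.
graph = LinkGraph()
graph.link(("person", "P1"), ("business", "B1"))
graph.link(("person", "P1"), ("business", "B2"))
graph.link(("person", "P2"), ("business", "B2"))

print(graph.neighbours(("person", "P1")))  # both employers of P1
print(graph.two_step(("person", "P1")))    # people sharing an employer with P1
```

In a tabular layout the same questions would need a linking table and repeated joins; here a multiple job holder is simply a person node with more than one business neighbour.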
GLIDE is currently being evaluated for experimental use within the ABS, for example in comparing a range of external data sources in the ABS data validation process.
Given the continuing and evolving information needs of policy makers responding to the COVID-19 crisis, the next steps for GLIDE might include pilot projects to analyse the impacts of the pandemic on Australian businesses.
Testing will also commence on combining administrative data sets in GLIDE to determine the effects of COVID-19 restrictions on the labour market. This could include analysis of firm entry and exit, job flows, multiple job holders, and transitions into and out of the workforce.
The road out of the COVID-19 crisis will require many complex decisions that draw heavily on data. In providing this data, the ABS has sought not to waste the crisis – new data sources and new opportunities for collaboration are being actively explored. As the data landscape around us continues to evolve, there’s probably never been a more exciting time to be an analyst, or even the Australian Statistician!
Thank you.
1 I am grateful to Kristen Stone, Bjorn Jarvis, Andrew Tomadini, Stephen Cohen, Debbie Goodwin, Kimberley Seats and Meg Dixon-Child for their help in preparing these remarks.
2 For further details see Cat. no. 8501.0 - Retail Trade, Australia, April 2020; Cat. no. 3401.0 - Overseas Arrivals and Departures, Australia, March 2020; and Cat. no. 5206.0 - Australian National Accounts: National Income, Expenditure and Product, March 2020.
3 5.3 million TB were downloaded via NBN and Non-NBN fixed services in the three months ended June 2019. Source: ACCC Internet Activity Report – June 2019 (Published October 2019). 2,049,553 TB were downloaded in the three months ended June 2016. Source: ABS 8153.0 – Internet Activity Australia, Dec 2016.
4 International Data Corporation (IDC), Data Age 2025: The Digitization of the World, (sponsored by Seagate)
5 The ABS was one of the early adopters of telephone interviewing in 1996, computer assisted interviewing in 2003, and online forms in 2013.
6 The STP system includes about 99 per cent of large and medium sized businesses, and about 70 per cent of small businesses.
7 A ‘payroll job’ is recorded when the employee has received a payment in the reference period through STP-enabled software and thereby reported to the ATO. If no payment is recorded through STP, there is no payroll job recorded in that period, even if the employee retains a relationship with their employer. Where an employee is paid other than weekly, the established payment pattern is used to identify jobs in weeks outside the payment week. In the labour market, detailed definitions matter. Considerable statistical innovation is needed to extract meaningful insights from the STP data. For further explanation, see the Calendarisation and Imputation sections of the Explanatory Notes accompanying the release of 6160.0.55.001 - Weekly Payroll Jobs and Wages in Australia.
8 Professor Jan de Haan, an international price index expert from Statistics Netherlands, was instrumental in this work. Professor Kevin Fox, from the University of New South Wales, also provided independent review and expertise, along with Statistics Canada, Statistics New Zealand, and the Ottawa Group, an international working group on price indexes formed in 1994 for the sharing of experience and research on ‘crucial problems in measuring price change’.
9 For example, in a typical quarter there might be 100,000 new addresses identified and 400,000 changes noted such as the demolition, subdivision, conversion or completion of a building at a given address.
10 For example, we might be able to find out whether, or to what extent, access to fast food is correlated with other information captured on the survey, such as Body Mass Index or Socio-Economic Disadvantage.
11 For examples of integrated datasets hosted by the ABS, and further details of the ‘Five Safes’ approach to assessing and managing the appropriate use of microdata and the risk of disclosure, see The Promise of Data in Government.
Charts accompanying address to the Australian Public Sector Innovation Show