Address to the Institute of Public Administration Australia (IPAA), ACT

Dr David Gruen*

Wednesday 11 March 2020

The Promise of Data in Government


Introduction

I would like to begin by acknowledging the Ngunnawal people, the Traditional Custodians of this land. I would also like to pay my respects to Elders past and present and extend that respect to Aboriginal and Torres Strait Islander people present here today.

Thank you to IPAA for providing the opportunity for me to speak today.

In 1996, the four largest publicly listed companies in the world were General Electric, Royal Dutch Shell, Coca-Cola and the Japanese telecommunications company, Nippon Telegraph and Telephone.

Last year, the four largest were Microsoft, Apple, Amazon and Alphabet (Google’s parent company).

This wholesale changing of the guard among the world’s largest companies is just one manifestation of the data and digital revolution that has transformed many aspects of our lives over the past quarter century or so.

And it is a stark reminder of the power of data.

Today, I don’t want to talk about the commercial opportunities opened up by data, but rather about the promise of data in government.

This is a broad topic and I will limit myself to three elements of it. First, I will talk about the promise of integrated public-sector data assets. Second, I will talk about the importance of maintaining the community’s trust in the safe handling of public-sector data. And finally, I will provide some comments about the upcoming Data Availability and Transparency Bill.

The Public Sector Data Landscape

The view that data is valuable is now an overwhelmingly accepted proposition – and not just in the private sector. The potential for data to improve outcomes also applies with considerable force in the public sector.

While public servants and governments aren’t focused on making a profit, we are focused on improving the efficiency of services and enhancing outcomes for the citizens and businesses we serve.

It seems clear that the potential value of data increases many-fold when individual data sources are brought together to enable public-policy issues to be examined from a range of different perspectives. For example, combining the health, education and employment circumstances of people can teach us a lot more than examining each individual characteristic on its own. Similarly, joining together data on the performance of businesses with that on the skills of their employees generates added insight.

Underpinning the increasing use of combined datasets is a fundamental principle – that these data should be available only to trusted users in a safe and secure environment for purposes that benefit the Australian community. I will have more to say on that topic shortly.

There is a growing number of integrated data assets being used across the public sector. Prominent examples in the Commonwealth include the ATO’s ALife; AIHW’s NIHSI asset; and the Treasury and ATO’s Longitudinal Linked Employer Employee Database – each of which has been brought together to enable research, policy development and analysis.1

I want to spend some time telling you about two of the most significant integrated data assets: BLADE and MADIP. While I am not particularly thrilled by either of these acronyms, they should be thought of as the ‘business’ integrated data asset and the ‘person’ integrated data asset.2 The ABS hosts these two data assets – a role we are permitted to play because of our expertise and our commitment to privacy, security, transparency and maintaining trust. The development of MADIP and BLADE has been made possible by the combined efforts of people across many Commonwealth agencies and departments.

BLADE is best described as a series of integrated, linked longitudinal datasets. BLADE was originally established in 2014 as a joint project between the ABS and the then Department of Industry and Science to examine employment dynamics by integrating select ABS business surveys with data from the ATO.

The next critical step in BLADE’s development was the arrival of the Data Integration Partnership for Australia, or DIPA (apologies for another acronym!). DIPA, coordinated out of the Department of the Prime Minister and Cabinet, is a $130.8 million investment over three years, 2017-18 to 2019-20, to substantially improve the use and value of the government’s data assets. The arrival of DIPA in July 2017 provided funding to the ABS to expand and improve both BLADE and MADIP.

Under DIPA, BLADE has been expanded to include more data from the ATO, more ABS survey data and more administrative data from government departments, including merchandise imports and exports data from the Department of Home Affairs and data on government programs from the Department of Industry, Science, Energy and Resources. In late 2019, twelve years of agriculture census and survey data was integrated into BLADE for the first time.

Let me turn now to MADIP. With funding from DIPA, a secure and enduring people-centred data asset has been built. It combines information on healthcare, education, government payments, personal income tax, and population demographics (including from the Census).

As with BLADE, the longitudinal nature of MADIP is critical because it allows changes and patterns in the Australian population to be better understood and analysed across time.

Six Commonwealth agencies are working together on MADIP: the ABS; the ATO; the Department of Education, Skills and Employment; the Department of Health; the Department of Social Services; and Services Australia.

The first project linking employer and employee information from BLADE and MADIP is also currently underway.

Ultimately, the goal is to develop an integrated data asset that combines MADIP and BLADE, as well as other important Commonwealth and State government data, to underpin research across economic, social, geospatial and environmental areas.

With more data being integrated, securely available and used by an increasing number of users, there has been a large rise in the number of topics being tackled using the integrated data assets.

Researchers and policy analysts have used BLADE to learn more about the determinants of innovation, productivity, and research and development, as well as business performance and entrepreneurship trends.

Academics have used MADIP to study income inequality in Australia and the equity of out-of-pocket costs for Medicare services and prescription medicines, and to better understand Australia’s tax, transfer (welfare) and health systems.3

Let me mention four specific examples of the benefits that have flowed from the increased use of integrated data assets.

The first is a study by researchers at the RBA who used data from BLADE to demonstrate the effectiveness of a business investment tax break introduced in the early stages of the global financial crisis to stimulate business investment.4 The benefit of such a study is that it can be used to inform subsequent policy development in circumstances where such interventions are deemed desirable.

The second is a study which analysed Pharmaceutical Benefits Scheme (PBS) data to identify adverse events associated with medicines. One hundred and twenty-two medicines already known to be associated with heart failure were identified. More importantly, five new medicines were also identified, and further investigation has led to regulatory action being taken for one of these.5

The third example involves allocating public funding among non-government schools on the basis of a school community’s capacity to contribute to the school’s operating costs. The current system allocates funding based on the socio-economic status of the neighbourhood of each school. However, by using integrated data from MADIP, it has become possible to allocate school funding based on the income of the families with students studying at the school, regardless of where those families live. This is a more accurate way to determine families’ capacity to pay, and therefore to allocate funding among schools. The policy change associated with this new approach is being phased in from this year.

The fourth set of examples comes from Treasury, which has invested significantly in its capacity to analyse administrative microdata. First, Treasury has used BLADE to understand the drivers of Australia’s productivity slowdown, with a particular focus on the role of declining market dynamism.6 Second, as I mentioned earlier, Treasury has collaborated across government agencies to build a de-identified longitudinal linked employer-employee database, which has already yielded novel insights into why wage growth has been unexpectedly weak over recent years.7 Finally, over coming months BLADE will provide the necessary ingredients for Treasury to gauge the potential disruption to Australian firms from their reliance on inputs sourced from countries that have been severely affected by COVID-19.

Access to these integrated data assets is made possible via the ABS’s DataLab, and it is worth spending a little time explaining the evolution of the DataLab.

The ABS has been operating a secure microdata environment since 2002. The original DataLab was developed to provide secure access to ABS Confidentialised Unit Record Files (CURFs). As more sophisticated microdata products were developed, including integrated data products from BLADE and MADIP, the DataLab became the primary access method for those products.

The ABS DataLab was initially onsite, meaning that users had to travel to ABS offices to access data using the DataLab. Given the obvious limitations of an onsite DataLab, a virtual DataLab was developed – and became operational in October 2016. This provides access to trusted users without the need for them to travel to ABS offices.

The introduction of the ABS Virtual DataLab, combined with funding from DIPA, has seen a huge increase in demand from researchers to access data virtually. The number of microdata users accessing data through the Virtual DataLab has steadily increased from about 50 in 2016 to almost 900 in 2019. These users are a mix of State and Commonwealth public servants (around two-thirds), and academics and analysts from universities and public policy institutes (around one-third). There is every indication that demand to access data in the Virtual DataLab will continue to rise.

The ABS plans to modernise the Virtual DataLab further by moving it to the cloud. Moving to the cloud will enable more trusted data users to access data; provide them with greater analytical processing power; enable larger and more complex datasets to be made available; and, importantly, enable the ABS DataLab to provide a secure data sharing and analysis environment for other parts of the public sector, to support the draft legislation I will discuss shortly.

But before doing so, let me turn to the critical issue of maintaining the community’s trust in the use of data.

Maintaining the Community’s Trust

The ABS works hard to earn the trust of the community that we will keep their information safe. As an independent institution with a history of almost 115 years focused on safely capturing, storing and releasing data, it’s in the DNA of the ABS to carefully consider the safety of data in all aspects of our business.

Let me spend some time on the key elements of maintaining the community’s trust. These key elements are the legislation under which the ABS operates, our commitment to maintaining privacy, our approach to data security, and our efforts to be transparent and engage with stakeholders and the public.

The Census and Statistics Act 1905 ensures the ABS does not release information that is likely to lead to the identification of an individual. The Statistics Determination 2018 ensures ABS microdata are unidentified when accessed by trusted users. And the ABS Act 1975 establishes the position of the Australian Statistician and the independence of the role.

In addition to the legislation, the ABS – along with several other Australian government agencies and our peer national statistical organisations, including the Office for National Statistics (UK) and Statistics New Zealand – uses the Five Safes Framework to assess and manage the appropriate use of microdata and the risk of disclosure.

The Five Safes Framework takes a multi-dimensional approach to ensuring that microdata is securely accessed for the right purpose, while protecting the confidentiality of data about people and businesses.

Specific questions are posed by the framework to help assess and describe each risk aspect (or safe) in a qualitative way. Appropriate controls are then put in place – not just on the data itself but, just as importantly, on the way the data are accessed. By doing this, we can facilitate safe data access while doing our best to avoid over-regulation.

The framework has five elements: Safe People; Safe Projects; Safe Settings; Safe Data; and Safe Outputs.

All projects seeking to access microdata are assessed against stringent criteria – projects must be in the public interest and in accordance with the legislation of the relevant agencies. In addition, all users who get access to the data are legally obliged to use it responsibly for approved purposes, comply with the conditions of access, and maintain the confidentiality of the data.

Having explained the philosophy behind the Five Safes Framework, and the key principles that motivate it, I would encourage those of you with a keen interest in more detail on how the ABS safeguards integrated public data assets to read the non-technical explanation of how the Five Safes Framework is implemented, available on the ABS website.8

The ABS has always had a focus on the security and privacy of the public’s data, and has made substantial investments in security in recent years, improving our governance and oversight, and working with the Australian Cyber Security Centre to ensure the robustness of our arrangements.

On privacy, the ABS handles personal information with the care required to adhere to or exceed the requirements of the Privacy Act 1988 and the Census and Statistics Act.

The ABS has a range of measures in place to ensure that the information provided is safeguarded. These measures include strong encryption of data, restricted access on a need-to-know basis, physical security of premises and data lock-ups, monitoring access and use, and regular audits. And we keep the public informed on the use of data, including publishing Privacy Impact Assessments on the ABS website.9

As Australian Statistician, the trust of the community in the ABS is something I take very seriously – it’s the cornerstone of being able to produce the high-quality statistics on which many of you here rely – and it’s something that needs to be upheld right across the public sector for the benefit of the people we serve.

Data Availability and Transparency Bill

Let me now turn to an upcoming piece of legislation that has the potential to significantly enhance the effective use of public sector data – the Data Availability and Transparency Bill, which also has a strong focus on maintaining public trust in government use of data.

The journey to the Data Availability and Transparency Bill began with the Productivity Commission’s 2017 Data Availability and Use report, which looked at the benefits to society of increasing the availability, and use, of public-sector data.10

The Data Availability and Use report recommended, among other things:

      “the creation of a data sharing and release structure that indicates to all data custodians a strong and clear cultural shift towards better data use” (PC Report, p.2).

In response to the report, the Government set up the Office of the National Data Commissioner (ONDC), and appointed Deb Anton as Interim National Data Commissioner to run the Office. The role of the Office, at least in its first incarnation, is to develop legislation to enable a simpler and more efficient data sharing framework within government.11

The ONDC has achieved a lot in a relatively short time. This has included developing legislation based on widespread public consultation; issuing a number of Discussion Papers; and undertaking Privacy Impact Assessments on the proposed data sharing legislative reforms.12

In partnership with the ABS, the ONDC has developed data sharing principles as a core part of the legislation, based on the Five Safes Framework I outlined earlier.

The ABS, along with the public service more broadly, recently had the opportunity to review the draft legislation. The ONDC is now taking this feedback on board ahead of releasing a draft for public consultation.

Given that the legislation is not yet available for public consultation, I think it is premature for me to provide a detailed commentary on it. But I do want to provide the broad outlines of my thinking, and that of the ABS.

The draft legislation provides an alternative pathway to share data held by government agencies where it is not currently possible or practical to do so, provided certain safeguards are met.

Use of the proposed legislation by Commonwealth agencies is enabling and optional. Under the proposed legislation, there is no authority to compel custodians of public sector data (that is, agencies that hold and are responsible for their slices of public-sector data) to share data. Further, if data custodians decide not to share data, this cannot be overturned by the National Data Commissioner.

I see this upcoming legislation as complementary to ABS legislation, enabling effective use of data across government and supporting my view that data should be as available as possible to trusted users. There is enormous public value to be unlocked by making data as available as possible to trusted users, but doing so comes with a crucial caveat: a high standard of appropriate safeguards, such as those I have outlined today, must be in place. These include a commitment to maintaining privacy; highly developed data security; and transparency and engagement with stakeholders and the public on the use of data. And data sharing can only be safe when data users have the skills and knowledge to care appropriately for the data they’re using.

The ABS has a big role to play in building this capability – to ensure these high standards are applied in the collection, use, analysis, sharing and dissemination of data right across the public sector.

Conclusion

Let me sum up.

In recent years, there has been a great deal of work done, and significant financial investment, in building public-sector data assets, particularly integrated assets, and in improving access to public-sector data for trusted users.

As I have argued today, this work and financial investment has enormous potential to unlock public value associated with the use of these data assets. But to realise this public value, it is crucial to build and maintain the community’s trust. Building community trust in the use of data by government is a long-term pursuit because years of good practice can be quickly undone.

The ABS will continue to champion the safe use of data; support the further development of secure data access for trusted users, and the new data sharing legislation; and help build the highest professional data standards across the public service.

Thank you.

* I am extremely grateful to Marcel van Kints and Celia Moss for their help preparing these remarks, and to them and other colleagues for insightful comments on an earlier draft, particularly Deb Anton, Teresa Dickinson, Dan Andrews and Alan Finkel.

1. The Australian Longitudinal Individuals File (ALife) is designed to provide approved researchers with access to de-identified unit record tax and superannuation data in a safe and secure way. See https://alife-research.app/info/overview for more information. The Australian Institute of Health and Welfare (AIHW) is working with the Australian Department of Health and state and territory health authorities to create the National Integrated Health Services Information (NIHSI) Analysis Asset (AA). The NIHSI AA will contain de-identified data from 2010-11 onward. The Longitudinal Linked Employer Employee Database (L-LEED) is linking business and employee data to understand the interaction between businesses and employees. The prototype is led by the Treasury and is being built by the Australian Taxation Office and Data61.
2. BLADE is the Business Longitudinal Analysis Data Environment, and MADIP is the Multi-Agency Data Integration Project.
3. See https://www.abs.gov.au/websitedbs/D3310114.nsf/home/Statistical+Data+Integration+-+BLADE+Research+Projects, and https://www.abs.gov.au/websitedbs/D3310114.nsf/home/Statistical+Data+Integration+-+MADIP+Research+Projects.
4. https://www.rba.gov.au/publications/rdp/2018/2018-07.html
5. https://www1.health.gov.au/internet/main/publishing.nsf/Content/Data-Integration-Partnership-Australia
6. https://treasury.gov.au/publication/p2019-37418a
7. https://treasury.gov.au/publication/p2019-396067, https://treasury.gov.au/publication/p2019-37418b
8. https://www.abs.gov.au/ausstats/abs@.nsf/Latestproducts/1160.0Main%20Features4Aug%202017
9. https://www.abs.gov.au/websitedbs/D3310114.nsf/home/ABS+Privacy+Impact+Assessments
10. https://www.pc.gov.au/inquiries/completed/data-access/report/data-access.pdf
11. https://ministers.pmc.gov.au/keenan/2018/government-response-productivity-commission-inquiry-data-availability-and-use
12. See https://www.datacommissioner.gov.au/sites/default/files/2019-09/Data%20Sharing%20and%20Release%20Legislative%20Reforms%20Discussion%20Paper%20-%20Accessibility.pdf.