2024 Australian Government Data Forum: Keynote address

Building Data Capability in the Australian Public Service

Dr David Gruen AO
Australian Statistician
Thursday 14 November 2024

Introduction

I have lived in and around Canberra on and off for much of my life. I thank the Traditional Custodians of this land who have cared for it over millennia. I pay my respects to their Elders and acknowledge and welcome other members of the Aboriginal and Torres Strait Islander community who are attending today.

Thank you for the opportunity to speak about building data capability in the Australian Public Service.

In my talk today, I’ll give an overview of the initiatives to build that capability as part of the Data Profession. I will also talk about the ways in which the data available to public-sector analysts and researchers has improved radically over the past several years. 

I will also take a quick moment to celebrate the Australian Public Service Data Awards that were presented yesterday evening.

The APS Data Profession

Let me start with initiatives under the banner of the APS Data Profession. I’ll provide an update on the information I provided to the first Australian Government Data Forum, held 18 months ago in May 2023.

Some of this may sound familiar to those of you who were in the audience for that talk. But there has been substantial progress over the past 18 months, which keeps things interesting!

To recap briefly, an Independent Review of the Australian Public Service was conducted in 2019. It is known as the Thodey Review, after its lead author, David Thodey. The review made a strong case for enhancing the use of data to support public policy formulation and better service delivery. The Thodey Review recommended the APS access new data sources for public-policy purposes, make wider use of integrated data assets to rigorously develop and improve policies, and update legislation and infrastructure to enable data to flow securely between agencies. It also recommended the APS launch linked Data and Digital Professions to build data and digital expertise.

All these things have come to pass.

Along with Data and Digital Professions, the Australian Public Service Commission also launched the Human Resources Profession – indeed, it was the first Profession launched – in 2019.

Building on the success of these first three Professions, three new Professions have been announced this year to raise APS capability in: 

  • procurement and contract management, 
  • evaluation, and 
  • complex project management. 

The Data Profession is sharing lessons learnt with these new professions. We plan to establish enduring partnerships where natural linkages arise. So far, we have partnered with the Evaluation Profession on recruitment, by including an assessment of evaluation skills as part of Data Graduate Recruitment. The Evaluation Profession is also helping us strengthen the evaluation content in Data Profession course offerings.

Now in its fifth year, the Data Profession continues to deliver initiatives co-designed with partner agencies across the APS. Co-design has compelling benefits relative to other possible models: it brings the resources of several agencies to the development of new initiatives; ensures Data Profession offerings remain relevant to the people who use them; and avoids duplicating effort across the service.

The Data Profession has always seen its role as lifting data capability across the APS. With that in mind, the Data Profession has focused on programs: 

  • providing entry-level pathways into the APS for people with data skills, 
  • developing training courses to raise data literacy and leadership from graduates through to SES,
  • providing a Members’ Community Platform with a wide range of communities of practice, events, a place to advertise public-sector data jobs, learning resources for all levels, and robust discussion threads,
  • defining and refining data capabilities and roles to assess data capability, respond to capability gaps, and create a common language with which to describe data job roles,
  • encouraging diversity of people in data roles, and
  • clarifying career pathways and guiding data skills development.

Since 2021, graduates have been able to enter the APS via a dedicated data graduate stream, through a centralised recruitment process led by the ABS. Interest in this data graduate stream has grown over the years. For the 2021 intake, 11 agencies were involved, and 65 data graduates were placed across the APS. Over the four years, 2021-24, about 850 data graduates have taken up roles across more than 40 APS agencies.

For the 2025 intake, 43 agencies are participating, with projected placement of nearly 300 data graduates across the APS. In a sign of the level of interest among prospective graduates, after placing these nearly 300 across the service, there are a further 220 prospective graduates on a merit list. [1]

The ABS has also collaborated with Treasury’s Australian Centre for Evaluation, as part of data graduate recruitment, to identify prospective data graduates with evaluation skills, to support the newly established Evaluation Profession. 

This stream has just started – thus far, 4 Evaluation Stream graduates have been matched across 3 agencies.

The Data Profession continues to offer a range of training courses, for data graduates through to SES. For data graduates, there are ‘Introduction to data in government’ modules, which have been provided over the past few years. These training modules cover data in the APS, trust in government, evidence-based decision making, evaluation, data storytelling and visualisation, and the APS Data Profession. These modules are now being developed into a set of self-paced online modules that will be available to all staff via APS Learn.

The SES Data Leadership Course is for SES who want to bolster their data literacy but are not data professionals themselves. It has continued to be delivered in partnership with ANU and the APS Academy during 2024. Since its 2022 inception, 11 courses have been delivered to a total of 192 SES across 26 agencies. The course is now being reviewed with the expectation that a new iteration will be rolled out in the first half of 2025. 

Building on the SES course, a similar course has been developed for EL2s. This course is delivered virtually to allow it to be accessed anywhere, with a pilot delivered to 50 EL2s in May this year. Following the pilot, the course has been delivered to 75 participants across 2 cohorts.

The Data and Digital Professions have initiated a project to develop a suite of training materials on Artificial Intelligence to complement existing offerings.

In a sign of the extent of interest, 16 agencies have signed up to contribute to the working group that is developing these materials.

The Data Profession Members’ Community Platform (MCP) has been running a little over two years and continues to grow. It is extremely encouraging, indeed remarkable, that there are now over 11,000 public service members from Commonwealth and State/Territory governments using the platform. The MCP provides a space to connect with other data professionals through communities of practice and discussion boards. You can also access job opportunities, register for events and much more.

Some of the things you can find on the MCP are:

  • the active Artificial Intelligence and Machine Learning community of practice, hosted jointly with the Digital Profession, with almost 1,000 members,
  • a document describing 50 curated data learning offerings, which has been accessed over 700 times, and
  • 17 episodes of my ‘In-Conversation’ video series with guests for whom data has played an important role in their careers, including:
    • Nobel Prize winning astrophysicist, Professor Brian Schmidt,
    • Assistant Minister Andrew Leigh, 
    • host of the Finance Report on the nightly ABC TV News, Alan Kohler, and 
    • Victor Dominello and Ian Oppermann, who were responsible for the widely lauded digital transformation of the NSW Government. 

To get a sense of the level of data maturity in the APS, the Department of Finance has coordinated an inaugural data maturity assessment. 92 agencies participated, each completing a detailed questionnaire across a series of aspects of data maturity to provide a measure of the extent of data maturity across the service. 

The point is to set a baseline, identify priority areas for improvement and support more effective data-driven analysis across the service.

Let me turn now to the Data Capability Framework, first published in October 2022, and now being refreshed. The Framework outlines 26 specific capabilities associated with working with data in the APS. Each capability has three proficiency levels: foundational, intermediate, and advanced.

Agencies are using the framework predominantly to assess capability at an agency or individual level to identify any gaps and guide career development, organisational planning and strategy, and alignment with their agency’s own capability framework.

Eighteen months ago, I spoke about the use of the Data Capability Framework to assess levels of data capability across the ABS. 

This year we’ve built on that work to run another ABS Capability Survey – this time focused not just on data, but also digital and enabling capabilities.

The Capability Survey once again used a self-assessment approach, asking ABS staff members to rate their levels of proficiency against 37 capability areas. 

Encouragingly, we’ve seen a 40% increase in the number of ABS staff members who filled in the survey compared to last time. [2] The survey offers value at both the individual and organisational level. Individuals who participated received a summary of their responses to help their development and career planning. Organisationally, we’ve been able to better understand the self-assessed capabilities of our workforce. 

Figure 1: ABS Data Capability Survey

Figure 1 shows the results from the ABS Data Capability Survey

Figure 1 shows the results, as well as a selection of results for the software and programming languages we asked about. For each capability, the lightest colour shows the proportion of people who reported that they had no skills, with progressively darker colours for the proportions who reported foundational, intermediate, and advanced levels of capability.

This year we’ve also sought to better understand our workforce capability needs, by developing a supplementary survey which asked Directors to identify their teams’ top priority areas for development in the coming year. 

These two surveys together provide a view of our capability strengths and needs and have enabled us to identify the top enterprise development priorities we will advance in 2024-25, as well as to inform divisional workforce planning.

Data science capability continues to emerge as a key focus. The Capability Survey results have been used to develop an enterprise roadmap for data science capability including people, platforms, and process. The survey results enable the ABS to get a shared understanding of data science roles and capabilities. 

Data capability is an important capability for many APS agencies. The Capability Survey enables agencies to understand their current and future capability strengths and needs and allows us to plan and adapt.

The final capability initiative I’ll highlight today is the APS Data Job Role Personas. The initial 6 personas were published in December 2022 and 6 more have been added this year. The new data job personas are: 

  • Data Steward, 
  • Data Project Manager, 
  • Data Governance Officer, 
  • Metadata Specialist, 
  • Data Architect, and
  • Data Translator. 

The suite of (now) 12 Job Role Personas reflects the breadth of work undertaken across data roles in the APS. This work is also designed to complement the Data Capability Framework.

Data Job Roles establish a common language with which to understand baseline data skills, competencies, and the requirements of different data roles. The Data Job Roles can be used by agencies and individuals in the APS to:

  • understand capabilities needed for specific data roles, 
  • identify skills development needs for career progression, 
  • assess capabilities to assist with performance reviews, 
  • provide common language for data role advertisements, and 
  • support human resource and workforce planning. 

As I said earlier, the preferred model is for Data Profession initiatives to be co-designed with other agencies. In the case of Data Job Roles, I want to record my appreciation to the ATO, which led the work to develop and extend this initiative. I also want to thank the Department of Health and Aged Care, the Australian Institute of Health and Welfare, and Home Affairs, which also led Data Profession projects this year.

The Data Profession team has also taken the opportunity to share some of the training materials with Badan Pusat Statistik, the Indonesian equivalent of the ABS.

Improving the Evidence Base for Public Policy

Having talked in some detail about the initiatives in place to raise data capability, I want to provide examples of how far we have come with the data now available within the public service to provide public policy insights and improve service delivery. While there are many elements to this, I will focus on the rise of ‘big data’ and the increasing use of integrated data assets.

On the rise of big data, I’ll give one example to illustrate what is now possible. In my talk to the Australian Government Data Forum last year, I spoke about a joint ABS/RBA study on rents – a topic worth revisiting as it remains a contemporary issue.

The joint study uses a dataset which provides data on rents for about 600,000 rental properties across both regional and capital cities in Australia, updated monthly. With that much data, it is possible to provide extremely detailed information on developments in the rental market.

 

Figure 2: Rent price indices*, March 2020 = 100

A two panel line graph of rent price indices by SA3 in greater Sydney and greater Melbourne with the index equal to 100 in March 2020. The X-axis represents the month, ranging from June 2018 to September 2024. The Y-axis represents the level of the index. Each line is coloured according to the distance of that SA3 from the CBD, warmer colours (like red) mean that the SA3 is further from the CBD, while cooler colours (like purple) mean that the SA3 is close to the CBD (ranging from 0 to 80km).

Figure 2 shows rental prices over the past seven years by distance from the CBD in Sydney and Melbourne. The broad outlines of the price developments in Australia’s two largest cities are remarkably similar. Among many other things(!), the arrival of COVID-19 in Australia in March 2020 represented a huge location-specific shock which led to a huge fall in demand for rental properties near the CBD.

As a result, there were big falls in market rents close to the CBD but not further out in the suburbs. The near-to-CBD rental price falls began to reverse in 2021, and the contrast over this period with the outer suburbs is striking indeed.

More recently, with the unwinding of the COVID-induced location-specific shock, the behaviour of rents in the inner and outer suburbs of Sydney and Melbourne has been broadly similar.

This example shows how big data provides a level of detail that is not available any other way. It enables analysts to understand what has been going on with the rental market not just on average but in different segments of the market at different times. 
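As a rough illustration of what ‘March 2020 = 100’ means in Figure 2, the sketch below rebases a monthly median rent series for two distance bands. The rents, band names and function names are invented for illustration, and using a simple median is an assumption; this is not the method used in the joint ABS/RBA study.

```python
from statistics import median


def rebase(series, base_period):
    """Rescale a {period: value} series so that base_period equals 100."""
    base = series[base_period]
    return {period: 100 * value / base for period, value in series.items()}


def index_by_band(rents, base_period):
    """Build one rebased median-rent index per distance band."""
    medians = {}
    for (band, month), values in rents.items():
        medians.setdefault(band, {})[month] = median(values)
    return {band: rebase(series, base_period) for band, series in medians.items()}


# Hypothetical weekly rents in dollars, keyed by (distance band, month).
# These numbers are made up and are not drawn from the study's dataset.
rents = {
    ("inner", "2020-03"): [500, 520, 540],
    ("inner", "2020-09"): [450, 468, 480],
    ("outer", "2020-03"): [400, 410, 420],
    ("outer", "2020-09"): [400, 414, 430],
}

indices = index_by_band(rents, "2020-03")
for band, series in indices.items():
    print(band, {month: round(value, 1) for month, value in series.items()})
```

With these made-up numbers, the inner band index falls to 90 in September 2020 while the outer band stays just above 100. Because each series is divided by its own March 2020 value, areas with very different rent levels can be compared on one chart.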

A curious analyst could use these data to throw light on many public-policy-relevant questions about the rental market. (In my opinion, curiosity is an underappreciated attribute for success!)

BLADE and PLIDA

On developments with integrated data assets, I will talk specifically about the Business Longitudinal Analysis Data Environment (BLADE) and Person-Level Integrated Data Asset (PLIDA). [3]

The critical developments that led to the first versions of both BLADE and PLIDA occurred in 2015. [4] They are now the two largest and most extensive integrated data assets in Australia.

Figure 3: Business Longitudinal Analysis Data Environment (BLADE) Datasets

This figure outlines all the datasets included in the Business Longitudinal Analysis Data Environment (BLADE). BLADE is an economic data tool combining tax, trade and intellectual property data with information from ABS surveys to provide a better understanding of the Australian economy and business performance over time.

Figure 4: Person Level Integrated Data Asset (PLIDA) datasets

This figure outlines all the datasets included in the Person-Level Integrated Data Asset (PLIDA). PLIDA is a secure data asset combining information on health, education, government payments, income and taxation, employment, and population demographics (including the Census) over time. It provides whole-of-life insights about various population groups in Australia, such as the interactions between their characteristics, use of services like healthcare and education, and outcomes like improved health and employment.

Figures 3 and 4 show the impressive array of datasets that now make up these two integrated data assets. BLADE includes surveys on a wide range of business characteristics, data on business income and tax, on exports and imports, insolvency, and employment conditions. [5] 

PLIDA includes information from the Census, tax return data, data on social security recipients, migrants, and on health, education, and disability. 

There are further datasets that are in the process of being added to these integrated data assets. They are marked in the figures with a ‘+’ sign.

These integrated data assets therefore provide analysts with powerful tools to shed light on public policy problems across multiple dimensions. 

Encouragingly, we are seeing broadening support for data integration translating into broadening financial support. In June, the government announced a $3.6 million investment in the Criminal Justice Data Asset following a successful pilot phase.

More recently, the Australian Research Data Commons agreed to provide the ABS with $1.8 million over the next four years to improve capacity and skills across the social science community to capitalise on existing and emerging technical and methodological advances. This is an opportunity for the ABS to enhance collaboration between social science university researchers, government agencies, the not-for-profit sector and communities. This project will result in enhancements to PLIDA documentation and ease of use, as well as guidelines for users of government administrative data on Indigenous people.

Framework for the Governance of Indigenous Data

Lastly, I want to take the opportunity to provide an update on the progress being made on the Framework for the Governance of Indigenous Data. Earlier this year, the Secretaries Digital and Data Committee approved the Framework being introduced across the APS. 

The National Indigenous Australians Agency led the development of the Framework, which was designed in partnership between the APS and Indigenous leaders. The Framework aims to provide Aboriginal and Torres Strait Islander people greater agency over how their data are governed within the APS, so government-held data better reflects their priorities and aspirations. It recognises better outcomes are achieved if Aboriginal and Torres Strait Islander people have a genuine say in matters affecting them, including use of data to inform government policy making.

The Framework consists of a set of actions informed by 4 guidelines:

  • Partner with Aboriginal and Torres Strait Islander people,
  • Build data related capabilities,
  • Provide knowledge of data assets, and
  • Build an inclusive data system.

While the official implementation period begins in January 2025, many departments and agencies have begun implementation where they can. 

In the ABS, we have revitalised our advisory Round Table on Aboriginal and Torres Strait Islander Statistics. We now remunerate Aboriginal and Torres Strait Islander community members and ensure more senior ABS representation at meetings; we are ensuring the updates to the classifications of language and occupation better reflect Indigenous languages and occupations; and we are working with the university research sector to improve guidelines for users of government administrative data on Indigenous people.

Conclusion

To conclude, I hope I have convinced you that there is a lot going on when it comes to data in the public service – a lot in terms of raising data capability, and a lot of improvement in the range and scale of data available for public policy analysis and service delivery.

Finally, let me take the opportunity to recommend you join the Data Profession Members’ Community Platform if you haven’t already done so – to collaborate with your peers and access events, learning resources and job opportunities to support your data career in the APS.

Thank you.

Footnotes

[1] There are ten streams through which graduates can enter the APS – a generalist stream and nine specialist streams: Accounting and Finance, Data, Digital, Economics, Human Resources, Indigenous, Intelligence, Legal and STEM. Prospective graduates to the APS can apply in more than one stream. The data stream continues to generate huge interest, with almost 2,000 prospective graduates applying for the 2025 intake, more than to any other specialist stream.

[2] Some might muse that if you can’t get ABS staff members to complete surveys, who will?

[3] Datasets are ‘integrated’ when they are linked together so that analysts can study several aspects of individuals’ (or individual businesses’) behaviour together. The unit records are linked together in such a way that records from different datasets (for example, health and tax records) are identified as being for the same person (or same business). This is done via a spine that is common across the linked datasets (see https://www.abs.gov.au/about/data-services/data-integration/person-linkage-spine for further information). The individual records are de-identified so that privacy is preserved, and the identity of individuals (or individual businesses) is not revealed. It is incumbent on the hosts of these data assets, in this case the ABS, to ensure they are secure, with well-developed protocols to ensure the private information of individuals and businesses is protected and is not compromised.

[4] See Data linkage and integration to improve the evidence base for public policy: lessons from Australia for a history of the development of BLADE and PLIDA.

[5] Given the earlier discussion on rents, note that rents data from the ATO is now integrated into PLIDA, as Figure 4 shows.
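To illustrate the kind of linkage described in footnote [3], the sketch below joins two toy datasets through a common spine and drops identifying fields before analysis. All names, records and functions are hypothetical, and exact matching on name and date of birth is a simplifying assumption; it does not describe how the ABS builds or uses its Person Linkage Spine.

```python
# Hypothetical spine: maps identifying details to an arbitrary spine ID.
# In this sketch, only the linkage step ever sees names and dates of birth.
spine = {
    ("Alex Citizen", "1980-01-01"): "P001",
    ("Sam Resident", "1975-06-30"): "P002",
}

# Made-up source datasets, each holding identifiers plus analytic variables.
health_records = [
    {"name": "Alex Citizen", "dob": "1980-01-01", "gp_visits": 4},
    {"name": "Sam Resident", "dob": "1975-06-30", "gp_visits": 1},
]
tax_records = [
    {"name": "Alex Citizen", "dob": "1980-01-01", "taxable_income": 72000},
]

IDENTIFYING_FIELDS = ("name", "dob")


def de_identify(records, spine):
    """Replace identifying fields with the spine ID; skip unmatched records."""
    result = []
    for record in records:
        spine_id = spine.get((record["name"], record["dob"]))
        if spine_id is None:
            continue
        analytic = {k: v for k, v in record.items() if k not in IDENTIFYING_FIELDS}
        result.append({"spine_id": spine_id, **analytic})
    return result


def integrate(*datasets):
    """Combine de-identified datasets into one row per spine ID."""
    linked = {}
    for dataset in datasets:
        for record in dataset:
            row = linked.setdefault(record["spine_id"], {})
            row.update({k: v for k, v in record.items() if k != "spine_id"})
    return linked


linked = integrate(de_identify(health_records, spine), de_identify(tax_records, spine))
print(linked)
```

The linked output is keyed only by spine ID, so an analyst can study health and tax variables for the same person without seeing who that person is.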