The Australian Bureau of Statistics (ABS) collects data across Australia’s public and business sectors. This article explores whether further insights can be derived from this data to inform decisions across the economy, offering additional benefits to the Australian public.
Investigations into the Business Characteristics Survey (BCS) were initiated to seek a method for relating individual data items to observable business outcomes. Outcomes relating to profitability changes over time were considered and represented by a profitability score. The intention was to identify characteristics common to lower-score businesses in one sample, and to estimate the performance of the statistical model by testing it against another sample.
To do this, a random forest classifier, a machine learning (ML) method, was used to classify businesses as having lower or higher profitability. A random forest is built from many decision trees, each of which repeatedly splits the businesses into groups based on their other characteristics. At each decision, a tree selects the feature that best separates the two classes. The random forest then takes the majority vote across all of its decision trees to assign each business to one of the two profitability classes.
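The two ideas in the paragraph above, choosing the feature that best splits the classes and combining trees by majority vote, can be illustrated with a minimal sketch. It assumes Gini impurity as the split criterion, which is a common default but is not confirmed by this article. The toy records and the feature meanings (an innovation flag and a staff count) are hypothetical and are not BCS data.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure group)."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def best_split(rows, labels):
    """Return (feature_index, threshold, weighted_impurity) for the single
    split 'feature <= threshold' that best separates the classes."""
    best = None
    n = len(rows)
    for j in range(len(rows[0])):
        for t in sorted({r[j] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[j] <= t]
            right = [y for r, y in zip(rows, labels) if r[j] > t]
            if not left or not right:
                continue  # a split must put records on both sides
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[2]:
                best = (j, t, score)
    return best


def majority_vote(votes):
    """Combine the class predicted by each tree into one forest prediction."""
    return Counter(votes).most_common(1)[0][0]


# Hypothetical toy records: [innovation_flag, staff_count]
rows = [[0, 5], [0, 12], [1, 7], [1, 20], [0, 30], [1, 3]]
labels = ["lower", "lower", "higher", "higher", "lower", "higher"]

split = best_split(rows, labels)
forest_prediction = majority_vote(["lower", "higher", "lower"])
print(split, forest_prediction)
```

In this toy data the first feature separates the two groups perfectly, so it is the one selected. A real random forest also trains each tree on a bootstrap sample and considers a random subset of features at each split, which this sketch omits for brevity.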
This classifier was tasked with identifying BCS features that are associated with businesses regarded as either in a “higher profitability score” group or a distinct “lower profitability score” group. The trained classifier, once developed to make such distinctions reliably, was scrutinised to understand its inner workings and uncover which aspects of the data contribute to these decisions.
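As a rough illustration of this workflow, the sketch below trains a random forest on one sample, checks it against a held-out sample, and then ranks the features by their contribution to the model. It assumes scikit-learn and uses its impurity-based `feature_importances_` as the measure of contribution; the Technical Report may rely on a different library or importance measure. The data is synthetic and the feature names (`item_a`, `item_b`, `item_c`) are hypothetical placeholders, not BCS data items.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 400

# Hypothetical stand-ins for survey items; not real BCS variables.
feature_names = ["item_a", "item_b", "item_c"]
X = rng.normal(size=(n, 3))

# Synthetic label driven by item_a only: 1 = "lower profitability score".
y = (X[:, 0] > 0).astype(int)

# Train on one sample and keep another aside to estimate performance.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)

# Share of held-out businesses assigned to the correct group.
accuracy = clf.score(X_test, y_test)

# Impurity-based importances sum to 1; larger means more influential.
ranked = sorted(
    zip(feature_names, clf.feature_importances_),
    key=lambda pair: pair[1],
    reverse=True,
)

print(f"held-out accuracy: {accuracy:.2f}")
for name, importance in ranked:
    print(f"{name}: {importance:.3f}")
```

Because the synthetic label depends only on `item_a`, that feature should rank first. On real survey data, importances are usually spread across many correlated items and need more careful interpretation.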
This article provides a concise overview of the findings documented in the Technical Report linked here: “Producing Official Statistics from Linked Data - Technical Report.”