A Rao-Yu model for small area estimation of labour force statistics

This is a paper that was brought to the Methodology Advisory Committee on 10th August 2023.

Released
30/10/2023

Sean Buttsworth, Rachel Barker, and Mark Zhang, Methodology Division, Australian Bureau of Statistics
 

Methodology Advisory Committee

The ABS Methodology Advisory Committee (MAC) is an expert advisory group of statisticians and data scientists drawn mainly from, but not restricted to, universities across Australia and New Zealand.

The function of the MAC is to provide expert advice to the Chief Methodologist on selected methodological issues relevant to the production of official statistics, including questionnaire development, survey design, data collection, data linkage, estimation, time series analysis, confidentialisation and the use of new and emerging data sources. The MAC is a key mechanism used by the ABS to ensure outputs are based on sound and objective statistical principles.

The Committee generally meets two or three times each year to discuss papers on specific issues. The papers can relate to work at different stages of the methodology solution cycle: 

  • during initial thinking around a statistical and/or data science method to meet new or emerging needs; 
  • as investigations proceed or develop the more complex aspects of a proposed method; 
  • as methods are finalised and circulated for public comment; or 
  • as part of ongoing reviews of methods already implemented in ABS work.

In providing advice the Committee considers aspects such as: 

  • the statistical validity of the approach proposed; 
  • similar statistical or data science problems, arising in other fields, where work done may provide the basis for improved solutions to the given problem; and 
  • the implications of the statistical or data science methods proposed for the valid use of the outputs, and in particular the inferences that might be drawn from the resulting data.

Purpose of this paper

We would like to seek input from MAC on the suitability of the proposed solution for estimating labour force statistics at a small area level and quantifying their uncertainty, as well as any suggestions for improving the quality of these estimates.

ABS context

  • In the 2022-23 federal budget the ABS was given funding for enhancing regional labour market statistics to provide a better understanding of regional labour market developments.
  • The ABS has undertaken to provide improved SA4-level monthly estimates of key labour force statistics in 23/24. We aim to have a productionised system in place by February 2024 to enable monthly SA4 modelled estimates to be released in conjunction with the monthly direct estimates at national and state level.
  • By the end of June 2024 we are aiming to provide even finer-level (e.g. SA2, SA3,  LGA) modelled estimates at a frequency that yields sufficiently reliable estimates. It is also a goal to provide SA4 modelled estimates disaggregated into age by gender demographic groups.

Problem specification

  • The Labour Force Survey is designed primarily to provide national estimates, with the secondary design objective of producing state and territory estimates. Direct estimates at Statistical Area Level 4 (SA4) are also provided, notwithstanding that many of these SA4 estimates have high levels of sampling error due to small sample sizes.  
  • Small area modelling techniques may be employed to produce SA4-level estimates with greater stability. However, application of the cross-sectional cell-level mixed models typically employed by the ABS is unlikely to be either appropriate or optimal given the rotating panel sample design of the Labour Force Survey.
  • Hence time series small area models (Rao-Yu models in particular) have been used to properly account for, and take advantage of, the temporal correlations in the labour force data. However, such models are substantially more complicated to apply than simple cross-sectional models.
  • This paper describes our application of Rao-Yu models to estimate SA4-level monthly estimates of labour force statistics.
  • Mean squared error estimators are readily available for the modelled small area count estimates of levels. However, movement estimates and estimates of unemployment rates are also desired, and mean squared errors of these are not straightforward since they depend on the correlations between modelled estimates. It is expected the use of a parametric bootstrap may be necessary. 
  • While it is not the focus of this paper, we also face the challenge of modelling at finer levels of geography and demographics.

Key findings, issues, and challenges

  • The Rao-Yu model evaluated seems, in most cases, conceptually appropriate, and empirically produces plausible estimates with greater accuracy than direct estimation methods.
  • Dealing with COVID and other real-world changes presented difficulties and the model will need to be managed in the case of future shocks.
  • The estimation of AR1 coefficients for NSW and VIC employment models was stopped at upper bounds of 0.98, indicating non-stationarity problems. The dynamic model of Fay and Diallo (2012) may be a better alternative in these cases. 
  • The random area effect variance estimates and random area by time effect variance estimates were small but significant for unemployment. For employment both random effect variances were small and statistically non-significant.  
  • Uncertainty estimation for movements and rates will be required.
  • For a large number of time points and areas, fitting the Rao-Yu model, as implemented in the SAE2 R package, is computationally demanding. Hence we have not explored Australia-level models. The computational demands will also need to be considered when monthly production is implemented in 2024, when we look at bootstrapping for uncertainty estimation, and when we explore the feasibility of modelled estimates for finer-level geographies.  

Questions for Methodology Advisory Committee

  1. Does MAC have any feedback on the proposed methodology for modelling, small area estimation and uncertainty estimation?
  2. Are there better alternatives to dealing with the time-varying regression coefficient due to COVID impacts?
  3. Do you have any alternative modelling approaches in general?
  4. Are there improvements or further work that should be pursued?
  5. Any thoughts on the value of a multivariate model and on obtaining finer-level estimates would be appreciated.

1. Introduction

1.1. Context

The Labour Force Survey provides monthly data on the Australian labour market. It is designed primarily to provide reliable national estimates, with a secondary objective to provide state and territory data. Regional estimates – Statistical Area Level 4 (SA4) – are also published from the Labour Force sample, notwithstanding that it is not a design objective. Many of these SA4-level direct estimates suffer from high sampling error and volatility due to small sample sizes and sample rotation.

In the 2022-23 federal budget the ABS was given funding to improve the quality of regional labour force data. The immediate priority is to produce more accurate SA4-level monthly estimates of employment, unemployment and not-in-the-labour-force (NILF) counts, along with unemployment and participation rates.  Preliminary experimental estimates, along with an information paper, were published in July 2023.  A longer-term goal is to provide fit-for-purpose estimates at finer levels of geography – SA2, SA3 and LGA – and by age group and gender. The frequency at which these finer estimates can be viably provided is yet to be determined.

The ABS has been working to produce modelled monthly SA4-level estimates using an area-level multilevel time series model. This model, and our implementation of it, is described in section 2 of this paper, including a rationale for our choice of the Rao-Yu model (Rao and Yu, 1994). In section 3 we provide some results from the model fitting and evaluation of the small area estimates.  Section 4 outlines some directions for further work that we are planning or contemplating. We begin in the next section by outlining a history of small area investigations conducted in the ABS with the Labour Force Survey. We then give a brief overview of small area estimation methods that have been applied to labour force data in the literature.  

1.2. Labour force small area estimation in the ABS

Over the years various methods have been investigated for modelling small domain estimates from the ABS Labour Force Survey, with a particular focus on unemployment. To date, no modelled estimates have been published.

In 2007 logit mixed models were developed based on those described in Saei and Chambers (2003).  These were applied independently across time to produce small area estimates of labour force status for local government areas from the LF sample in the months of August, from the years 2001-2006.

Wooton (2010) investigated a multinomial logit mixed model with category-specific random effects for obtaining LF status variables for local government areas. This was an extension of the work done by Molina et al. (2007). The method was applied to and evaluated on two months of data – August 2001 and August 2006.

In 2013 a feasibility study looked into obtaining monthly SA2-level estimates using repeated cross-sectional logit mixed models. However, there were insufficient resources within the Labour Force section at the time for this to be pursued. It was also unclear whether unemployment benefits – the key explanatory variable – could be reliably obtained on an ongoing basis. Subsequently, concerns were raised about the likely volatility of a time series of such estimates. 

In 2016 the ABS began investigating the use of cross-sectional random effects models for deriving annual small area SA2-level estimates. However, time and resources ran out before the project could deliver its outcomes. 

In 2016-17 a body of work was undertaken to look at small area models that incorporated temporal correlations from the LF survey. The goal was to obtain more reliable SA4-level estimates at monthly or quarterly frequencies.  Various area-level models were examined, including structural time series models and bivariate SUTSE models (making use of the unemployment benefits series). Rao-Yu models were also examined and the work was documented in a 2017 MAC paper. One of the key conclusions was that a seasonal version of the Rao-Yu model would be valuable.

In 2018 the University of Wollongong was contracted to fit Bayesian spatio-temporal models to obtain annual modelled SA2-level LF estimates. However, these were not published due to concerns about the plausibility of the time series of estimates and lack of available resources from the ABS Labour Force Branch for ongoing production.

1.3. Labour force small area estimation in the literature

In this section we highlight some key literature relevant to the modelling of labour force small area estimates. We do not attempt to be comprehensive.

Rao and Yu (1994) developed a time series extension to the area-level Fay-Herriot model (1979). The Rao-Yu model incorporates area random effects and time random effects that are modelled as an AR(1) process. Importantly, it allows for correlated sampling errors. Variations of this model have added seasonality and/or replaced the AR(1) assumption with a random walk (Datta et al., 1999).

Chambers and Saei (2003) also developed theory for both unit- and area-level temporal multilevel models as part of the 2004 EURAREA project. In each of the variations of these models, the repeated samples were assumed to be independent at each time point.

Unit-level time series small area modelling for rotating panel surveys was the topic of a 2009 PhD thesis by Kari Nissenen. Simulations indicated the efficacy of the model; however, this was for the case of a continuous target variable and for estimation of levels rather than movements.

In a 2014 paper, Vizcaino, Lombardia and Morales used a multinomial model with random time and area effects to produce estimates from the Spanish Labour Force Survey of Galicia. The authors note that these methods were “derived for simple random sampling and do not take into account potential labour force survey sampling design effects.”

The Office for National Statistics has published official modelled estimates of unemployment counts and rates for the 406 local area districts in the United Kingdom. These annual estimates are based on logit mixed models. The UK Labour Force Survey is used with the models fitted to independent samples of labour force responses pooled over each year.  

As well as multilevel models, structural time series (STS) models have also been applied to labour force small area estimation. The U.S. Bureau of Labor Statistics obtains modelled state estimates from the Current Population Survey. This is essentially via state-level univariate STS models, but with a borrowing of strength over states by the inclusion of benchmark constraints to ensure additivity with broad level direct estimates.

Statistics Netherlands uses a multivariate structural time series model to obtain estimates for six age group by gender domains from the Dutch Labour Force Survey. Van den Brakel and Krieg (2016) examined common factor STS models and bivariate STS models, suggesting that further improvements in the accuracy of the domain estimates could be achieved by modelling correlated disturbance terms.

While there has been much research and interest in the production of labour force small area estimates, it is notable that, to our knowledge, the only small area LF estimates currently published by an NSO are those by the ONS, Statistics Netherlands and the U.S. Bureau of Labour Statistics.

2. Modelling

2.1. Model in brief

A Rao-Yu model has been chosen and our application of it is described in this section. A justification of this approach over other possible modelling approaches is given in section 2.2. Further details of the data sources and modelling considerations are given in sections 2.3 to 2.8.

The key target variables are employment, unemployment and not-in-the-labour-force (NILF) counts along with the unemployment rate. It is reasonable to assume that the original LF series are multiplicative in nature with variances increasing with level. Hence a log transform is first applied for linearity and additivity and with the expectation that the transformed variables will better satisfy model assumptions.  Employment and unemployment counts are modelled independently. NILF has not been modelled directly but is derived by subtraction of the employment and unemployment modelled estimates from the LF total in-scope population.  Similarly, the unemployment rate is derived from the modelled employment and unemployment counts. For employment and unemployment the model is described below.

\(\theta^{*}_{dt} = \textbf{x}\boldsymbol'_{dt}\boldsymbol\beta + v_d + u_{dt} \\ y^{*}_{dt} = \textbf{x}\boldsymbol'_{dt}\boldsymbol\beta + v_d + u_{dt} + e_{dt} \)

where:

\(\theta^{*}_{dt}\) is the log of the population count \(\theta_{dt}\) where \(d = 1,2,...,D\) refers to the SA4 and \(t = 1,2,...,T \) to the time.

\(y^{*}_{dt}\) is the log of the direct sample estimate of \(\theta_{dt}\)

\(\textbf{x}\boldsymbol'_{dt}\) is a vector of SA4-specific covariates, some of which may vary with \(t\)

\(v_d\) is a random SA4 effect with \(v_d \sim NID(0,\sigma^{2}_{v})\)

\(u_{dt}\) is a random SA4-by-time effect modelled as a first-order autoregressive process:
\(u_{dt} = \rho{u}_{d,t-1} \ + \epsilon_{dt}, \; |\rho|<1, \; \epsilon_{dt}\sim NID(0, \sigma^2_\epsilon)\)

\(\textbf{e}^{\boldsymbol*}_d = (e^{*}_{d1}, e^{*}_{d2}, ..., e^{*}_{dT})\) is a vector of logged sampling errors for SA4 \(d\) which is assumed to follow a \(N(\boldsymbol{0}, \boldsymbol\psi_d)\) distribution, where the covariance matrix \(\boldsymbol\psi_d\) is known.  

Further detail regarding the sampling error model that defines the known covariance matrix of sampling errors is given in section 2.4. Details of the explanatory variables are given in section 2.3.

The models were fitted to the following state groups:

  • NSW with ACT
  • VIC with TAS
  • QLD
  • SA
  • WA
  • NT with outback SA4s from other states – 315, 406 and 508

This was due to the small number of SA4s in the smaller states of ACT, NT and TAS. Models were fitted at state group levels in preference to an Australia level, (i) because many of the model parameters are expected to be state-specific, and (ii) because of the computational burden of fitting an Australia-level model.  

The time span for the employment model was Jan 2020 to Sep 2022, corresponding to the availability of reliable STP job count data. For the unemployment model the time span was Jan 2016 to Sep 2022.

2.2. Rationale for choice of area-level multilevel model

In this section we give reasons why we chose the Rao-Yu model and why we chose it over the following possible alternatives:

  1. Time series models of Chambers and Saei (2013)
  2. Repeated cross-sectional models
  3. Structural time series models
  4. Unit level time series models

The rotating panel design of the ABS Labour Force survey presents both a need to take into account the complex sample design and an opportunity to make use of temporal correlations along with fixed predictors. The use of direct estimates as the dependent variables and the explicit inclusion of the sampling errors and their covariance matrix are key features of the Rao-Yu model. This means that, so long as we do a good job of specifying an appropriate sampling error covariance structure, the complexities of the sample design will be allowed for and the variability due to sampling removed in our model predictions. This includes the variability induced by sample rotation.

The Rao-Yu model borrows strength temporally by using a time-invariant regression coefficient, with residual temporal variation captured in the area-by-time random effects. The temporal random effects, being modelled as an AR1 process, capture the correlations of LF status within time points, as well as any autocorrelation at lag 1.

A practical reason for the choice of the Rao-Yu model was the availability of software for fitting these models and obtaining analytical mean squared error estimates. This is implemented in the R package SAE2 (Fay and Diallo, 2019).  This package also allows for the future possibility of a multivariate Rao-Yu model. While there are many small area R packages now available, we are only aware of SAE2 as allowing for temporal correlations in the sampling error covariance matrix.

The main disadvantage of the Rao-Yu model is its sensitivity to changes in the relationship between the target variable and the auxiliary variable because of the time-invariant regression model assumption. Examples include administrative changes to unemployment benefit recipients data from the DSS – the start of Jobactive in July 2015, COVID-19 from March 2020 with the associated new Jobseeker measurement, and Workforce Australia from July 2022. Vigilance is required to be aware of such changes and intervention to the auxiliary data or model needed to reduce damage to the quality of the predictions (see section 2.7).

Unit- and area-level time series models were also developed by Chambers and Saei (2013). These have the potential advantage of allowing for time-varying regression coefficients, and software is available in SAS for fitting these models. However, these models were not pursued because they assume independent repeated samples at each time point.

The ABS has historically focussed on unit-level cross-sectional logistic mixed models for small area estimation. These were considered for the LF case largely because they are simple, known and productionised. While these models might be modified to borrow strength over time by the inclusion of fixed time effects, this will not be efficient for a long time series such as the LF survey, as noted by Nissenen (2009, 22). The complex LF sample design would also need to be modelled explicitly and it is not clear that this could be done effectively or parsimoniously. Finally, empirical comparisons indicated that the simple repeated application of the cross-sectional models led to more unstable movement estimates than the Rao-Yu model.

Structural time series models are a powerful alternative to multilevel models for small area estimation and are more flexible than the multilevel approach.  In particular, bivariate SUTSE models that utilise an auxiliary time series were found to be effective for small area estimation of unemployment, as documented in a 2018 MAC paper by Zhang, Tam and van den Brakel. The key reason that these models were not pursued in this instance is the requirement for employment (and unemployment rate) estimates. The auxiliary information available for a SUTSE model of employment, STP job count data, is only available for a short time span – not really sufficient for STS modelling. While using a multilevel model for employment and a bivariate SUTSE model for unemployment may be possible, there are practical reasons why this is not desirable.

A final modelling possibility that has been considered is that of unit-level time series models for rotating panel designs. By this we mean a LF status variable linked deterministically over time (updated monthly), linked (probabilistically) at unit level to a time series of auxiliary data (updated monthly). Theory for these small area estimation models has been developed in the PhD by Nissenen (2009). It is expected that a unit-level model may be more powerful than its area-level counterpart. The ABS already creates a monthly Longitudinal Labour Force dataset that links the LF common sample from month to month. However, the following issues have caused us to favour the Rao-Yu area-level alternative:

  1. The resource burden and timeliness of a monthly probabilistic linkage of LF sample to auxiliary time series data.
  2. The need to address linkage uncertainty and attrition errors.
  3. The fact that the Nissenen paper develops theory and evaluates results for continuous variables and for the current time period estimates, rather than for binary variables and movement estimates. Hence the theory would need some further development and evaluation of the efficacy of the models.
  4. The need to code up the theory.

2.3. Data sources and explanatory variables

The key data sources are:

  1. ABS Labour Force Survey,
  2. Department of Social Security unemployment benefit recipient counts
  3. Single Touch Payroll job counts data provided to the ABS by the Australian Tax Office, and
  4. Labour Force Estimated Resident Population

The ABS Labour Force Survey (LFS) is based on a monthly sample of around 25,000 households corresponding to about 50,000 selected persons. The sample design is complex, with stratification by state and multi-stage sampling used within strata. Most significantly for our modelling, the LFS has a rotating panel design with 7/8 of the sample common from month to month. The survey estimates are constructed by calibrated weighting with composite estimation used such that the current month’s estimate is a weighted mean of the last seven months.

For our modelling the LFS provides direct (i.e. survey-weighted) estimates of employment, unemployment and not-in-the-labour-force (NILF) counts at SA4 level. Because unemployment is relatively rare, very occasionally in some months there is no responding unemployed sample in a small area, and in these cases a simple donor impute has been applied based on the previous month’s estimates.

The key predictor of monthly SA4 unemployment is the monthly SA4 count of persons receiving unemployment benefits. This data is provided by the Department of Social Security (DSS).  Currently, for the period Jan 2020 onwards this data has been provided directly to the ABS by the DSS. For the period Jan 2016 to Dec 2019 this data has been scraped from the DSS website. For Jan 2012 to Dec 2015 the data used was previously supplied to the ABS directly by the DSS. There are some small differences in the data from these sources, with the directly provided unemployment benefit recipient data more closely aligned with the ABS definitions of unemployment. It is expected that, going forward, the latest unemployment benefits data will be provided directly by the DSS to the ABS on a monthly basis.

The key predictor of monthly SA4 employment is the monthly SA4 count of jobs derived from the weekly Single Touch Payroll (STP) data. Ideally this would be a count of persons with jobs, but it is in fact a count of jobs, with the main scope exclusion being that it does not cover owner-managers of unincorporated enterprises. In earlier periods, broad coverage adjustments were made to the STP data because not all businesses had onboarded to using the single touch payroll system.

The Labour Force estimated resident population for persons 15 and over at SA4 level is also used as a predictor in both employment and unemployment models.

A summary of the explanatory variables for the employment and unemployment models is given in table 1 below. Parameter estimates and their standard errors are given in section 3.2.

Table 1: Explanatory variables
EmploymentUnemployment
Metropolitan versus ex-metropolitanSeasonal regressor
Log(DSS count)Metropolitan versus ex-metropolitan
Log(STP count)Log(DSS count)
Log (LF ERP)Log (LF ERP)

 

The LF time series of estimates contain seasonal patterns. Seasonal patterns are also observable in the DSS unemployment benefits recipient series. When unemployment counts are regressed against the DSS data, a residual seasonality is identifiable due to differences in the seasonal patterns of these series. Hence seasonal parameters were added in the fixed part of the unemployment model. No residual seasonality was identified when employment was regressed against the STP job count series, though this may become apparent as further time points are added.

2.4. Sampling error model

The sampling error is part of the Rao-Yu model and this is estimated empirically and then treated as known. Based on previous work by Zhang and Honchar (2016), the sampling error model for the LF survey sample was specified as a second order autoregressive model:

\(e^{*}_{dt} = \phi_{1}e^{*}_{d,t-1} + \phi_{2}e^{*}_{d,t-2} + \mu_{dt}, \; \; \mu_{dt} \sim NID(0,\sigma^{2}_{\mu})\)

The AR2 coefficients were calculated using the Yule-Walker equations and based on the median of the sample autocorrelations at lags one and two months for the period Jan 2016 to Oct 2022. While both state- and Australia-level AR2 coefficients were calculated, there was little variation between states, with the exception of Northern Territory. Modelled covariance values for the D*T block-diagonal covariance matrix were constructed from these AR coefficients, with the Australian/NT autocovariances applied to the SA4 level.

Table 2: Smoothed empirical autocorrelations of sampling errors
 LagAus: MedianCorrAus: ARNT: MedianCorrNT: AR
Employment 10.7670.6390.7380.566
Employment 20.6580.1680.6510.233
Unemployment 10.5050.4220.7010.461
Unemployment 20.3770.1650.6660.343

 

The variance of the log-transformed sampling error may be approximated by taking the square of the untransformed RSEs of the unemployment and employment estimates. These will be unstable at the SA4 level, and hence simple linear models were used to smooth them by regressing the log of the SA4-level RSE count estimates against the log of their responding sample sizes. These smoothed sampling error variances were the values for the diagonals of the D*T block-diagonal covariance matrix.

2.5. Bias corrections for logarithmic transform

We model the log-transformed LF estimates and hence need to make a bias correction when transforming back to the original scale. This is done using the following ‘crude’ method described in Rao and Molina (2015).

\(\hat{\theta}_{dt} = e^{\hat{\theta}^{*}_{dt} \ + \ 0.5MSE(\hat{\theta}^{*}_{dt})} \) 

where * indicates the log-transformed Rao-Yu model estimate.

The mean squared error estimate of the ‘crude’ bias-corrected estimator is given by:

\(MSE(\hat{\theta}_{dt}) = e^{(\hat{\theta}^{*} _{dt})^{2}}MSE(\hat{\theta}^{*} _{dt})\)

While more sophisticated methods for bias correction exist (Slud and Maiti, 2006), these are more complicated to implement and it was not clear that these would extend easily to the Rao-Yu model.

2.6. COVID-19 corrections

The outbreak of COVID-19 from March 2020 and the corresponding DSS Jobseeker measurement change caused the relatively stable relationship between unemployment and DSS unemployment benefit recipients counts to change. However, the regression coefficient vector of the Rao-Yu model is invariant over both small areas and time for the set of SA4s to which the model is fitted (i.e. state groups). To avoid likely biases in the modelled estimates it was necessary to analyse the nature of the change to the regression coefficient over time and apply appropriate time series corrections. For employment and STP, the impact of COVID-19 on the models was minimal so no COVID-19 corrections have been applied.

Initially, a simple level shift, inserted as a fixed effect in the model, was applied in April 2020. Variations on this approach looked at additional level shifts or additive outliers in March, April and May. However, a single April 2020 trend break performed best and improved the quality of the model and its estimates.

The effect of COVID-19 on the relationship between LF unemployment and DSS unemployment benefits data was more complicated than a simple level shift, however, being affected by state level lockdowns and government responses via Jobkeeper and Jobseeker. Hence a more sophisticated approach was pursued that would allow for more dynamic corrections. Figure 3 shows time series of the ratio of the LF unemployment trend to the DSS benefit recipient trend. The trends were estimated using a Bayesian structural time series model at the state level. The major government policy changes under COVID-19 are noted below the graph. 

Apr 20: JobSeeker COVID-19 supplement was first seen in LFS.
Apr 21: JobSeeker COVID-19 supplement was reduced.
Oct 21: both JobKeeper and JobSeeker were terminated.

Figure 1 shows the strong and evolving impacts of COVID-19 on the relationship between LF unemployment and DSS benefits data. From around 2022 the ratios appear to stabilise. Dynamic COVID-19 prior correction factors were determined such that, when they were applied to the DSS benefits trend series, the time series of the ratio of the trends would always equal the average trend ratio for the nine months of data available in 2022 (considered the new 'normal' ratio). Mathematically this is:

Find prior correction factors \(f_t \) such that:

\({y_t \over f_{t}x_{t}} = \bar{r}_{2022} \)

\(\rightarrow f_t = {y_t \over x_{t}\bar{r}_{2022}}\)

where:

\(y_t\) is the trend of LF unemployment count estimates at time \(t\)

\(x_t\) is the trend of DSS unemployment benefit recipient counts at time \(t\)

\(\bar{r}_{2022}\) is the average trend ratio for the first nine months of 2022

The prior correction factors described above were then applied to the original DSS benefit count series. Compared to the simple April 2020 level shift, the dynamic prior corrections:

  1. Reduced the magnitude of discrepancies between state-level direct estimates and the aggregated model estimates, particularly around the time of COVID-19.
  2. Reduced model biases as indicated in plots of modelled versus direct SA4-level estimates.
  3. Reduced the estimated relative root mean squared errors of the modelled estimates. 

2.7. Calibration

Additivity to broad-level, reliable direct estimates is a measure of the quality of the modelled SA4-level estimates. This is reported in section 3.1. While it may not be methodologically required or desirable, calibration of estimates to ensure exact agreement with reliable direct estimates is often performed.  This is a ‘cosmetic’ calibration that is done for consistency and in consideration of users.

The modelled estimates published in the June information paper have been calibrated to add exactly to the Australian direct survey employment and unemployment counts. This is a compromise that seeks to ensure as little change to the modelled estimates as possible, particularly the direction of the movements, while exhibiting some level of consistency with published direct estimates. The calibration method employed was one developed for ensuring additivity of seasonally adjusted component series to a series of directly seasonally adjusted totals (Stuckey, Zhang, McClaren, 2004). It attempts to balance minimising changes to levels with minimising changes to movements. Empirical results showed that an Australia-level calibration resulted in only minor changes to the modelled movements.

Because direct estimates for states are published from the Labour Force survey, it is highly desirable that the aggregated modelled estimates are also consistent at this level. Analysis suggests that the additivity adjustments to the modelled estimates are small enough to make this cosmetic calibration acceptable and will likely be implemented in future releases.

2.8. Uncertainty estimation

Mean squared error (MSE) estimates for the Rao-Yu model are obtained following the results given in Rao and Molina (2015, 98-111). The achieved MSE estimates corresponding to our model are summarised in section 3.2.

These are level MSEs. It is also of interest to obtain MSEs of the modelled count movements and of the unemployment rate, which is defined as unemployment / (unemployment + employment). The MSEs of the movements in unemployment rates may also be desired. Having MSEs of movements is important to indicate which of them are significantly different from zero.

Obtaining reasonable estimates of the movement MSEs and the unemployment rate MSEs is expected to be challenging analytically and involves estimates of the correlations – the correlation between the modelled estimates at lag 1 and between the unemployment and employment modelled estimates.  Hence we expect to need to apply a parametric bootstrap method which is likely to be computationally demanding.  Based on a one-off bootstrapping exercise it may be possible to determine approximate correlation coefficients that can be used in analytical MSE expressions. This approach, if feasible, would obviate the need for time-consuming and expensive bootstrapping to be performed each time the monthly estimates are produced.

3. Results

Parameter estimates and standard errors for the employment and unemployment models are given below using the NSW + ACT group as an example. Details of parameter estimates for all state groups are given in the appendix.

Table 3: Employment parameter estimates from Rao-Yu model
ParameterEstimateStd. Error 
Intercept0.4510.503
Region=Ex-metropolitan-0.0690.033
Log(DSS)-0.0300.007
Log(STP)0.5200.034
Log(ERP) 0.4560.054
Area random effect variance3.70E-034.00E-03
Temporal random effect variance 2.10E-052.80E-05
AR1 parameter 0.9800.166
Table 4: Unemployment parameter estimates from Rao-Yu model
ParameterEstimateStd. Error 
Intercept-0.8171.218
Region=Ex-metropolitan-0.3960.090
Log(DSS)0.6110.039
Log(ERP)0.3590.103
Seasonal=Feb0.0410.018
Seasonal=Mar0.0000.020
Seasonal=Apr-0.0470.022
Seasonal=May-0.0290.022
Seasonal=Jun-0.0700.023
Seasonal=Jul-0.0650.023
Seasonal=Aug-0.0510.023
Seasonal=Sep -0.0380.023
Seasonal=Oct-0.0120.022
Seasonal=Nov-0.0630.021
Seasonal=Dec -0.0830.019
Area random effect variance0.0340.010
Temporal random effect variance 0.0120.001
AR1 parameter0.1510.103

 

Generalising across the state groups, it is noted that the area random effect variance is typically bigger than the temporal random effect variance. Indeed, for employment, the temporal random effects variances are very small and non-significant. The AR1 parameter estimate values of 0.98 for employment indicate bounding of the AR1 parameter estimation and non-stationarity of the AR1 model. In this case the dynamic model of Fay and Diallo (2012) may be a better fit because the AR1 coefficient is unconstrained. Often the area random effect variances are also non-significant for employment. This seems to indicate that the auxiliary variables can explain most of the variation in employment over SA4s and time. 

The primary measure of the quality of small area modelled estimates is the relative root mean squared error (RRMSE). RRMSE estimates for levels are summarised by state in the tables below.

Table 5: RRMSEs for employment
 NSWVICQLDSAWATASNTACT
Min0.0180.0150.0150.0180.0160.0130.0190.013
Mean0.0280.0260.0270.0230.0260.0210.0270.015
Max0.0420.0390.0550.0330.0480.0320.0340.018
Table 6: RRMSEs for unemployment
 NSWVICQLDSAWATASNTACT
Min0.0970.0650.0820.0600.0820.0640.0950.079
Mean0.1170.0830.1040.0720.1060.0770.1300.084
Max0.1420.1020.1460.0970.1310.0960.1650.089

 

A comparison of the estimated RRMSEs to the direct survey RSEs for the example month of Sep 2022 is given in Figures 2 and 3.

The RRMSEs are (almost) always smaller than their direct RSE counterparts and are well below the quality thresholds associated with published ABS estimates (RSEs greater than 25% are annotated to indicate they should be used with caution).  

Another measure of the quality of the (uncalibrated) modelled estimates is how closely they agree with reliable direct estimates when aggregated over SA4s. The tables below summarise the (absolute values of the) percentage differences across time points when comparing modelled and direct estimates at state level.

Table 7: Differences between modelled and direct state estimates for employment
 NSWVICQLDSAWATASNTACT
Mean1.31%0.88%1.13%0.65%0.72%1.18%1.27%1.89%
Median1.20%0.55%0.88%0.43%0.44%1.09%0.85%1.83%
Max3.80%3.29%2.89%1.70%2.39%2.53%4.21%4.44%
Table 8: Differences between modelled and direct state estimates for unemployment
 NSWVICQLDSAWATASNTACT
Mean4.90%4.62%4.51%4.43%3.97%5.39%7.10%5.32%
Median4.53%3.83%3.95%3.76%3.49%4.12%5.57%4.35%
Max17.16%25.18%18.92%20.73%13.60%29.20%22.73%20.80%

 

Modelled employment is always fairly close to the direct estimates at state level. For unemployment, however, the discrepancies can be quite large, notwithstanding that the average or median discrepancies are reasonable. We note, however, that the uncertainty in the direct and modelled estimates has not been accounted for, so many of these apparent differences will not be significantly different from zero.  Time series plots of the relative differences are given below.

There is some evidence that, at least in some states, there is systematic behaviour in the additivity properties of the estimates beyond what would be expected from the rotating panel nature of the LF survey. This may be evidence of inadequate corrections for the changing relationship in target and auxiliary time series due to COVID-19.

As a final quality measure, we produced time series plots for each SA4 comparing the modelled estimates (both calibrated and uncalibrated) to the direct survey estimates. An example SA4 from each state is given in Figures 6 and 7 below. In all cases we note the greater stability of the modelled estimates compared to the direct estimates.

4. Further work

The following bullet points list a few areas for further investigation and implementation.

  • As has been noted in section 2.8, there is a need to obtain uncertainty estimates both for estimates of movements and for unemployment rates. 
  • We are also examining whether other auxiliary variables, such as the SA4 proportion of persons by age group, may improve the predictive abilities of the model.
  • Further refinements of the COVID-19 correction may also be possible, in particular examining a metropolitan/rest-of-state dimension.
  • The Rao-Yu model has high computational demands and there will be relatively tight time frames involved in the final monthly implementation. Hence we will be investigating whether we can fit the models at a lower than monthly frequency and obtain estimates by carrying forward the model parameter estimates.

4.1. Multivariate Rao-Yu model

The models fitted thus far have been independent, though there is clearly a multivariate relationship between the employment, unemployment and not-in-the-labour force target variables. Theory for a multivariate Rao-Yu model has been developed and the SAE2 R package (Fay and Diallo, 2019) allows for such models to be fitted. The benefits of such a model would be:

  1. Potentially greater prediction accuracy by capitalising on correlations between target variables.
  2. Consistency of employment, unemployment and not-in-the-labour-force estimates with the LF total population. Inconsistency of independently modelled estimates is likely to become a greater issue as we seek to model at lower levels of geography.

Preliminary work has indicated that fitting a multivariate model may be feasible.  The challenges are (i) obtaining reliable estimates of the multivariate sampling error correlations and (ii) the additional computational burden. 

4.2. Estimation for finer-level geographies and demographics

A longer-term goal is for the ABS to provide modelled labour force status variables at the levels of SA2, SA3 and local government area (LGA). These are considerably smaller geographical regions than SA4. While there are approximately 87 SA4s in the LFS scope across Australia, there are over 2000 SA2s, about 350 SA3s and 550 LGAs.

In theory, area-level models such as the Rao-Yu model should be able to be applied to these lower levels of geography. Two challenges immediately present themselves: (i) the relatively large numbers of areas with zero sample, leading to many purely synthetic estimates, and (ii) the computational challenges involved. It may well be that the quality of monthly modelled estimates at these levels is not sufficient and that quarterly, biannual or annual estimates may be necessary.

The ABS would also like to provide modelled SA4-level estimates disaggregated into age group and gender. The challenges mentioned for finer geographies would be relevant. However, given the LF is clustered by area, not by demographics, we might expect fewer of these domains to be without any sample. It also might be reasonable to suppose that the model relationships might hold more strongly across small areas within an age group by gender demographic than across fine small areas.

Feedback

Feedback on this paper is welcomed and can be provided to methodology@abs.gov.au.

Bibliography

Show all

Appendix

A.1. Parameter estimates for state groups

Back to top of the page