Non-standard Loss Functions for Deep Neural Networks with Tabular Data
Deep Neural Networks (DNNs) are increasingly the Machine Learning (ML) method of choice for image and text classification problems, but there are few examples of their use with traditional tabular data, the most common type of data in National Statistical Organisations such as the ABS. Methodology Division is researching how ML methods could be used for official statistics, in particular for predictive modelling (with both household and business statistics applications), and DNNs are one of the methods being considered.
The variables of interest for our applications are not always straightforward categorical or continuous variables, but often include ordinal categorical variables and zero-inflated continuous variables. While DNNs are capable of modelling these types of variables, the built-in loss functions in software packages for DNNs do not cover these scenarios. In order to better model these variables using DNNs, we needed to derive modified encodings and customised loss functions.
Consider, for example, the case of an ordinal categorical variable. The standard approach for modelling categorical variables is to one-hot encode them, that is, to create a separate indicator variable for each category of the variable. Using this approach for an ordinal variable ignores the ordering of the categories. We therefore modified the encoding so that, for every category except the smallest, an indicator variable records whether the value is greater than or equal to that category. For example, with four ordered categories, a value in the third category is encoded as (1, 1, 0). With this setup, the predicted probabilities produced by the DNN are conditional probabilities (e.g. the probability of the value being at least 3 given that it is at least 2), rather than probabilities for individual categories. A customised loss function was created to first convert the conditional probabilities into categorical probabilities, and then use these categorical probabilities in the categorical cross entropy loss function, the standard loss function used for categorical variables.
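The encoding and the conversion from conditional to categorical probabilities can be illustrated with a small sketch. This is not the code used in the research: it is a plain Python illustration for a single observation, and the function names, the labelling of categories as 1 to K, and the small constant `eps` used to avoid taking the log of zero are all assumptions made for the example.

```python
import math


def ordinal_encode(value, num_categories):
    """Encode a value in categories 1..K as indicators [value >= 2, ..., value >= K]."""
    return [1 if value >= k else 0 for k in range(2, num_categories + 1)]


def conditional_to_categorical(cond_probs):
    """Convert conditional probabilities into categorical probabilities.

    cond_probs[i] is assumed to be P(y >= i + 2 | y >= i + 1), so the list
    has K - 1 entries. Returns [P(y = 1), ..., P(y = K)].
    """
    probs = []
    at_least = 1.0  # P(y >= current category); equals 1 for the smallest category
    for q in cond_probs:
        probs.append(at_least * (1.0 - q))  # reaches this category but goes no higher
        at_least *= q                       # P(y >= next category)
    probs.append(at_least)                  # probability of the largest category
    return probs


def ordinal_cross_entropy(value, cond_probs, eps=1e-12):
    """Categorical cross entropy for one observation, computed from conditional probabilities."""
    probs = conditional_to_categorical(cond_probs)
    return -math.log(max(probs[value - 1], eps))


# Four ordered categories; the observed value is in the third category.
encoded = ordinal_encode(3, 4)
category_probs = conditional_to_categorical([0.8, 0.5, 0.25])
loss = ordinal_cross_entropy(3, [0.8, 0.5, 0.25])
```

In a DNN framework the same arithmetic would be written with that framework's tensor operations so that gradients can flow through the conversion; the sketch only shows the logic for one observation.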
Modelling a zero-inflated continuous outcome can be difficult because a large proportion of zeros is combined with continuous values. The usual solution is to model separately the probability of being zero and the predicted value of the continuous data conditional on being non-zero. It is desirable to combine these two models into a single neural network. A customised loss function was written that uses indicator functions to combine a binary categorical loss with a loss appropriate for continuous data. A standard DNN cannot output both predicted probabilities of being zero and predicted values of continuous data in the output layer. The customised loss function therefore first maps the predicted values to predicted probabilities of being zero using the sigmoid function.
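One way such a combined loss could look is sketched below for a single observation. The details are assumptions made for illustration rather than the loss actually used: the network is assumed to emit two raw (unbounded) outputs, here called `raw_zero` and `raw_value`; squared error stands in for "a loss appropriate for continuous data"; and the two components are simply added with equal weight.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def zero_inflated_loss(y_true, raw_zero, raw_value, eps=1e-12):
    """Combined loss for one observation of a zero-inflated continuous outcome.

    raw_zero  -- raw network output, mapped to P(y = 0) with the sigmoid (assumed layout)
    raw_value -- raw network output, read as the predicted value given y != 0
    """
    is_zero = 1.0 if y_true == 0 else 0.0  # indicator function for a zero outcome

    # Binary cross entropy on the zero / non-zero indicator.
    p_zero = min(max(sigmoid(raw_zero), eps), 1.0 - eps)
    binary_loss = -(is_zero * math.log(p_zero)
                    + (1.0 - is_zero) * math.log(1.0 - p_zero))

    # Squared error (an assumed choice) counted only when the outcome is non-zero.
    continuous_loss = (1.0 - is_zero) * (y_true - raw_value) ** 2

    return binary_loss + continuous_loss


loss_zero = zero_inflated_loss(0.0, raw_zero=0.0, raw_value=5.0)
loss_nonzero = zero_inflated_loss(2.0, raw_zero=0.0, raw_value=1.0)
```

Because of the indicator, the continuous prediction has no effect on the loss for zero observations, so that part of the network is trained only on the non-zero data, which mirrors the two-model approach described above.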
We found that using the customised loss function to fit a DNN for an ordinal categorical variable improved accuracy over a DNN model where the ordinal categorical variable was treated simply as categorical. We found the predictive accuracy of the zero-inflated DNN was superior to a DNN that did not model continuous data conditional on being non-zero. We expect these approaches to be useful as we continue to fit predictive models using DNNs.
For more information, please contact Kate Traeger at methodology@abs.gov.au.