Derivations and imputations
Derivation is the process by which variables with no response are assigned values based on the responses of other family members present in the same dwelling. The variables that may be derived in this way are:
- Country of Birth of Person (BPLP)
- Country of Birth of Father (BPFP)
- Country of Birth of Mother (BPMP)
- Language Spoken at Home (LANP).
If there is insufficient information provided to derive a response for these items, they are determined to be 'Not stated'.
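The derivation rule above can be sketched as follows. This is a minimal, hypothetical illustration: the record layout (a list of person dicts carrying a `family` number), the helper name `derive_from_family`, and the rule "take the first stated value from another member of the same family" are assumptions, not the ABS's actual derivation rules, which are more specific.

```python
# Hypothetical layout: a dwelling is a list of person dicts; None = no response.
NOT_STATED = "Not stated"


def derive_from_family(dwelling, variable):
    """Fill a missing `variable` from another member of the same family.

    Simplified, hypothetical rule: use the first stated value held by another
    person with the same 'family' number in the dwelling. Only originally
    stated values are used as sources, so derived values are never chained.
    """
    original = [person.get(variable) for person in dwelling]
    for i, person in enumerate(dwelling):
        if original[i] is not None:
            continue  # a response was provided; nothing to derive
        sources = [
            original[j]
            for j, other in enumerate(dwelling)
            if j != i
            and other.get("family") == person.get("family")
            and original[j] is not None
        ]
        # Insufficient information to derive a response -> 'Not stated'.
        person[variable] = sources[0] if sources else NOT_STATED
    return dwelling


dwelling = [
    {"family": 1, "LANP": "Vietnamese"},
    {"family": 1, "LANP": None},
    {"family": 2, "LANP": None},
]
derive_from_family(dwelling, "LANP")
print([p["LANP"] for p in dwelling])  # ['Vietnamese', 'Vietnamese', 'Not stated']
```

Snapshotting the original values first keeps a person who was set to 'Not stated' from being used as a source for someone else.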
In addition, the derivation process is used to create new variables by combining responses from a number of questions. Variables which are created this way include:
- Housing Loan Repayments (HLRD)
- Rent (RNTD)
- Tenure Type (TEND)
- Labour Force Status (LFS06P).
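As an illustration of combining several responses into one new variable, here is a simplified sketch of a labour force status derivation. The function name, its arguments and the exact decision rules are hypothetical; the 35-hour full-time threshold follows the usual ABS convention, and the category labels are illustrative rather than a reproduction of the Census classification.

```python
def derive_labour_force_status(had_job, hours_worked, looked_for_work,
                               wanted_full_time, available_to_start):
    """Combine answers to several (hypothetical) questions into one status."""
    if had_job:
        if hours_worked == 0:
            return "Employed, away from work"
        if hours_worked >= 35:  # assumed full-time threshold
            return "Employed, worked full-time"
        return "Employed, worked part-time"
    if looked_for_work and available_to_start:
        kind = "full-time" if wanted_full_time else "part-time"
        return f"Unemployed, looking for {kind} work"
    return "Not in the labour force"


print(derive_labour_force_status(True, 40, False, False, False))
# Employed, worked full-time
print(derive_labour_force_status(False, 0, True, False, True))
# Unemployed, looking for part-time work
```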
Imputation is a statistical process for predicting values where no response was provided to a question and a response could not be derived.
Where no Census form is returned, the number of males and females in 'non-contact' private dwellings may be imputed. In addition, the following key demographic variables may be imputed if they are 'Not stated':
- Age (AGEP)
- Place of Usual Residence (PURP)
- Registered Marital Status (MSTP).
The imputation method used for the 2006 Census is known as 'hotdecking'. In general, this method involves locating a donor record and copying the relevant responses to the record requiring imputation. The donor record must have similar characteristics and must have the required variable(s) stated. It must also be located as close as possible geographically to the record to be imputed, and the match must occur within the same Capital City or Balance of State. Once a suitable donor is found, its response(s) are copied to the variable(s) with missing values.
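A minimal sketch of the general hotdecking step, under stated assumptions: records are dicts, `region` holds the Capital City or Balance of State, `xy` is a hypothetical (x, y) location used to measure distance, and "similar characteristics" is simplified to exact equality on a chosen list of matching variables. The names `hotdeck`, `region` and `xy` are illustrative only.

```python
import math


def hotdeck(recipient, candidates, match_vars, impute_vars):
    """Copy missing `impute_vars` from the nearest suitable donor (sketch)."""
    donors = [
        d for d in candidates
        if d is not recipient
        and d["region"] == recipient["region"]  # same Capital City / Balance of State
        and all(d.get(v) == recipient.get(v) for v in match_vars)  # similar characteristics
        and all(d.get(v) is not None for v in impute_vars)  # donor has them stated
    ]
    if not donors:
        return None  # no suitable donor; the record is left unchanged
    # Geographically closest donor.
    donor = min(donors, key=lambda d: math.dist(d["xy"], recipient["xy"]))
    for v in impute_vars:
        if recipient.get(v) is None:
            recipient[v] = donor[v]
    return donor


record = {"region": "Sydney", "xy": (0, 0), "STRD": "Separate house", "AGEP": None}
near = {"region": "Sydney", "xy": (1, 0), "STRD": "Separate house", "AGEP": 34}
far = {"region": "Sydney", "xy": (5, 5), "STRD": "Separate house", "AGEP": 60}
other = {"region": "Balance of NSW", "xy": (0.1, 0), "STRD": "Separate house", "AGEP": 20}
hotdeck(record, [far, other, near], ["STRD"], ["AGEP"])
print(record["AGEP"])  # 34
```

The `other` record is closest of all but is rejected because it lies outside the recipient's Capital City.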
The imputation methodology is tailored to two situations: where no Census form was returned, and where a partially completed form was returned.
No Census form returned - private dwelling: Where a Census collector has identified that a private dwelling was occupied on Census Night but a Census form was not returned, the number of males and females normally in the dwelling and their key demographic variables may require imputation. In these cases, the non-demographic variables are set to 'Not stated' or 'Not applicable'.
The 'No Census form returned' scenario has two variations. In the first, no form was returned but the collector was able to ascertain the number of males and females from a resident of the dwelling or, in a small number of cases, from a building manager or neighbour. In the second, no form was returned and the number of males and females remains unknown.
For records where the number of males and females is unknown, two imputation processes are required. Initially these records must have their number of males and females imputed using hotdecking. Then a second imputation (also using hotdecking) is run to impute the key demographic variables.
To hotdeck the number of males and females, the donor records must meet several conditions:
- They must be records where no Census form was returned but where the number of males and females was ascertained by the collector;
- They must have a similar Dwelling Structure (STRD) to the record to be imputed;
- They must be located geographically as close as possible to the location of the record to be imputed.
The numbers of males and females are the only data copied from the donor record in this first hotdecking process.
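The first hotdecking pass can be sketched like this. The field names (`form_returned`, `counts_ascertained`, `region`, `xy`, `males`, `females`) and the function name are hypothetical, and "similar" Dwelling Structure is simplified to an exact STRD match; only the two counts are copied, as the text describes.

```python
import math


def impute_male_female_counts(recipient, candidates):
    """First hot-deck pass: copy only the male and female counts (sketch)."""
    donors = [
        d for d in candidates
        if not d["form_returned"]          # a non-contact dwelling...
        and d["counts_ascertained"]        # ...whose counts the collector obtained
        and d["STRD"] == recipient["STRD"]
        and d["region"] == recipient["region"]
    ]
    if not donors:
        return None
    donor = min(donors, key=lambda d: math.dist(d["xy"], recipient["xy"]))
    recipient["males"] = donor["males"]
    recipient["females"] = donor["females"]
    return donor


unknown = {"form_returned": False, "counts_ascertained": False,
           "STRD": "Flat", "region": "Perth", "xy": (0, 0)}
returned = {"form_returned": True, "counts_ascertained": True, "STRD": "Flat",
            "region": "Perth", "xy": (0, 1), "males": 3, "females": 3}
ascertained = {"form_returned": False, "counts_ascertained": True, "STRD": "Flat",
               "region": "Perth", "xy": (2, 2), "males": 1, "females": 2}
impute_male_female_counts(unknown, [returned, ascertained])
print(unknown["males"], unknown["females"])  # 1 2
```

The nearer `returned` dwelling is skipped because donors for this pass must come from non-contact dwellings with ascertained counts.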
In the next process, the records that have just had their numbers of males and females imputed are subjected to the same hotdecking process as those records where the number of males and females was ascertained.
This hotdecking process imputes the key demographic variables. Again the donor records must meet several conditions:
- They must be records where everyone within the dwelling provided all their demographic characteristics;
- They must have similar Dwelling Structure (STRD) and Dwelling Location (DLOD);
- They must have identical counts of males and females;
- They must be located geographically as close as possible to the location of the record to be imputed.
The key demographic variables are then copied from the donor records to the records requiring imputation.
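A sketch of the second pass, applying the four donor conditions listed above. The dwelling-level fields (`fully_stated`, `region`, `xy`, `persons`), the choice of variables in `DEMOGRAPHIC_VARS` and the function name are assumptions for illustration; "similar" STRD and DLOD are simplified to exact matches.

```python
import math

DEMOGRAPHIC_VARS = ("SEXP", "AGEP", "MSTP", "PURP")  # hypothetical selection


def impute_key_demographics(recipient, candidates):
    """Second hot-deck pass: copy every person's key demographics (sketch)."""
    donors = [
        d for d in candidates
        if d["fully_stated"]  # everyone gave all their demographic details
        and d["STRD"] == recipient["STRD"]
        and d["DLOD"] == recipient["DLOD"]
        and (d["males"], d["females"]) == (recipient["males"], recipient["females"])
        and d["region"] == recipient["region"]
    ]
    if not donors:
        return None
    donor = min(donors, key=lambda d: math.dist(d["xy"], recipient["xy"]))
    recipient["persons"] = [
        {v: p[v] for v in DEMOGRAPHIC_VARS} for p in donor["persons"]
    ]
    return donor


target = {"STRD": "Separate house", "DLOD": "Other", "males": 1, "females": 1,
          "region": "Adelaide", "xy": (0, 0)}
base = {"fully_stated": True, "STRD": "Separate house", "DLOD": "Other",
        "region": "Adelaide"}
couple = dict(base, males=1, females=1, xy=(3, 4), persons=[
    {"SEXP": "M", "AGEP": 52, "MSTP": "Married", "PURP": "Area 1"},
    {"SEXP": "F", "AGEP": 49, "MSTP": "Married", "PURP": "Area 1"},
])
brothers = dict(base, males=2, females=0, xy=(1, 0), persons=[])
impute_key_demographics(target, [brothers, couple])
print([p["AGEP"] for p in target["persons"]])  # [52, 49]
```

Requiring identical male and female counts means the donor household always supplies exactly the right number of people of each sex.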
In previous Censuses, the counts of males and females were imputed using the average number of males and females in responding private dwellings in the same Collection District (CD). This method was found to have over-imputed the male and female counts in the 2001 Census.
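For contrast with hotdecking, the earlier averaging approach amounts to a simple calculation per CD. This sketch assumes a hypothetical list of responding dwellings with `males` and `females` counts and returns the raw averages; how those averages were turned into whole-person counts is not described here.

```python
def cd_average_counts(responding_dwellings):
    """Mean males and females per responding private dwelling in one CD."""
    n = len(responding_dwellings)
    avg_males = sum(d["males"] for d in responding_dwellings) / n
    avg_females = sum(d["females"] for d in responding_dwellings) / n
    return avg_males, avg_females


cd = [{"males": 1, "females": 1}, {"males": 2, "females": 2}, {"males": 0, "females": 2}]
print(cd_average_counts(cd))  # (1.0, 1.6666666666666667)
```

Every non-contact dwelling in the CD receives the same averages, whereas hotdecking copies the actual counts of one nearby donor dwelling.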
No Census form returned - non private dwelling: Where a person in a non-private dwelling did not return a form, their demographic characteristics are copied from another person in a similar non-private dwelling using Type of Non-Private Dwelling (NPDD).
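A minimal sketch of the non-private dwelling case. The `responded` field, the `DEMOGRAPHIC_VARS` selection and the function name are hypothetical, and choosing the donor at random from the matching persons is an assumption: the text only says the donor is a person in a similar non-private dwelling, matched using NPDD.

```python
import random

DEMOGRAPHIC_VARS = ("SEXP", "AGEP", "MSTP", "PURP")  # hypothetical selection


def impute_npd_person(recipient, persons, rng):
    """Copy demographics from a responding person with the same NPDD (sketch)."""
    donors = [
        p for p in persons
        if p is not recipient and p["responded"] and p["NPDD"] == recipient["NPDD"]
    ]
    if not donors:
        return None
    donor = rng.choice(donors)  # assumed: random choice among matching donors
    for v in DEMOGRAPHIC_VARS:
        recipient[v] = donor[v]
    return donor


rng = random.Random(2006)
resident = {"responded": True, "NPDD": "Nursing home", "SEXP": "F", "AGEP": 88,
            "MSTP": "Widowed", "PURP": "Area 7"}
guest = {"responded": True, "NPDD": "Hotel, motel, bed and breakfast", "SEXP": "M",
         "AGEP": 35, "MSTP": "Married", "PURP": "Area 2"}
missing = {"responded": False, "NPDD": "Nursing home"}
impute_npd_person(missing, [resident, guest, missing], rng)
print(missing["AGEP"])  # 88
```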
Census form returned: Where a form was returned, some or all of the demographic characteristics may require imputation. If Registered Marital Status and/or Place of Usual Residence are 'Not stated' they are imputed using hotdecking, whereas Age is imputed based on distributions obtained from previous Censuses.
Registered Marital Status imputation is carried out by finding a similar person in a similar responding dwelling based on the variables:
- Sex (SEXP)
- Relationship in Private Dwelling (RLHP)
- Age (AGEP)
- Dwelling Type (DWTD)
- Type of Non-Private Dwelling (NPDD).
Registered Marital Status is only imputed for persons aged 15 years and over, and set to 'Not applicable' for persons aged under 15 years.
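The marital status rule can be sketched as below, assuming person records are dicts and `None` stands for 'Not stated'. Matching on exact single-year age and picking a random donor from the matches are simplifying assumptions (a real implementation might compare age in bands); the function name is hypothetical.

```python
import random

MATCH_VARS = ("SEXP", "RLHP", "AGEP", "DWTD", "NPDD")


def impute_mstp(person, donors, rng):
    """Impute Registered Marital Status (MSTP) for one person (sketch)."""
    if person["AGEP"] < 15:
        person["MSTP"] = "Not applicable"  # never imputed for under-15s
    elif person.get("MSTP") is None:
        pool = [
            d for d in donors
            if d.get("MSTP") not in (None, "Not applicable")
            and all(d.get(v) == person.get(v) for v in MATCH_VARS)
        ]
        if pool:
            person["MSTP"] = rng.choice(pool)["MSTP"]
    return person.get("MSTP")


rng = random.Random(0)
child = {"AGEP": 9, "MSTP": None}
adult = {"SEXP": "F", "RLHP": "Husband, wife or partner", "AGEP": 41,
         "DWTD": "Occupied private dwelling", "NPDD": "Not applicable", "MSTP": None}
donor = dict(adult, MSTP="Married")
print(impute_mstp(child, [], rng), "|", impute_mstp(adult, [donor], rng))
# Not applicable | Married
```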
Where a complete usual address is not provided, the address information that is provided is used to impute an appropriate CD (and SLA). A similar person in a similar dwelling is located, and the missing usual residence fields are copied from that person's record. Similarity is based on the variables:
- Residential Status in a Non-Private Dwelling (RLNP)
- Dwelling Location (DLOD)
- Type of Non-Private Dwelling (NPDD).
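A sketch of partial-address imputation, assuming a usual address is split into hypothetical `state`, `sla` and `cd` fields. Fields the person did provide are kept, and the donor must agree with them, so the provided information constrains which CD (and SLA) can be imputed. Taking the first suitable donor is an assumption.

```python
ADDRESS_FIELDS = ("state", "sla", "cd")  # hypothetical split of a usual address
MATCH_VARS = ("RLNP", "DLOD", "NPDD")


def impute_usual_residence(person, donors):
    """Copy only the missing usual-address fields from a similar person."""
    missing = [f for f in ADDRESS_FIELDS if person.get(f) is None]
    if not missing:
        return person
    for d in donors:
        if (all(d.get(v) == person.get(v) for v in MATCH_VARS)
                and all(d.get(f) is not None for f in ADDRESS_FIELDS)
                # donor must agree with every address field that was provided
                and all(d[f] == person[f] for f in ADDRESS_FIELDS if f not in missing)):
            for f in missing:
                person[f] = d[f]
            break
    return person


p = {"RLNP": "Guest", "DLOD": "Other", "NPDD": "Hotel",
     "state": "Vic", "sla": None, "cd": None}
d1 = {"RLNP": "Guest", "DLOD": "Other", "NPDD": "Hotel",
      "state": "NSW", "sla": "S1", "cd": "C1"}
d2 = {"RLNP": "Guest", "DLOD": "Other", "NPDD": "Hotel",
      "state": "Vic", "sla": "S2", "cd": "C2"}
impute_usual_residence(p, [d1, d2])
print(p["state"], p["sla"], p["cd"])  # Vic S2 C2
```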
Where date of birth or age details are incomplete or missing, Age (AGEP) is imputed based on distributions for particular populations (for example, by sex, marital status and state/territory of usual residence). Factors affecting age imputation include any reported labour force activity, the type of educational institution being attended, and the relationships and ages of other family members.
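Drawing an age from a population distribution can be sketched as below. The distribution table, its keys (sex and state only, for brevity; marital status could be added to the key) and the weights are invented for illustration, not Census figures. The sketch shows one factor from the text: reported labour force activity rules out ages under 15, since labour force questions apply to people aged 15 and over.

```python
import random


def impute_age(person, distributions, rng):
    """Draw an age from the distribution for the person's population group."""
    dist = distributions[(person["SEXP"], person["state"])]
    # Reported labour force activity implies an age of at least 15.
    min_age = 15 if person.get("labour_force_activity") else 0
    ages = [a for a in dist if a >= min_age]
    person["AGEP"] = rng.choices(ages, weights=[dist[a] for a in ages], k=1)[0]
    return person["AGEP"]


# Made-up weights for illustration only, not Census figures.
distributions = {("M", "Qld"): {8: 5, 12: 5, 19: 10, 24: 10}}
rng = random.Random(1)
worker = {"SEXP": "M", "state": "Qld", "labour_force_activity": True}
print(impute_age(worker, distributions, rng) in (19, 24))  # True
```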
Records that have required imputation can be identified using the Imputation flags:
- Imputation Flag for Age (IFAGEP)
- Imputation Flag for Number of Males and Females in Dwelling (IFNMFD)
- Imputation Flag for Place of Usual Residence (IFPURP)
- Imputation Flag for Registered Marital Status (IFMSTP).
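A small sketch of using the imputation flags to select or tally imputed records. It assumes each flag is stored as a boolean on a record dict; the real flag coding may differ, and the helper names are hypothetical.

```python
from collections import Counter

IMPUTATION_FLAGS = ("IFAGEP", "IFNMFD", "IFPURP", "IFMSTP")


def imputed_records(records):
    """Records where any imputation flag is set (flags assumed boolean)."""
    return [r for r in records if any(r.get(f) for f in IMPUTATION_FLAGS)]


def flag_counts(records):
    """How many records carry each imputation flag."""
    return Counter(f for r in records for f in IMPUTATION_FLAGS if r.get(f))


records = [
    {"id": 1, "IFAGEP": True},
    {"id": 2, "IFMSTP": False},
    {"id": 3, "IFAGEP": True, "IFPURP": True},
]
print([r["id"] for r in imputed_records(records)])  # [1, 3]
print(flag_counts(records))  # Counter({'IFAGEP': 2, 'IFPURP': 1})
```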