4402.0.55.001 - Microdata: Childhood Education and Care, Australia, June 2011

ARCHIVED ISSUE Released at 11:30 AM (CANBERRA TIME) 25/10/2012

USING THE CURF

ABOUT THE CURF
IDENTIFIERS
USING REPEATING DATASETS
USING FLAG ITEMS
MULTI-RESPONSE DATA ITEMS
WEIGHTS AND ESTIMATION
STANDARD ERRORS
GEOGRAPHY
CURF FILE NAMES
INFORMATION FILES

ABOUT THE CURF

The data included in the CEaCS 2011 CURF are released under the provisions of the Census and Statistics Act 1905. This legislation allows the Australian Statistician to release unit record data, or microdata, provided that it is done "in a manner that is not likely to enable the identification of a particular person or organisation to which it relates".

The ABS ensures confidentiality of the data by:

removing name, address and any other information that might uniquely identify an individual
changing a small number of values - particularly unusual values - and removing very unusual records
controlling the detail available for all records on the CURF
excluding some data items that were collected
controlling the modes of access and restricting access to more detailed data
placing restrictions on how the data are used, supported by information in the User Manual: Responsible Use of ABS CURFs and both the undertaking signed by the head of each organisation and the terms and conditions signed by each user.

As a result, data on the CURF will not exactly match other previously published estimates. Any changes to the distribution of values are not significant and the statistical validity of aggregate data is not affected.

IDENTIFIERS

There is a series of unique identifiers on records at each level of the file. Households (or Income Units) have a household identifier (ABSHID) and children have a child identifier (ABSPID). Repeating datasets also have identifiers for the type of care used by the income unit (ABSIID) and for the child's care (ABSCID).

File level identifiers
The following are the identifiers:

1. Income Unit = ABSHID
2. Income Unit Care = ABSHID, ABSIID
3. Child = ABSHID, ABSPID
4. Child Care = ABSHID, ABSPID, ABSCID

All these identifiers are on each level.

As well as uniquely identifying all units, the identifiers are essential for linking associated units so that attributes can be copied from one type of counting unit to another. For example, an income unit variable such as the labour force status of parents can be copied to all the children within the family. The means by which this might be done in SAS is illustrated below:

SAS CODE

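/* Sort both levels by the household identifier before merging */
PROC SORT DATA=CEC11E.CEC11EHH OUT=CEC11EHH; BY ABSHID; RUN;
PROC SORT DATA=CEC11E.CEC11EPN OUT=CEC11EPN; BY ABSHID ABSPID; RUN;
/* One-to-many merge: LFSPAR is copied onto every child record in the income unit */
DATA MERGFILE (KEEP=ABSHID ABSIID ABSPID ABSCID LFSPAR);
MERGE CEC11EHH CEC11EPN;
BY ABSHID;
RUN;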

SPSS CODE

GET
FILE=CEC11EHH.
SORT CASES BY ABSHID.
SAVE OUTFILE=SORTEDIU.
GET
FILE=CEC11EPN
/KEEP=ABSHID ABSIID ABSPID ABSCID.
SORT CASES BY ABSHID ABSPID.
SAVE OUTFILE=SORTEDCH.
MATCH FILES FILE=SORTEDCH
/TABLE=SORTEDIU
/BY ABSHID.
SAVE OUTFILE=MERGFILE.

The following example shows two income units where the data item LFSPAR has been copied from the Income Unit level onto the Child level.

TABLE 1: LABOUR FORCE STATUS OF CHILD'S PARENTS

ABSHID	ABSPID	LFSPAR

CEC11E0001	1	Couple family - one parent employed
CEC11E0001	2	Couple family - one parent employed
CEC11E0002	1	Couple family - both parents employed
CEC11E0002	2	Couple family - both parents employed
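
The logic of this one-to-many copy can also be sketched outside SAS or SPSS. The Python example below is illustrative only: it uses the toy records from Table 1 rather than CURF data, and the function name copy_to_children is hypothetical.

```python
# Illustrative only: toy records mirroring Table 1, not real CURF data.
income_units = [
    {"ABSHID": "CEC11E0001", "LFSPAR": "Couple family - one parent employed"},
    {"ABSHID": "CEC11E0002", "LFSPAR": "Couple family - both parents employed"},
]
children = [
    {"ABSHID": "CEC11E0001", "ABSPID": 1},
    {"ABSHID": "CEC11E0001", "ABSPID": 2},
    {"ABSHID": "CEC11E0002", "ABSPID": 1},
    {"ABSHID": "CEC11E0002", "ABSPID": 2},
]


def copy_to_children(income_units, children, item):
    """Copy one income-unit data item onto every child with the same ABSHID."""
    lookup = {iu["ABSHID"]: iu[item] for iu in income_units}
    # Each child record gains the value held by its parent income unit.
    return [{**child, item: lookup[child["ABSHID"]]} for child in children]


merged = copy_to_children(income_units, children, "LFSPAR")
for row in merged:
    print(row["ABSHID"], row["ABSPID"], row["LFSPAR"])
```

The lookup is keyed on ABSHID alone, which plays the same role as the BY ABSHID statement in the SAS merge.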

USING REPEATING DATASETS

The Income Unit and Child levels are counting units, whereas the Income Unit Care and Child Care levels are repeating datasets. A repeating dataset in the CEaCS is a set of data items whose records may be repeated for a child or an income unit. The 'one to many' relationships described in File Structure (between the Income Unit level and the Income Unit Care level, and between the Child level and the Child Care level) show the connection between counting units and repeating datasets: an event or episode is repeated, so that multiple records with the same set of data items exist for the same child (or income unit).

For example, a child may have used more than one instance of child care such as (i) a long day care centre, (ii) family day care and (iii) grandparents. Consequently, three records would be present on the Child Care level for this child, representing a repeating dataset, with each record containing information for a common set of data items, e.g. Number of days of care used, Number of hours of care used, cost of the care and so on. Also, the child will have summary records in addition to the individual care records, described below.

In this example, although the three records all relate to a single child, any totals from the Child Care level are a count of child care arrangements.

Repeating datasets are only useful when common information is collected for each instance of a counting unit. For example, each child in a family may have several instances of care (CARINDX), each with an associated cost of care after the Child Care Benefit (CCB) and the Child Care Rebate (CCR) (COSTCCR), recorded both for last week and for usual care (USLWFLG). This enables a table to be run on all instances of care.

TABLE 2: EXAMPLE OF 'USUAL CHILD CARE COST' REPEATING DATASET

ABSHID	ABSPID	CARINDX	USLWFLG	COSTCCR($)

CEC11E0031	5	2 (long day care)	2	38
CEC11E0031	5	3 (family day care)	2	10
CEC11E0031	5	7 (grandparent care)	2	5
CEC11E0031	5	21 (all care)	2	53
CEC11E0031	5	22 (formal care)	2	48
CEC11E0031	5	23 (informal care)	2	5
CEC11E0031	5	32 (both formal and informal)	2	53

To run a table on the dataset outlined above, the following SAS code (or equivalent) can be used. For the single child in Table 2, the output shows the frequency of each cost (dollar value) for each type of care usually used:

/* Restrict to care usually used, excluding preschool (USLWFLG = 2) */
PROC FREQ DATA=CEC11E.CEC11ECC;
WHERE USLWFLG = 2;
TABLE CARINDX*COSTCCR;
RUN;

Summary Records and Data Items

In addition to the general or base records present in the repeating datasets (i.e. on the Income Unit Care and Child Care levels) that provide details about each instance of child care, there are also 'summary' records that provide aggregate information for selected groupings of the types of care. For example, summary records are available for groupings of formal care, informal care and all care.

In the example of a child who attended long day care, family day care and also received care from a grandparent, there are three base records on the Child Care level because they attended three separate instances of child care. For each record the data item cost of care after CCB and CCR was reported as $38, $10 and $5 respectively. Therefore, the summary record for this child for the total cost of formal care (i.e. long day care and family day care) is recorded as $48 ($38 + $10). Similarly, the summary record for this child for the total cost of all care (i.e. all three types of care) is recorded as $53.
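
The arithmetic behind these summary records can be checked with a short Python sketch. It is illustrative only: the values come from Table 2, and the grouping of CARINDX codes 2 and 3 as formal care and code 7 as informal care follows the worked example rather than the full code list, which should be taken from the data item list.

```python
# Illustrative only: base records from Table 2 (usual care, USLWFLG = 2).
base_costs = {2: 38, 3: 10, 7: 5}  # CARINDX -> COSTCCR ($)

# Assumed groupings, taken from the worked example in the text.
FORMAL = {2, 3}    # long day care, family day care
INFORMAL = {7}     # grandparent care

formal_total = sum(cost for code, cost in base_costs.items() if code in FORMAL)
informal_total = sum(cost for code, cost in base_costs.items() if code in INFORMAL)
all_care_total = sum(base_costs.values())

print(formal_total, informal_total, all_care_total)  # 48 5 53
```

These totals match the summary records shown in Table 2 for formal care ($48), informal care ($5) and all care ($53).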

The following data items comprise the classifications that enable the data for these summary records to be tabulated:

Income Unit Care level - Type of care used by the family (SASName IUCINDX).
Child Care level - All types of care (SASName CARINDX).

TABLE 3: EXAMPLE OF USUAL TYPE OF CARE, BY USUAL WEEKLY COST AFTER CCB AND CCR

	COSTCCR (USUAL WEEKLY COST OF CARE AFTER CCB AND CCR)

	$5	$10	$38	$48	$53

CARINDX (Type of care and/or preschool)
Long day care	0	0	1	0	0
Family day care	0	1	0	0	0
Grandparent	1	0	0	0	0
Children who use care	0	0	0	0	1
Children who use formal care	0	0	0	1	0
Children who use informal care	1	0	0	0	0
Children who used both formal and informal care	0	0	0	0	1

Note that although the output above only relates to a single child, the totals are a count of all care records (base and summary) for that child. That is, the table above shows the frequency of different costs for each type of care for an individual child.

As with the Child level file, some data items in a repeating dataset are only applicable to a particular sub-population of the dataset. For instance, the item 'Main reason intends to claim for the cost of formal care' from the Child Care level is only applicable to formal care. Records outside the sub-population (e.g. children with only informal care or no care) will appear as 'Not applicable'. On the Child Care level the usual or last week flag must be used. Refer to 'Using Flag Items'.

In addition, note that if you want to create ranged hours or cost tables which include custom totals for type of care (for example, all formal care excluding occasional care) you need to sum hours and cost for the types of care included in your total to the Child level before ranging the result.
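
The order of operations matters here: ranging each care record separately and then combining would place a child in the wrong range. The Python sketch below is illustrative only. The records, the identifier values, the care type labels and the hour ranges are all made up, and every record is assumed to belong to the same usual or last week category.

```python
from collections import defaultdict

# Illustrative only: made-up base records, one per type of care per child.
records = [
    {"ABSHID": "H1", "ABSPID": 1, "care": "long day care", "hours": 8},
    {"ABSHID": "H1", "ABSPID": 1, "care": "family day care", "hours": 6},
    {"ABSHID": "H1", "ABSPID": 1, "care": "occasional care", "hours": 3},
    {"ABSHID": "H2", "ABSPID": 1, "care": "long day care", "hours": 25},
]
# Custom total: formal care excluding occasional care.
INCLUDED = {"long day care", "family day care"}


def hours_range(hours):
    """Assign a total to a (hypothetical) hours range."""
    if hours < 10:
        return "Less than 10"
    if hours < 20:
        return "10-19"
    return "20 or more"


# Step 1: sum to the Child level, keyed by (ABSHID, ABSPID).
child_hours = defaultdict(int)
for r in records:
    if r["care"] in INCLUDED:
        child_hours[(r["ABSHID"], r["ABSPID"])] += r["hours"]

# Step 2: range the child-level total, not the individual records.
ranged = {child: hours_range(total) for child, total in child_hours.items()}
print(ranged)
```

In this toy data the first child has 14 hours in total and falls in the 10-19 range, even though each contributing record on its own is under 10 hours.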

USING FLAG ITEMS

To enable easier table specification and to ensure that the correct populations, and hence the correct data, are being tabulated, a number of 'flags' have been included in the CURF that should be used at all times when extracting data.

Usual or last week flag

There is a usual or last week care flag (USLWFLG) that allows users to look at a child's care usage for the reference week (last week) or their usual care usage. This flag is on the Child Care level. A similar flag at the Income Unit Care level (IUCSFLG) filters whether the care used by the family is on a usual or last week basis. These flags also include or exclude preschool from care used last week or usually.

It is imperative that the usual or last week care flags are used when any data items from the Child Care level or the Income Unit Care level are used, regardless of whether the care level data items are used alone or with other Child level or Income Unit level data items. If these flags are not used for child care or income unit care data items, the data will be incorrect.

The categories of the flags are:

1. Care usually used including preschool
2. Care usually used excluding preschool
3. Care used last week including preschool
4. Care used last week excluding preschool
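
The effect of omitting the flag can be seen in a small Python sketch. It is illustrative only: the records are made up, and it assumes (as the text implies) that the same care arrangement appears once under each applicable flag category. The function name select_by_flag is hypothetical.

```python
# Illustrative only: made-up Child Care level records for one child.
# USLWFLG codes follow the category list above (1-4).
records = [
    {"ABSPID": 5, "USLWFLG": 1, "care": "long day care", "hours": 20},
    {"ABSPID": 5, "USLWFLG": 2, "care": "long day care", "hours": 20},
    {"ABSPID": 5, "USLWFLG": 3, "care": "long day care", "hours": 16},
    {"ABSPID": 5, "USLWFLG": 4, "care": "long day care", "hours": 16},
]


def select_by_flag(records, flag):
    """Keep only the records for one usual/last week category."""
    return [r for r in records if r["USLWFLG"] == flag]


# Without the flag, all four categories are mixed together.
unfiltered_hours = sum(r["hours"] for r in records)
# With the flag, only care usually used (excluding preschool) is counted.
usual_hours = sum(r["hours"] for r in select_by_flag(records, 2))

print(unfiltered_hours, usual_hours)  # 72 20
```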

Labour force scope flag

In households where all adults were out of scope of the LFS, no information was obtained for the 2011 CEaCS. However, as long as at least one parent in the household was in scope for the LFS, information about children aged 0–12 years and some information about their parents were collected and included in the 2011 CEaCS.

There is a labour force scope flag (LFSFLAG) to indicate whether the income unit is out of scope. This flag (present on the Income Unit level) indicates if one parent in a family was out of scope or coverage. Limited employment and demographic data are available for these families.

Information about the working arrangements used by parent/guardians to help care for their child was not available for parent/guardians who were out of scope or coverage of the labour force for any reason.

MULTI-RESPONSE DATA ITEMS

A number of questions included in the survey allowed respondents to provide one or more responses. Each response category for one of these 'multi-response questions' (or data items) is basically treated as a separate data item. These data items have the same general data item identifier (SASName) but are each suffixed with a letter – A for the first response, B for the second response, C for the third response, D for the fourth response and so on.

For example, the multi-response data item 'All sources of income of parent(s)' (with a general SASName of ASCIPAR – see data item list), has five response categories. Consequently, five data items have been produced - ASCIPARA, ASCIPARB, ASCIPARC, ASCIPARD and ASCIPARE.

Each data item in the series (i.e. ASCIPARA to ASCIPARE) has two response codes: a 'Yes' response, coded with the item's position in the series (code 1 for the first item, code 2 for the second, and so on), and a 'Null' response (code 0) indicating that the response was not relevant for the respondent. The first data item in the series also includes a 'Not applicable' response for respondents who were not asked the question (e.g. ASCIPARA with a value of 9).

It should be noted that the sum of individual multi-response categories will be greater than the population or number of people applicable to the particular data item as respondents are able to select more than one response.
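
A short Python sketch shows why the category counts add to more than the applicable population. It is illustrative only: the records are made up, only the first three items of the series are shown, counts are unweighted for simplicity, and the codes on the later items for a 'Not applicable' respondent are an assumption.

```python
# Illustrative only: made-up records using the coding described above
# (ASCIPARA: 1 = yes, 0 = null, 9 = not applicable; ASCIPARB: 2 = yes; ...).
records = [
    {"ASCIPARA": 1, "ASCIPARB": 2, "ASCIPARC": 0},
    {"ASCIPARA": 1, "ASCIPARB": 0, "ASCIPARC": 0},
    {"ASCIPARA": 0, "ASCIPARB": 2, "ASCIPARC": 3},
    {"ASCIPARA": 9, "ASCIPARB": 0, "ASCIPARC": 0},  # not asked (later codes assumed 0)
]
YES_CODE = {"ASCIPARA": 1, "ASCIPARB": 2, "ASCIPARC": 3}

# The applicable population excludes 'Not applicable' on the first item.
applicable = [r for r in records if r["ASCIPARA"] != 9]

# Count 'Yes' responses for each category separately.
counts = {
    item: sum(1 for r in applicable if r[item] == code)
    for item, code in YES_CODE.items()
}

print(len(applicable), counts, sum(counts.values()))
```

Here three respondents are applicable but the category counts sum to five, because two of them gave more than one response.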

Multi-response data items can be identified in the data item list as SASNames followed by a range of letters in brackets; for example, ASCIPAR(A-E). They can also be identified in the CURF data item list with a # appended to the data item name (e.g. Usual education/care/parenting arrangements two years prior to attending school #).

WEIGHTS AND ESTIMATION

As the survey was conducted on a sample of households in Australia, it is important to take account of the method of sample selection when deriving estimates. This is particularly important as a child's chance of selection in the survey varied depending on the state or territory in which they lived. Survey 'weights' are values which indicate how many population units are represented by the sample unit. See discussion in Survey Methodology.

There are two weights provided on the CEaCS CURF, as follows:

HHWGT - household weight, for use with Income Unit and Income Unit Care levels
CHWGHT - child weight, for use with Child and Child Care levels

The weight for the relevant level should be applied when deriving estimates from the CURF. It is essential to apply the appropriate weight for the required estimate, rather than just derive a count of records falling into each category. If a child or household weight were to be ignored, then no account would be taken of a child's or household's chance of selection in the survey or of different response rates across population groups, with the result that counts produced could be biased.

The application of weights ensures that:

child estimates conform to an independently estimated distribution of the population by age, sex, state/territory and section of state, and
household estimates conform to an independently estimated distribution of households by certain household characteristics (e.g. by number of adults and children), rather than to the distributions within the sample itself.
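
The difference between a weighted estimate and a simple count of records can be sketched in Python as follows. The example is illustrative only: the records and weight values are invented, uses_formal_care is a hypothetical data item, and weighted_total is a hypothetical helper.

```python
# Illustrative only: made-up Child level records with invented CHWGHT values.
children = [
    {"uses_formal_care": True, "CHWGHT": 150.0},
    {"uses_formal_care": False, "CHWGHT": 400.0},
    {"uses_formal_care": True, "CHWGHT": 250.0},
    {"uses_formal_care": False, "CHWGHT": 200.0},
]


def weighted_total(records, condition, weight="CHWGHT"):
    """Estimate a population total by summing the weights of matching records."""
    return sum(r[weight] for r in records if condition(r))


estimate = weighted_total(children, lambda r: r["uses_formal_care"])
population = weighted_total(children, lambda r: True)
weighted_share = estimate / population

# Unweighted share of sample records, for comparison only.
unweighted_share = sum(1 for r in children if r["uses_formal_care"]) / len(children)

print(estimate, population, weighted_share)  # 400.0 1000.0 0.4
print(unweighted_share)                      # 0.5
```

In this toy data half of the sample records use formal care, but they represent only 40% of the estimated population, which is the kind of bias the weights are designed to remove.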

STANDARD ERRORS

Each record on each of the levels also contains 60 replicate weights and, by using these weights, it is possible to calculate standard errors for weighted estimates produced from the microdata. This method is known as the 60 group Jack-knife variance estimator. When calculating standard errors, it is important to select the replicate weights which are most appropriate for the analysis being undertaken. The replicate weights are as follows:

WHM0101-WHM0160 - use for Income Unit and Income Unit Care levels
WPM0101-WPM0160 - use for Child and Child Care levels.

Under the Jack-knife method of replicate weighting, weights were derived as follows:

60 replicate groups were formed with each group formed to mirror the overall sample (where units from a collection district all belong to the same replicate group and a unit can belong to only one replicate group)
one replicate group was dropped from the file and then the remaining records were weighted in the same manner as for the full sample
records in that group that were dropped received a weight of zero.

This process was repeated for each replicate group (i.e. a total of 60 times). Ultimately each record had 60 replicate weights attached to it with one of these being the zero weight.

Replicate weights enable variances of estimates to be calculated relatively simply. They also enable unit record analyses such as chi-square and logistic regression to be conducted in a way that takes the sample design into account. Replicate estimates for any variable of interest can be calculated by applying each of the 60 sets of replicate weights, giving 60 replicate estimates. The distribution of this set of replicate estimates, in conjunction with the full sample estimate (based on the general weight), is then used to approximate the variance of the full sample estimate.

To obtain the standard error of a weighted estimate y, the same estimate is calculated using each of the 60 replicate weights. The variability between these replicate estimates (denoted y(g) for group number g) is used to measure the standard error of the original weighted estimate y using the formula:

$$SE(y) = \sqrt{\frac{59}{60} \sum_{g=1}^{60} \left( y(g) - y \right)^{2}}$$

where:

g = the replicate group number

y(g) = the weighted estimate, having applied the weights for replicate group g

y = the weighted estimate from the sample.
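
The calculation can be sketched in Python as follows. The example is illustrative only. It assumes the usual form of the 60 group Jack-knife estimator, SE(y) = sqrt((59/60) × sum over g of (y(g) − y)²), and it builds 60 made-up child records with invented weights rather than reading the CURF. The item name value and the helper names are hypothetical; the weight names follow the CHWGHT and WPM0101-WPM0160 items described above.

```python
import math

G = 60  # number of replicate groups

# Illustrative only: 60 made-up child records, one per replicate group.
# Each has a full-sample weight (CHWGHT) of 10 and replicate weights
# WPM0101-WPM0160: zero in the record's own group, rescaled in the others.
records = []
for i in range(1, G + 1):
    rec = {"value": float(i), "CHWGHT": 10.0}
    for g in range(1, G + 1):
        rec[f"WPM01{g:02d}"] = 0.0 if g == i else 10.0 * G / (G - 1)
    records.append(rec)


def weighted_total(records, weight):
    """Weighted estimate of the total of 'value' using the named weight."""
    return sum(r["value"] * r[weight] for r in records)


def jackknife_se(records, estimator, full_weight="CHWGHT", prefix="WPM01"):
    """Standard error from the spread of the 60 replicate estimates."""
    y = estimator(records, full_weight)
    replicates = [estimator(records, f"{prefix}{g:02d}") for g in range(1, G + 1)]
    variance = (G - 1) / G * sum((y_g - y) ** 2 for y_g in replicates)
    return y, math.sqrt(variance)


y, se = jackknife_se(records, weighted_total)
print(f"estimate = {y:.1f}, SE = {se:.1f}, RSE = {100 * se / y:.1f}%")
```

Because the estimator is passed in as a function, the same routine could be reused for any estimate that is a function of weighted totals, by recomputing that estimate under each replicate weight.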

The 60 group Jack-knife method can be applied not just to estimates of the population total, but also where the estimate y is a function of estimates of the population total, such as a proportion, difference or ratio. For more information on the 60 group Jack-knife method of SE estimation, see Research Paper: Weighting and Standard Error Estimation for ABS Household Surveys (Methodology Advisory Committee), July 1999 (cat. no. 1352.0.55.029).

Use of the 60 group Jack-knife method for complex estimates, such as regression parameters from a statistical model, is not straightforward and may not be appropriate. The method as described does not apply to investigations where survey weights are not used, such as in unweighted statistical modelling.

GEOGRAPHY

To enable analysis at a regional level, each record on the CURF contains a state/territory identifier (STATECF) and two sub-state identifiers – Capital city/Balance of state/Territory (AREAOUT) and Remoteness structure (AREAREMC). The AREAOUT geographic data item has two output categories – Capital city and Balance of state/Territory. Only the capital city statistical divisions of the six states (as defined in the Australian Standard Geographical Classification (ASGC) (cat. no. 1216.0)) are included in the Capital city category. All other regions in Australia, including the territory capitals Darwin and Canberra, are classified to the Balance of state/Territory category.

Conditions of Use of Geographic Data Items

To provide CURF users with greater flexibility in their analyses, the ABS has included several sub-state geography data items (as described above) on the Expanded CURF.

Conditions are placed on the use of these items. Tables showing multiple data items cross-tabulated by more than one sub-state geography at a time are not permitted, because they could present detailed information about small geographic regions. However, simple cross-tabulations of population counts by sub-state geographic data items may be useful for clients in order to determine which geography item to include in their primary analysis, and such output is permitted.

CURF FILE NAMES

The 2011 CEaCS Expanded CURF can be accessed through the RADL and is available in SAS, SPSS and STATA formats. The CURF comprises the following files:

Data files

SAS Files
These files contain the data for the CURF in SAS format.

CEC11EHH.SAS7BDAT - the CEaCS CURF Income Unit level file in SAS for Windows format.
CEC11EIC.SAS7BDAT - the CEaCS CURF Income Unit Care level file in SAS for Windows format.
CEC11EPN.SAS7BDAT - the CEaCS CURF Child level file in SAS for Windows format.
CEC11ECC.SAS7BDAT - the CEaCS CURF Child Care level file in SAS for Windows format.

SPSS Files
These files contain the data for the CURF in SPSS format.

CEC11EHH.SAV - the CEaCS CURF Income Unit level file in SPSS format.
CEC11EIC.SAV - the CEaCS CURF Income Unit Care level file in SPSS format.
CEC11EPN.SAV - the CEaCS CURF Child level file in SPSS format.
CEC11ECC.SAV - the CEaCS CURF Child Care level file in SPSS format.

STATA Files
These files contain the data for the CURF in STATA format.

CEC11EHH.DTA - the CEaCS CURF Income Unit level file in STATA format.
CEC11EIC.DTA - the CEaCS CURF Income Unit Care level file in STATA format.
CEC11EPN.DTA - the CEaCS CURF Child level file in STATA format.
CEC11ECC.DTA - the CEaCS CURF Child Care level file in STATA format.

INFORMATION FILES

Data Item List

The Data item list contains all the data items, including details of categories and code values, that are available on the Expanded CURF. It is available on the Downloads tab.

Formats File

FORMATS.SAS7BCAT - the SAS format file which provides labels for associated code values in the SAS version of the CURF.

Frequency Files

FREQUENCIES_CEC11EHH.TXT - contains weighted and unweighted frequency counts for all Income Unit level data items. The file is in plain text format.
FREQUENCIES_CEC11EIC.TXT - contains weighted and unweighted frequency counts for all Income Unit Care level data items. The file is in plain text format.
FREQUENCIES_CEC11EPN.TXT - contains weighted and unweighted frequency counts for all Child level data items. The file is in plain text format.
FREQUENCIES_CEC11ECC.TXT - contains weighted and unweighted frequency counts for all Child Care level data items. The file is in plain text format.