4442.0.55.001 - Microdata: Family Characteristics, Australia , 2009-10 Quality Declaration 
ARCHIVED ISSUE Released at 11:30 AM (CANBERRA TIME) 27/04/2012   

FILE STRUCTURE


Three separate files (also referred to as levels) make up the 2009-10 Family Characteristics Expanded CURF. These files are:

  • Household
  • Family
  • Person
Each of the above files is available in SAS, SPSS and STATA formats.

Household, Family and Person level

The Household level file contains data items relating to the household such as 'Equivalised weekly household income' and 'State or territory of usual residence'. The Family level file contains data items relating to the family such as 'Labour Force status of parents/guardians' and 'Family composition'. The Person level file contains information about each survey respondent such as their age, sex, country of birth and 'Whether parent/guardian of children of any age living in household'.

RELATIONSHIP BETWEEN LEVELS

The relationship between these three levels forms a hierarchy. A household level record may be linked to one or more family level records; each family level record is linked to two or more person level records. The structure is not a pure hierarchy in that there are no family level records for persons who are not family members.

There are a total of 14,151 household records, 10,360 family records and 35,525 person records.

WEIGHTS AND ESTIMATION

There are three weights provided on the FCS CURF, one for each of the three levels, as follows:
  • FINHHWT (Household weight)
  • FINFAMWT (Family weight)
  • FINPRSWT (Person weight)
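As a minimal sketch of how a weighted estimate differs from a raw count of records, the Python below sums the person weight over the records in a category. The three records, their weight values and the coding of SEX are invented for illustration; only the weight name FINPRSWT comes from the CURF.

```python
# Hypothetical person-level records; the values are illustrative, not CURF data.
persons = [
    {"SEX": 1, "FINPRSWT": 850.0},
    {"SEX": 2, "FINPRSWT": 1200.0},
    {"SEX": 2, "FINPRSWT": 950.0},
]


def weighted_total(records, weight_name, predicate):
    """Sum the named weight over the records that satisfy the predicate."""
    return sum(r[weight_name] for r in records if predicate(r))


def is_female(record):
    # Assumed coding for this sketch only: SEX == 2 means female.
    return record["SEX"] == 2


# Number of sample records in the category (not a population estimate).
sample_count = sum(1 for r in persons if is_female(r))

# Estimated number of persons in the population in that category.
population_estimate = weighted_total(persons, "FINPRSWT", is_female)
```

The same pattern would use FINHHWT for household estimates and FINFAMWT for family estimates.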
The weight for the relevant level should be applied when deriving estimates from the CURF. It is essential to apply the appropriate weight for the required estimate, rather than just derive a count of records falling into each category. If the person, family or household weight were to be ignored, then no account would be taken of a person's, family's or household's chance of selection in the survey or of different response rates across population groups, with the result that counts produced could be biased. The application of weights ensures that estimates conform to an independently estimated distribution of the population by age and other characteristics, rather than to the distributions within the sample itself.

STANDARD ERRORS

Standard errors for each estimate produced from this CURF can be calculated using the replicate weights provided on the files.
Each record on the CURF contains 3 sets of replicate weights (each set with 30 replicate weights). Using these replicate weights, it is possible to calculate standard errors for estimates produced from the CURF. This method is known as the 30 group Jack-knife standard error estimator. When calculating standard errors, it is important to select the replicate weights which are most appropriate for the analysis being undertaken. The replicate weights are as follows:
  • WHM0301 to WHM0330 - use for Household estimates
  • WFM0301 to WFM0330 - use for Family estimates
  • WPM0201 to WPM0230 - use for Person estimates
To obtain the standard error of a weighted estimate y, calculate the same estimate using each of the 30 replicate weights. The variability between these replicate estimates (denoted y(g) for group number g) is used to measure the standard error of the original weighted estimate y using the formula:

SE(y) = \sqrt{ \frac{29}{30} \sum_{g=1}^{30} \left( y(g) - y \right)^2 }

where:

g = the replicate group number

y(g) = the weighted estimate, having applied the weights for replicate group g

y = the weighted estimate from the sample.

The 30 group Jack-knife method can also be applied where the estimate y is a function of estimates of population totals, such as a proportion, difference or ratio. For more information on the 30 group Jack-knife method of SE estimation, see Research Paper: Weighting and Standard Error Estimation for ABS Household Surveys (Methodology Advisory Committee), July 1999 (cat. no. 1352.0.55.029).
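The calculation can be sketched in Python as follows, assuming the usual 30 group Jack-knife form SE(y) = sqrt(29/30 × sum over g of (y(g) − y)²). The function names, the three person records and their weight values are invented for illustration; only the weight names FINPRSWT and WPM0201 to WPM0230 come from the CURF.

```python
import math


def jackknife_se(estimate_fn, main_weight, replicate_weights):
    """Return (y, SE(y)) for an estimate computed by estimate_fn(weight_name)."""
    y = estimate_fn(main_weight)
    replicate_estimates = [estimate_fn(name) for name in replicate_weights]
    groups = len(replicate_estimates)  # 30 on this CURF
    variance = (groups - 1) / groups * sum(
        (y_g - y) ** 2 for y_g in replicate_estimates
    )
    return y, math.sqrt(variance)


# Person-level replicate weight names: WPM0201 ... WPM0230.
replicate_names = [f"WPM02{g:02d}" for g in range(1, 31)]

# Three invented person records; weights are illustrative, not CURF data.
persons = []
for _ in range(3):
    record = {"FINPRSWT": 100.0}
    for g, name in enumerate(replicate_names, start=1):
        record[name] = 101.0 if g % 2 else 99.0
    persons.append(record)


def person_total(weight_name):
    """Weighted total of persons under the given weight."""
    return sum(record[weight_name] for record in persons)


estimate, standard_error = jackknife_se(person_total, "FINPRSWT", replicate_names)
```

Because `estimate_fn` is recomputed in full under each replicate weight, the same helper can be passed a function that returns a proportion, difference or ratio of weighted totals.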

Use of the 30 group Jack-knife method for complex estimates, such as regression parameters from a statistical model, is not straightforward and may not be appropriate. The method as described does not apply to investigations where survey weights are not used, such as in unweighted statistical modelling.

To apply Person level characteristics to an episodic level, the Person level characteristics must be merged on to the episodic level files (i.e. SPS10EE, SPS10EA or SPS10EC) using the person identifier (ABSHID). To do this, the following SAS code (or equivalent) can be used:

* Sort both files by the common identifier before merging;
PROC SORT DATA = SPS10EP;
  BY ABSHID;
RUN;

PROC SORT DATA = SPS10EE; * or SPS10EA or SPS10EC;
  BY ABSHID;
RUN;

DATA MERGEFILE;
  MERGE SPS10EE (IN = A) /* or SPS10EA or SPS10EC */
        SPS10EP (KEEP = ABSHID AGECE SEX FINWTPM IN = B);
  BY ABSHID;
  IF A AND B THEN OUTPUT; * Only keeps records which are present on both files;
RUN;

The KEEP= option lists all person data items to be merged on to the episodic level file. Person characteristics can be merged on to the episodic levels to identify the characteristics of the persons who participated in sport and physical recreation activities (in the case of SPS10EE), attended sporting events as spectators (SPS10EA) or attended the selected cultural venues and events (SPS10EC), and these characteristics are applied to each record on those particular files. 'Participation in sport and physical recreation', 'Spectator attendance at sporting events' and 'Attendance at selected cultural venues and events' level characteristics should not be merged on to the Person file.

IDENTIFIERS

There are unique identifiers for every record on each of the three files. As shown below, these identifiers are provided in a hierarchical order.
  • Household = ABSHID
  • Family = ABSHID, ABSFID
  • Person = ABSHID, ABSFID, ABSPID
Households have a household identifier, families have a family identifier (and the identifier of the household to which they belong is also given), and persons have a person identifier (and the identifier of both the family and household to which they belong is also given).
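Because the identifiers are hierarchical, a family level item can be copied on to person records by matching on the combination of ABSHID and ABSFID. The Python below is a minimal sketch of that lookup. The records are invented, FAMCOMP is a hypothetical item name, and representing a person who is not a family member with ABSFID set to None is an assumption made for this sketch (the actual coding is given in the data item list).

```python
# Invented family-level records; FAMCOMP is a hypothetical item name.
families = [
    {"ABSHID": "H1", "ABSFID": 1, "FAMCOMP": "Couple family"},
    {"ABSHID": "H2", "ABSFID": 1, "FAMCOMP": "One parent family"},
]

# Invented person-level records.
persons = [
    {"ABSHID": "H1", "ABSFID": 1, "ABSPID": 1},
    {"ABSHID": "H1", "ABSFID": 1, "ABSPID": 2},
    {"ABSHID": "H2", "ABSFID": 1, "ABSPID": 1},
    {"ABSHID": "H2", "ABSFID": 1, "ABSPID": 2},
    # Assumption for this sketch: a non-family member has no family identifier.
    {"ABSHID": "H3", "ABSFID": None, "ABSPID": 1},
]

# Index families by their unique key: household identifier plus family identifier.
family_by_key = {(f["ABSHID"], f["ABSFID"]): f for f in families}

# Copy the family-level item on to every person in that family.
merged = []
for person in persons:
    family = family_by_key.get((person["ABSHID"], person["ABSFID"]))
    merged.append({**person, "FAMCOMP": family["FAMCOMP"] if family else None})

# Each person is uniquely identified by household plus person identifier.
person_keys = {(p["ABSHID"], p["ABSPID"]) for p in persons}
```

Persons with no matching family record are kept with an empty value rather than dropped, which reflects the fact that the structure is not a pure hierarchy.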
The household identifier (referenced on the file from the item labelled ABSHID) is a unique fourteen-character random identifier. For family records, each family within a household is numbered sequentially (within the item ABSFID). The same is true for person records, that is, each person in a household is numbered sequentially (within the item ABSPID). As a result of these arrangements, each person can be uniquely identified through the combination of their household and person identifiers, and each family can be uniquely identified through the combination of their household and family identifiers. As well as uniquely identifying all units, the identifiers are vital for merging files or copying attributes of interest from one file to another, for associated units. For example, a Family level variable such as 'Family composition' can be copied to all the Persons within the family.

SPECIAL CODES

For some data items, certain classification values have been reserved as special codes and must not be added as if they were quantitative values. These special codes generally relate to data items such as income. For example, code 9999999998 for the data item 'Weekly household income from all sources - parametric', refers to income 'Not known or not stated'.
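A minimal Python sketch of excluding special codes before calculating a weighted mean is given below. The item name HHINC and the household records are invented for illustration; the code 9999999998 and the weight name FINHHWT come from the text, and any further special codes for a real item would need to be taken from the data item list.

```python
# Special codes to exclude; only the 'Not known or not stated' code is from the text.
SPECIAL_CODES = {9999999998}

# Invented household records; HHINC is a hypothetical name for the income item.
households = [
    {"HHINC": 1200, "FINHHWT": 500.0},
    {"HHINC": 9999999998, "FINHHWT": 400.0},  # income not known or not stated
    {"HHINC": 800, "FINHHWT": 500.0},
]

# Keep only records that hold a genuine quantitative value.
valid = [h for h in households if h["HHINC"] not in SPECIAL_CODES]

# Weighted mean over valid records only.
total_weight = sum(h["FINHHWT"] for h in valid)
weighted_mean = sum(h["HHINC"] * h["FINHHWT"] for h in valid) / total_weight
```

Leaving the special code in place would treat 9999999998 as a dollar amount and inflate the mean by several orders of magnitude.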

Furthermore, most data items included on the CURF include a 'Not applicable' category. The classification value of the 'Not applicable' category and other special codes, where relevant, are shown in the CURF data item list (see Data Item List section at the end of this Chapter).

MULTI-RESPONSE FIELDS

There was one multi-response question on the FCS 2009-10 where respondents could provide one or more responses. On the CURF, each response category for this 'multi-response question' (or data item) is treated as a separate data item. These data items have the same general data item identifier (SASName) but are each suffixed with a letter: A for the first response, B for the second response, C for the third response, D for the fourth response and so on.

The multi-response data item is 'Types of in-kind contributions received by family for resident children in last 12 months' and has 10 response categories (with a general SASName of INKRDCF – See the last item on the 'Family Level' tab of the data item list). Consequently, 10 data items have been produced - INKRDCFA, INKRDCFB, INKRDCFC, INKRDCFD, INKRDCFE, INKRDCFF, INKRDCFG, INKRDCFH, INKRDCFI, INKRDCFJ.

Each data item will have a value corresponding to the relevant response where that response was given, with the remaining values corresponding to a 'Null response'. The 'Null response', generally a value of 0 or 00, is a default code and should be ignored. In the case of the J position (i.e. INKRDCFJ), this also includes the 'Not applicable' category (generally a value of 9 or 99) which comprises those persons who were not asked the particular question.
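The Python below is a minimal sketch of tabulating the multi-response items while ignoring the 'Null response' and 'Not applicable' codes. The family records, the response values, and the specific codes 0 and 99 are assumptions made for illustration; the actual codes should be checked in the data item list.

```python
NULL_RESPONSE = 0     # assumed default code, to be ignored
NOT_APPLICABLE = 99   # assumed code, held in the J position only

# The ten items INKRDCFA ... INKRDCFJ.
ITEMS = ["INKRDCF" + letter for letter in "ABCDEFGHIJ"]


def make_family(weight, **responses):
    """Build an invented family record with every item defaulting to null."""
    record = {item: NULL_RESPONSE for item in ITEMS}
    record.update(responses)
    record["FINFAMWT"] = weight
    return record


families = [
    make_family(100.0, INKRDCFA=1, INKRDCFB=2),   # two responses given
    make_family(200.0, INKRDCFA=1),               # one response given
    make_family(50.0, INKRDCFJ=NOT_APPLICABLE),   # not asked the question
]


def selected(record, item):
    """True when the item holds a genuine response."""
    return record[item] not in (NULL_RESPONSE, NOT_APPLICABLE)


# Weighted number of families giving each response.
category_totals = {
    item: sum(f["FINFAMWT"] for f in families if selected(f, item))
    for item in ITEMS
}

# Weighted number of families to whom the question applied.
applicable_total = sum(
    f["FINFAMWT"] for f in families if f["INKRDCFJ"] != NOT_APPLICABLE
)
```

In this sketch the category totals sum to 400 while the applicable population is 300, because one family gave two responses.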

It should be noted that the sum of individual multi-response categories will be greater than the population or number of people applicable to the particular data item as respondents are able to select more than one response. Multi-response data items can be identified in the data item list as SASNames followed by a range of letters in brackets; for example, INKRDCF (A-J).

GEOGRAPHY

To enable analysis at a regional level, each record on the CURF contains a state/territory identifier (STATECF) and two sub-state identifiers: Capital city/balance of state (AREAURHH) and Remoteness structure (REMOTECF). The AREAURHH geographic data item has two output categories: Capital city and Balance of state. Only the capital city statistical divisions of the six states (as defined in the Australian Standard Geographical Classification (ASGC) (cat. no. 1216.0)) are included in the Capital city category. All other regions in Australia, including the territory capitals Darwin and Canberra, are classified to the Balance of state category.

Conditions of Use of Geographic Data Items

To provide CURF users with greater flexibility in their analyses, the ABS has included several sub-state geography data items (as described above) on the Expanded CURF.

Conditions are placed on the use of these items. Tables showing multiple data items cross-tabulated by more than one sub-state geography at a time are not permitted, due to the detailed information about small geographic regions that could be presented. However, simple cross-tabulations of population counts by sub-state geographic data items may be useful for clients in order to determine which geography item to include in their primary analysis, and such output is permitted.