Microdata and TableBuilder: General Social Survey, Australia

An Expanded Confidentialised Unit Record File (CURF) from the 2014 General Social Survey

Introduction

This product provides a range of information about the release of microdata from the General Social Survey, Australia, 2014, including details about how to use the CURF and the TableBuilder. Data item lists and information on the conditions of use and the quality of the microdata are also provided, along with links to details of survey definitions and methodology.

Microdata are the most detailed information available from a survey and are generally the responses to individual questions on the questionnaire or data derived from two or more questions. This level of detail is released with the approval of the Australian Statistician.

Available products

The following microdata products are available from this survey:

  • Expanded CURF
  • TableBuilder

Further information about these services, and other information to assist users in understanding and accessing microdata in general, is available from the Microdata Entry Page.

Before you apply for access, you should read and familiarise yourself with the information contained in the User Manual: Responsible Use of ABS CURFs and/or the User Manual: TableBuilder, depending on the mode of access you are interested in.

Apply for access

To apply for access to the CURF, register and apply in DataLab.

To apply for access to TableBuilder, register and apply in Registration Centre.

Further information on access steps can be found in How to Apply for Microdata on the ABS website.

Further information

Further information about the survey and the microdata products can be found in this product, including:

  • Detailed lists of data items for the CURF and TableBuilder are available in the Data downloads section
  • The Quality Declaration can be found in the left navigation menu
  • Abbreviations and Glossary relating to these products can be found in the Methodology page of the publication: General Social Survey: Summary Results, Australia

Support

For assistance in registering for access or to use these products:

CURF: contact Microdata Access Strategies at microdata.access@abs.gov.au or 02 6252 7714

TableBuilder: contact the National Information and Referral Service (NIRS) on 1300 135 070 or client.services@abs.gov.au

Data available on request

Data obtained in the survey but not presented in the CURF or TableBuilder may be available from the ABS, on request, as statistics in tabulated form.

Subject to confidentiality and sampling variability constraints, special tabulations can be produced incorporating data items, populations and geographic areas selected to meet individual requirements. These are available on request, on a fee for service basis. Contact the National Information and Referral Service on 1300 135 070 or client.services@abs.gov.au for further information.

Privacy

The ABS Privacy Policy outlines how the ABS handles any personal information that you provide to us.

Survey methodology

Information about the 2014 General Social Survey (GSS), including summary results, is available in the publication General Social Survey: Summary Results, Australia, 2014 (cat no 4159.0).

Detailed information about the survey including scope and coverage, survey design, data collection methodology, weighting, estimation and benchmarking, and the reliability of estimates can be accessed from the Explanatory Notes page of that publication. Lists of terms and definitions used in the 2014 GSS can be found under the Abbreviations and Glossary pages. The Data Item List, all published summary tables and the survey questionnaire can be accessed from the Downloads page.

File structure

Data available by level

The GSS 2014 microdata are available across four levels. Some of these levels have a hierarchical relationship:

  1. Household
  2. Person
  3. Voluntary work
  4. Access to services

Broadly, each level provides the following:

  • Household level - information about the household size and structure and household income details
  • Person level - all demographic and socio-economic characteristics of the survey respondents, and most of the information they provided
  • Voluntary work level - information about the characteristics of each episode of volunteering that the survey respondent described
  • Access to services level - information about the types of services that were difficult to access and the reasons why they were described as difficult

Weights and estimation

As the survey was conducted on a sample of households in Australia, it is important to take account of the method of sample selection when deriving estimates. This is particularly important as a person's chance of selection in the survey varied depending on the state or territory in which they lived. Survey 'weights' are values which indicate how many population units are represented by the sample unit.

There are two survey weights provided: a person weight (FINPRSWT) and a household weight (FINWTHH). These should be used when analysing data at the person and household level respectively.

Where estimates are derived, it is essential that they are calculated by adding the weights of persons or households, as appropriate, in each category, and not just by counting the number of records falling into each category. If each person's or household's 'weight' were to be ignored, then no account would be taken of a person's or household's chance of selection in the survey or of different response rates across population groups, with the result that counts produced could be seriously biased. The application of weights ensures that:

  • person estimates conform to an independently estimated distribution of the population by dwelling type, age, sex, state/territory and part of state
  • household estimates conform to an independently estimated distribution of households by certain characteristics (e.g. by number of adults and children), state/territory and part of state rather than to the distributions within the sample itself.
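As a minimal sketch of the point above, the following Python snippet contrasts a weighted estimate with a raw record count. The records and weight values are invented for illustration. Only the weight name FINPRSWT comes from this product; the STATE field and the helper name are hypothetical.

```python
from collections import defaultdict

def weighted_estimates(records, category_key, weight_key):
    """Sum the survey weights of the records falling into each category."""
    totals = defaultdict(float)
    for record in records:
        totals[record[category_key]] += record[weight_key]
    return dict(totals)

# Illustrative person records only; these are not real GSS values.
persons = [
    {"STATE": "NSW", "FINPRSWT": 1500.0},
    {"STATE": "NSW", "FINPRSWT": 1250.0},
    {"STATE": "ACT", "FINPRSWT": 300.0},
]

# Weighted estimate: add the person weights within each category.
estimates = weighted_estimates(persons, "STATE", "FINPRSWT")

# Unweighted count: every record contributes 1, which ignores the
# chance of selection and is not a population estimate.
record_counts = weighted_estimates(
    [dict(p, ONE=1.0) for p in persons], "STATE", "ONE"
)

print(estimates)
print(record_counts)
```

The same helper could be applied at the household level by passing household records and FINWTHH as the weight.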

Counting units and weight

The counting unit for level one is the household, for level two the person, for level three instances of volunteering and for level four services that were difficult to access. There is a weight attached to each level in order to estimate the total population of the respective counting unit. The weight on level one is the household weight, and the weight for levels two to four is the person weight.

What you count depends on the level from which you select the weight. A household level weight estimates the number of households with a particular characteristic. Likewise the weight included in the person level estimates the number of persons with the selected characteristics. Replicate weights have also been included and these can be used to calculate the standard error. For more information, refer to the Standard Errors section below.

Standard errors

Each record on the household level and person level also contains 60 replicate weights and, by using these weights, it is possible to calculate standard errors for weighted estimates produced from the microdata. This method is known as the 60 group Jack-knife variance estimator.

Under the Jack-knife method of replicate weighting, weights were derived as follows:

  • 60 replicate groups were formed with each group formed to mirror the overall sample (where units from a collection district all belong to the same replicate group and a unit can belong to only one replicate group)
  • one replicate group was dropped from the file and then the remaining records were weighted in the same manner as for the full sample
  • records in that group that were dropped received a weight of zero

This process was repeated for each replicate group (i.e. a total of 60 times). Ultimately each record had 60 replicate weights attached to it with one of these being the zero weight.
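The steps above can be sketched in Python as follows. This is a simplified illustration, not the ABS procedure: the re-weighting "in the same manner as for the full sample" is replaced here by a plain rescaling of the remaining weights by G/(G-1), group membership is assumed to be given, and three groups are used instead of 60 to keep the example small. All names are hypothetical.

```python
def replicate_weights(weights, groups, n_groups):
    """Return one list of replicate weights per record.

    weights  : full-sample weight of each record
    groups   : replicate group (1..n_groups) of each record
    n_groups : number of replicate groups (60 for the GSS)
    """
    # Assumed stand-in for the full re-weighting step.
    scale = n_groups / (n_groups - 1)
    result = []
    for weight, group in zip(weights, groups):
        # Zero in the replicate where the record's own group is dropped,
        # rescaled weight in every other replicate.
        result.append([
            0.0 if g == group else weight * scale
            for g in range(1, n_groups + 1)
        ])
    return result

reps = replicate_weights([100.0, 200.0, 300.0], [1, 2, 3], n_groups=3)
for row in reps:
    print(row)
```

Each record ends up with one replicate weight per group, exactly one of which is zero, matching the description above.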

Replicate weights enable variances of estimates to be calculated relatively simply. They also enable unit record analyses such as chi-square and logistic regression to be conducted which take into account the sample design. An estimate for any variable of interest can be calculated using each of the 60 replicate weights in turn, giving 60 replicate estimates. The distribution of this set of replicate estimates, in conjunction with the full sample estimate (based on the general weight), is then used to approximate the variance of the full sample estimate.

To obtain the standard error of a weighted estimate y, the same estimate is calculated using each of the 60 replicate weights. The variability between these replicate estimates (denoting y(g) for group number g) is used to measure the standard error of the original weighted estimate y using the formula:

\(SE(y)=\sqrt{({59}/{60})\sum \limits_{g=1}^{60}(y(g)-y)^2}\)

where

\(g\) = the replicate group number

\(y(g)\) = the weighted estimate, having applied the weights for replicate group \(g\)

\(y\) = the weighted estimate from the sample.

The 60 group Jack-knife method can be applied not just to estimates of the population total, but also where the estimate y is a function of estimates of the population total, such as a proportion, difference or ratio. For more information on the 60 group Jack-knife method of SE estimation, see Research Paper: Weighting and Standard Error Estimation for ABS Household Surveys (Methodology Advisory Committee), July 1999 (cat. no. 1352.0.55.029).
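A minimal Python sketch of the formula above is given below. It applies the same estimator with the full weight and with each replicate weight, then combines the results. The function names and the toy data (three replicate groups rather than 60) are assumptions for illustration only. On the CURF the replicate weights are stored as 60 columns on each record, whereas here each inner list holds one replicate weight for all records.

```python
import math

def jackknife_se(full_estimate, replicate_estimates):
    """SE = sqrt(((G - 1) / G) * sum((y(g) - y)^2)), with G = 60 for the GSS."""
    n_groups = len(replicate_estimates)
    sum_sq = sum((y_g - full_estimate) ** 2 for y_g in replicate_estimates)
    return math.sqrt((n_groups - 1) / n_groups * sum_sq)

def weighted_total(flags, weights):
    """Weighted count of records with flag == 1."""
    return sum(f * w for f, w in zip(flags, weights))

def weighted_proportion(flags, weights):
    """Proportion of the weighted population with flag == 1."""
    return weighted_total(flags, weights) / sum(weights)

def estimate_with_se(estimator, flags, full_weights, replicate_weight_sets):
    """Compute y with the full weight and y(g) with each replicate weight."""
    y = estimator(flags, full_weights)
    y_reps = [estimator(flags, rw) for rw in replicate_weight_sets]
    return y, jackknife_se(y, y_reps)

# Toy data: 3 records, 3 replicate groups (illustrative values only).
flags = [1, 0, 1]
full_weights = [100.0, 200.0, 300.0]
replicate_weight_sets = [
    [0.0, 300.0, 450.0],
    [150.0, 0.0, 450.0],
    [150.0, 300.0, 0.0],
]

total, total_se = estimate_with_se(
    weighted_total, flags, full_weights, replicate_weight_sets)
prop, prop_se = estimate_with_se(
    weighted_proportion, flags, full_weights, replicate_weight_sets)
print(total, total_se)
print(prop, prop_se)
```

Because the estimator is passed in as a function, the same routine covers totals as well as functions of totals such as proportions, differences or ratios.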

Use of the 60 Group Jack-knife method for complex estimates, such as regression parameters from a statistical model, is not straightforward and may not be appropriate. The method as described does not apply to investigations where survey weights are not used, such as in unweighted statistical modelling.

Using the TableBuilder

About the TableBuilder

For general information relating to the TableBuilder or instructions on how to use features of the TableBuilder product, please refer to the User Manual: TableBuilder (cat. no. 1406.0.55.005).

The TableBuilder dataset contains all the data applicable to the General Social Survey (GSS) topic. Information on the structure is provided in the File Structure section.

Counting units and weights

Weighting is the process of adjusting results from a sample survey to infer results for the total population. To do this, a 'weight' is allocated to each record. The weight is the value that indicates how many population units are represented by each sample unit.

Both person and household estimates can be obtained from the General Social Survey TableBuilder. Each type of estimate uses a different weight (or 'Summation Option') and it is essential that the correct one is selected when specifying tables. Weights are selected from the Summation Options, as shown below:

Weights being selected from summation options in TableBuilder

Generally, as the Person level relates to people, a person weight is attached in the Summation Options. Similarly, as the Household level relates to households, a household weight is attached.

However, the default weight when producing any table using the GSS TableBuilder is the household weight, which is automatically applied to any table being generated. If generating a table from the Person level, the weight will usually need to be changed. A weight shown in bold, such as in the image above, indicates the weight being used in the table. Placing a tick in a 'Sum' tick box and then adding it to a row or column in the table will select a different weight.

The following table summarises the weights recommended for use with each of the levels:

Level | Summation option weights | Unit of measure
Household level | Household weight (1A) | Households
Person level | Person weight (5A) | Persons
Voluntary work level | Voluntary work weight (6C) | Instances of volunteering
Access to services level | Accessing services weight (8E) | Services had difficulty accessing

Continuous data items

TableBuilder includes a number of continuous variables which can have a response value at any point along a continuum. Some continuous data items are allocated special codes for certain responses (e.g. 998 = 'Not applicable'). When creating ranges in TableBuilder for such continuous items, special codes will automatically be excluded. Therefore the total will show only 'valid responses' rather than all responses (including special codes).

For example:

The following shows the tabulation of the data item 'Age of youngest child in household'. The continuous values of the data item are contained in the 'A valid response was recorded' row. To show the actual continuous values in a table, a range must be created.

Tabulation of the data item 'Age of youngest child in household'

Here is the same table with a range applied for the continuous values. Note that the households with a "Not applicable" response no longer contribute to the table.

Tabulation of the data item 'Age of youngest child in household' with a range applied for the continuous values

Any special codes for continuous data items are listed in the Data Item List.
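The effect of ranges on special codes can be sketched in Python as follows. This is an illustration of the behaviour described in this section, not TableBuilder's implementation. The values, weights, range boundaries and the use of 998 as the 'Not applicable' code for this item are all assumptions.

```python
def tabulate_ranges(values, weights, ranges, special_codes):
    """Weighted totals per range, skipping records with special codes."""
    table = {label: 0.0 for label, _, _ in ranges}
    for value, weight in zip(values, weights):
        if value in special_codes:
            continue  # special codes never fall into a range
        for label, low, high in ranges:
            if low <= value <= high:
                table[label] += weight
                break
    return table

# Illustrative household records; 998 is an assumed 'Not applicable' code.
ages = [2, 7, 998, 13, 998]
weights = [900.0, 1100.0, 1000.0, 800.0, 1200.0]
ranges = [("0-4", 0, 4), ("5-9", 5, 9), ("10-14", 10, 14)]

table = tabulate_ranges(ages, weights, ranges, special_codes={998})
valid_total = sum(table.values())   # total of valid responses only
all_total = sum(weights)            # total including special codes
print(table, valid_total, all_total)
```

The ranged total covers valid responses only, so it is smaller than the total over all records whenever special codes are present.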

Adjustment of cell values

To minimise the risk of identifying individuals in aggregate statistics, a technique is used to randomly adjust cell values. This technique is called perturbation. Perturbation involves small random adjustment of the statistics and is considered the most satisfactory technique for avoiding the release of identifiable statistics while maximising the range of information that can be released. These adjustments have a negligible impact on the underlying pattern of the statistics. After perturbation, a given published cell value will be consistent across all tables. However, adding up cell values to derive a total will not necessarily give the same result as published totals. The introduction of perturbation in publications ensures that these statistics are consistent with statistics released via services such as TableBuilder.

Zero value cells

Tables generated from sample surveys will sometimes contain cells with zero values because no respondents that satisfy the parameters of the cell were in the survey. This is despite there being people in the population with those characteristics. That is, the cell may have had a value above zero if all persons in scope of the survey had been enumerated. This is an example of sampling variability which occurs with all sample surveys. Relative Standard Errors cannot be generated for zero cells. Whilst the tables may include cells with zero values, the ABS recommends that TableBuilder clients do not use these data.
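The reason a Relative Standard Error cannot be produced for a zero cell follows from its usual definition, the standard error expressed as a percentage of the estimate. A minimal Python sketch, with a hypothetical function name, is:

```python
def relative_standard_error(estimate, standard_error):
    """RSE as a percentage of the estimate; None when the estimate is zero."""
    if estimate == 0:
        return None  # division by zero: no RSE can be produced
    return standard_error / estimate * 100.0

print(relative_standard_error(2000.0, 500.0))
print(relative_standard_error(0.0, 0.0))
```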

Multi-response data item

One of the survey's data items allows respondents to provide more than one response. This is referred to as a 'multi-response data item'. For this data item respondents can report all types of cultural venues or events attended in the last 12 months.

When a multi-response data item is tabulated, a person is counted against each response they have provided (e.g. a person who "visited a public library" and "attended a movie theatre" will be counted one time in each of these two categories).

As a result, each person in the appropriate population is counted at least once, and some persons are counted multiple times. Therefore, the total for a multi-response data item will be less than or equal to the sum of its components.

Tabulation of multi-response data item
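The counting rule for a multi-response data item can be illustrated with a short Python sketch. The respondents, weights and venue labels below are invented for illustration and the field names are hypothetical.

```python
from collections import Counter

# Illustrative respondents only; each may report several venues.
respondents = [
    {"weight": 1000.0, "venues": {"Library", "Cinema"}},
    {"weight": 1500.0, "venues": {"Cinema"}},
    {"weight": 500.0, "venues": {"Library", "Museum", "Cinema"}},
]

# A person's weight is added to every category they reported.
component_totals = Counter()
for person in respondents:
    for venue in person["venues"]:
        component_totals[venue] += person["weight"]

# The population total counts each person once.
population_total = sum(p["weight"] for p in respondents)
sum_of_components = sum(component_totals.values())

print(dict(component_totals))
print(population_total, sum_of_components)
```

Here the population total is 3 000 while the components add to 5 000, because two of the three respondents are counted in more than one category.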

Using the CURF

About the CURF

The 2014 GSS Expanded CURF contains four separate record level files which are described in the File Structure document on the summary tab. Subject to the limitation of sample size, the data classifications used and the conditions of use, it is possible to interrogate the data, produce tabulations and undertake statistical analyses to individual specifications.

The data included in the CURF are released under the provisions of the Census and Statistics Act 1905. This legislation allows the Australian Statistician to release unit record data, or microdata, provided this is done "in a manner that is not likely to enable the identification of a particular person or organisation to which it relates." Accordingly, there are no names or addresses of survey respondents on the CURF and other steps, including the following list of actions, have been taken to protect the confidentiality of respondents:

  • Excluding some data items that were collected.
  • Applying value ranges, collapses or top-coding to some variables.
  • Changing some demographic characteristics on a number of person records.

As a result, aggregated data obtained from the CURF will not exactly match estimates previously published in General Social Survey: Summary Results, Australia, 2014 (cat. no. 4159.0). Information about the impact of confidentialising actions on the CURF and comparison to published estimates for key populations can be found in Table 1 below.

Table 1 shows the estimated population of persons aged fifteen years and over, in total and by state or territory, as previously published and as derived from the CURF, together with the change between the two. Proportionally, the largest impact of the confidentialising process is on the estimate for the ACT, which changed by less than one percent (0.9%).

Table 1: Change of population estimates for people aged fifteen years and over due to record masking, by state
State or territory | Published ('000) | CURF ('000) | % change
New South Wales | 5 967.4 | 5 963.0 | -0.1
Victoria | 4 682.6 | 4 683.6 | 0.0
Queensland | 3 660.5 | 3 662.7 | 0.1
South Australia | 1 347.5 | 1 344.6 | -0.2
Western Australia | 1 973.0 | 1 978.2 | 0.3
Tasmania | 409.7 | 410.2 | 0.1
Northern Territory | 140.5 | 140.6 | 0.0
Australian Capital Territory | 303.3 | 306.0 | 0.9
Total | 18 486.0 | 18 488.8 | 0.0
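
The '% change' column is not defined in this product. Assuming it is the difference between the CURF and published estimates relative to the published estimate, the Australian Capital Territory row works out as:

\(\%\ \text{change}=\dfrac{\text{CURF}-\text{Published}}{\text{Published}}\times 100=\dfrac{306.0-303.3}{303.3}\times 100\approx 0.9\)

Recomputing from the rounded figures shown in the table may differ in the last digit from the listed value for some rows, presumably because the listed changes were derived from unrounded estimates.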

Steps to confidentialise the datasets made available on the CURF are undertaken in such a way as to ensure the integrity of the datasets and optimise the content, while maintaining the confidentiality of respondents. Intending purchasers should ensure that the data they require, at the level of detail they require, are available on the CURF; data obtained in the survey but not contained on the CURF may be available in TableBuilder or in tabulated form on request. The full list of data items is provided in the Data Item Lists document, which is available as an Excel spreadsheet in the Data downloads section.

Record counts

Table 2 shows the number of records on each level for the CURF dataset.

Table 2: Counting units and number of records, by level
Level | Counting unit | Number of records
Household level | Households | 12 932
Person level | Persons | 12 932
Volunteering level | Instances of volunteering | 15 475
Access to services level | Services had difficulty accessing | 15 198

Identifiers

There is a series of identifiers that can be used on records at each level of the file.

File level identifiers

The identifiers ABSHID, ABSPID, ABSVID, ABSDID appear on all levels of the file (as they are needed to create a hierarchical CSV file). Where the information for the identifier is not relevant for a level, it has a value of 0.

Each household has a unique twelve digit random identifier, ABSHID. This identifier appears on the Household level and is repeated on every other level. The Voluntary Work and Difficulty Accessing Service Providers episode levels are children of the Person level, and therefore their unique identifier comprises the Household, Person and episode level identifiers. The composition of identifiers for each level is outlined below:

  1. Household = ABSHID
  2. Person = ABSHID, ABSPID
  3. Voluntary Work = ABSHID, ABSPID, ABSVID
  4. Difficulty Accessing Service Providers = ABSHID, ABSPID, ABSDID

Copying information across levels
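
As a hedged illustration of how the identifiers can be used, the Python sketch below copies items from a parent level onto the records of a child level by matching on the shared identifiers. It is not the ABS example code. The identifier names ABSHID, ABSPID and ABSVID come from this product, while the function name, the data items HHSIZE and AGE, and all values are hypothetical.

```python
def copy_items(parents, children, key_fields, items):
    """Copy the named items from each parent record onto its child records."""
    index = {tuple(p[k] for k in key_fields): p for p in parents}
    merged = []
    for child in children:
        parent = index[tuple(child[k] for k in key_fields)]
        merged.append({**child, **{item: parent[item] for item in items}})
    return merged

# Illustrative records only; HHSIZE and AGE are hypothetical data items.
households = [
    {"ABSHID": "000000000001", "HHSIZE": 2},
    {"ABSHID": "000000000002", "HHSIZE": 4},
]
persons = [
    {"ABSHID": "000000000001", "ABSPID": 1, "AGE": 34},
    {"ABSHID": "000000000002", "ABSPID": 1, "AGE": 51},
]
volunteering = [
    {"ABSHID": "000000000002", "ABSPID": 1, "ABSVID": 1},
    {"ABSHID": "000000000002", "ABSPID": 1, "ABSVID": 2},
]

# Household to Person: match on ABSHID.
person_rows = copy_items(households, persons, ["ABSHID"], ["HHSIZE"])

# Person to Voluntary Work: match on ABSHID and ABSPID.
vol_rows = copy_items(persons, volunteering, ["ABSHID", "ABSPID"], ["AGE"])

print(person_rows)
print(vol_rows)
```

The key fields passed in follow the composition of identifiers listed above: ABSHID alone for the Household level, and ABSHID with ABSPID for the Person level.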

Example STATA code

Example SPSS code

Multi-response items on the CURF

A number of questions included in the survey allowed respondents to provide one or more responses. Each response category for one of these 'multi-response questions' (or data items) is basically treated as a separate data item. On the CURF, these data items have the same general data item identifier (SASName) but are each suffixed with a letter – A for the first response, B for the second response, C for the third response, D for the fourth response and so on.

For example, the multi-response data item 'Long term health condition by type of condition' (with a general SASName of LTHCOND – see data item list), has twenty-one response categories. Consequently, twenty-one data items have been produced - LTHCONDA, LTHCONDB, LTHCONDC and so on.

Each data item in the series (i.e. LTHCONDA to LTHCONDU) has two response codes: a 'Yes' response, coded according to the item's position in the series (code 1 for the first item, code 2 for the second, and so on), or a 'Null' response (code 0) indicating that the response was not relevant for the respondent. The last data item in the series represents a 'Not applicable' response, which comprises the respondents who were not asked the question (e.g. LTHCONDU, with values of 0 or 99).

It should be noted that the sum of individual multi-response categories will be greater than the population or number of people applicable to the particular data item as respondents are able to select more than one response. Multi-response data items can be identified in the data item list where the words <multiple response> appear next to the data item name.
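
The coding scheme described above can be read back into a list of selected categories with a short Python sketch. The base name LTHCOND and the 21 suffixed items come from this section; the function name and the example record are hypothetical, and the sketch simply treats any non-zero code as a selected category.

```python
import string

def selected_categories(record, base_name, n_items):
    """Return the suffix letters of the items holding a non-zero code."""
    selected = []
    for letter in string.ascii_uppercase[:n_items]:
        if record.get(base_name + letter, 0) != 0:  # 0 = 'Null' response
            selected.append(letter)
    return selected

# Illustrative record: all 21 items start as 'Null' (code 0).
record = {"LTHCOND" + c: 0 for c in string.ascii_uppercase[:21]}
record["LTHCONDA"] = 1   # first category selected
record["LTHCONDC"] = 3   # third category selected

# A respondent who was not asked the question carries only the last item.
not_asked = {"LTHCOND" + c: 0 for c in string.ascii_uppercase[:21]}
not_asked["LTHCONDU"] = 99

print(selected_categories(record, "LTHCOND", 21))
print(selected_categories(not_asked, "LTHCOND", 21))
```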

CURF data files

The 2014 expanded CURF can be accessed via the RADL, and is available in SAS, SPSS and STATA formats. The CURF comprises the following files:

SAS files

SPSS files

STATA files

Information files

Data item list

The Data item list contains all the data items, including details of categories and code values, that are available on the CURF.

Formats file

This file is a SAS library containing formats.

Frequency files

Household (HH)

A file containing documentation of the Household level data. Data item code values and category labels are provided with weighted household frequencies of each value. This file is in plain text format.

Person (PER)

A file containing documentation of the Person level data. Data item code values and category labels are provided with weighted person frequencies of each value. This file is in plain text format.

Voluntary Work (VOL)

A file containing documentation of the Voluntary Work level data. Data item code values and category labels are provided with weighted person frequencies of each value. This file is in plain text format.

Difficulty Accessing Service Providers (DASP)

A file containing documentation of the Difficulty Accessing Service Providers level data. Data item code values and category labels are provided with weighted person frequencies of each value. This file is in plain text format.

Conditions of use

User responsibilities

The Census and Statistics Act includes a legislative guarantee to respondents that their confidentiality will be protected. This is fundamental to the trust the Australian public has in the ABS, and that trust is in turn fundamental to the excellent quality of ABS information. Without that trust, survey respondents may be less forthcoming or truthful in answering our questionnaires. For more information, see 'Avoiding inadvertent disclosure' and 'Microdata' on our web page How the ABS keeps your information confidential.

CURF data

The release of CURF data is authorised by clause 7 of the Statistics Determination made under subsection 13(1) of the Census and Statistics Act 1905. The release of a CURF must satisfy the ABS legislative obligation to release information in a manner that is not likely to enable the identification of a particular person or organisation.

This legislation allows the Australian Statistician to approve release of unit record data. All CURFs released have been approved by the Statistician. Prior to being granted access to CURFs, each organisation's Responsible Officer must submit a CURF Undertaking to the ABS. The CURF Undertaking is required by legislation and states that, prior to CURFs being released to an organisation, a Responsible Officer must undertake to ensure that the organisation will abide by the conditions of use of CURFs. Individual users are bound by the undertaking signed by the Responsible Officer.

All CURF users are required to read and abide by the conditions and restrictions in the User Manual: Responsible Use of ABS CURFs. Any breach of the CURF Undertaking may result in withdrawal of service to individuals and/or organisations. Further information is contained in the Consequences of Failing to Comply web page.

TableBuilder

The release of data to TableBuilder is authorised by Section 12 of the Census and Statistics Act 1905.

In accordance with the Census and Statistics Act 1905, the ABS must ensure that information is not released in a manner that is likely to enable identification of a respondent. Consequently, the output from TableBuilder is subject to a confidentiality process prior to release. The ABS aims to achieve a careful balance between maintaining a sufficient level of confidentialisation and ensuring the utility and high quality of statistical output.

All registered users of TableBuilder agree to abide by the Terms and Conditions set out and any future conditions that are notified to registered users.

Conditions of sale

All ABS products and services are provided subject to the ABS Conditions of Sale. Any queries relating to these Conditions of Sale should be emailed to intermediary.management@abs.gov.au.

Price

Microdata access is priced according to ABS Pricing Policy and Commonwealth Cost Recovery Guidelines. For details refer to ABS Pricing Policy on the ABS website. For microdata prices refer to the Microdata prices web page.

Apply for access

To apply for access to the CURF, register and apply in DataLab.

To apply for access to TableBuilder, register and apply in Registration Centre.

Further information on access steps can be found in How to Apply for Microdata on the ABS website.

Australian universities

The ABS/Universities Australia Agreement provides participating universities with access to a range of ABS products and services. This includes access to microdata. For further information, university clients should refer to the ABS/Universities Australia Agreement web page.

Further information

The Microdata Entry page on the ABS website contains links to microdata related information to assist users in understanding and accessing microdata. For further information users should email microdata.access@abs.gov.au or telephone (02) 6252 7714.

Data downloads

Data files

Previous releases

Release | TableBuilder data series | MicrodataDownload | DataLab
General Social Survey, 2010 | - | Basic microdata | Detailed microdata
General Social Survey, 2006 | - | Basic microdata | Detailed microdata
General Social Survey, 2002 | - | Basic microdata | Detailed microdata

History of changes


Quality declaration

Institutional environment

Relevance

Timeliness

Accuracy

Coherence

Interpretability

Accessibility

Previous catalogue number

This release previously used catalogue number 4159.0.30.004
