DataLab Clearance
Request and enquire about DataLab input, output and transfer clearance.
Apply the output rules to your analysis.
Request output clearance
To request output clearance, use the clearance request tile in the myDATA portal.
The ABS must approve and clear all DataLab outputs before you can access them outside the DataLab. Do not copy or remove anything (e.g. data, code, notes) from DataLab yourself. Output requests can take up to 2 weeks to complete.
Clearance Request best practice:
- Apply all necessary output rules to your analysis and provide evidence of this.
- Create well-organized and clearly described files for a quicker and more efficient clearance process.
- Request only what you need to ensure a more streamlined process.
- Do not include any counts or data from DataLab in your emails with the ABS.
See myDATA for more information.
Request input clearance
To request input clearance, use the clearance request tile in the myDATA portal.
Users can request the addition of aggregate data, concordances, supporting material or statistical code to an existing DataLab project.
We will not add the following to the DataLab:
- Names of people or businesses.
- Addresses or specific location coordinates (longitude and latitude).
- Large amounts of free text where it is not possible to check for names and other identifying information.
See myDATA for more information.
Request transfer clearance
To request transfer clearance, use the clearance request tile in the myDATA portal.
Users can move code and files that do not contain data between DataLab projects. Ensure there are no counts or IDs in the files or in any associated log or comment files.
For data files, provide context, the names of the data products used, and a description for each clearance file requested, including the population scope and definitions of variables. Follow all output rules and provide supporting evidence if required.
See myDATA for more information.
Enquire
To enquire about the DataLab clearance process, or how to apply output rules, use the clearance request tile in the myDATA portal.
See myDATA for more information.
The most common types of analysis are listed below along with the applicable rules for output. Other output types will be assessed based on similar principles.
Output type | Applicable rules |
---|---|
Frequency tables (counts, percentages) | Rule of 10, Group disclosure |
Magnitude statistics (means, sums, ratios) | Rule of 10, Group disclosure, Dominance |
Quantiles (percentiles, medians) | Minimum contributors for quantiles |
Minimums, maximums, ranges | Minimum contributors for quantiles |
Models including regressions | Degrees of freedom, Model-specific rules |
Charts (graphs, plots and histograms) | Chart clearance |
Microdata | Not appropriate for output |
Synthetic microdata | Not appropriate for output |
Rule of 10
The rule of 10 refers to the minimum number of contributors required for each cell or statistic. The underlying (unweighted) count of observations must meet this threshold, and evidence must be provided.
If multiple tables are produced, it must not be possible to calculate differences of fewer than 10 by combining the tables.
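The differencing check described above can be sketched as follows. This is a minimal illustration, assuming the two counts come from tables where one population is nested inside the other; the function name `differencing_risk` is hypothetical and not part of any DataLab tooling.

```python
def differencing_risk(count_a: int, count_b: int, threshold: int = 10) -> bool:
    """Return True if two unweighted counts differ by at least 1 but fewer than `threshold`.

    Assumption: the two counts describe overlapping populations (one nested in
    the other), so their difference is itself a count of units.
    """
    difference = abs(count_a - count_b)
    return 0 < difference < threshold


# Example: one table covers ages 15-64, a second covers ages 15-65.
# Subtracting the two totals isolates the small group aged exactly 65.
print(differencing_risk(1250, 1256))
print(differencing_risk(1250, 1270))
```

A zero difference is not flagged in this sketch; whether that is acceptable depends on the sensitivity of the output.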
The rule of 10 applies to most outputs including counts, percentages (both numerator and denominator), means, sums, ratios, and other statistics.
Options for making output safe include suppression of small counts, aggregation of categories, or perturbation. If a suppressed cell can be derived or estimated from other outputs, one or more additional values should be suppressed so that the primary suppressed cell cannot be worked out.
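As a rough illustration of primary suppression and of why secondary suppression may be needed, the sketch below masks small cells and then checks whether a single masked cell could be recovered from a published total. The function names are hypothetical, and masking zero cells is an assumption of this sketch that may be stricter than required.

```python
from typing import Optional


def suppress_small_cells(
    counts: dict[str, int], threshold: int = 10
) -> dict[str, Optional[int]]:
    """Replace any unweighted count below `threshold` with None (suppressed)."""
    return {cell: (n if n >= threshold else None) for cell, n in counts.items()}


def derivable_from_total(
    masked: dict[str, Optional[int]], total_published: bool
) -> bool:
    """Return True when exactly one cell is suppressed and the total is released.

    In that case the hidden value can be found by subtraction, so at least one
    more cell would need to be suppressed as well.
    """
    hidden = sum(1 for n in masked.values() if n is None)
    return total_published and hidden == 1


counts = {"A": 120, "B": 7, "C": 45}
masked = suppress_small_cells(counts)
print(masked)
print(derivable_from_total(masked, total_published=True))
```

This only covers the simplest case of one suppressed cell in one row. Real tables with several margins need a fuller check.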
See Data downloads for examples and options for treatment.
Dominance
The dominance rule is designed to prevent the re-identification of units that contribute a large percentage of a cell's total value, which could in turn reveal information about individuals, households or businesses.
DataLab has a (1,50) and a (2,67) rule. This means that for any cell, the largest contributor cannot account for more than 50% of the total value and the largest two contributors cannot account for more than 67% of the total value.
Where a variable can take both positive and negative values, replace every value with its absolute value before determining the largest contributors and the total. The largest absolute value is then divided by the sum of absolute values to check the (1,50) rule, and the sum of the two largest absolute values is divided by the sum of absolute values to check the (2,67) rule.
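The (1,50) and (2,67) checks, including the absolute-value step, can be sketched as below. This is an illustrative helper, not an official tool; the name `passes_dominance` and the handling of an all-zero cell are assumptions.

```python
def passes_dominance(
    values: list[float], top1_limit: float = 0.50, top2_limit: float = 0.67
) -> bool:
    """Check the (1,50) and (2,67) dominance rules for one cell.

    `values` holds each unit's contribution to the cell. Absolute values are
    used so that negative contributions are handled as described above.
    """
    magnitudes = sorted((abs(v) for v in values), reverse=True)
    total = sum(magnitudes)
    if total == 0:
        # Assumption: a cell with no value at all has no dominant contributor.
        return True
    top1_share = magnitudes[0] / total
    top2_share = sum(magnitudes[:2]) / total
    return top1_share <= top1_limit and top2_share <= top2_limit


print(passes_dominance([100, 90, 80, 70, 60]))  # no unit dominates
print(passes_dominance([500, 100, 50, 50]))     # largest unit exceeds 50%
print(passes_dominance([40, 40, 10, 10]))       # largest two exceed 67%
```

The rule of 10 still has to be checked separately, because a cell can pass dominance while having too few contributors.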
As with the rule of 10, if a cell is suppressed because it fails the dominance rule but can be derived or estimated from other outputs, one or more additional values should be suppressed so that the primary suppressed cell cannot be worked out.
Dominance must be checked if any mean, total or similar statistic is calculated for continuous or magnitude variables. It does not apply to counts.
See Data downloads for examples and options for treatment.
Group disclosure
Group (or attribute) disclosure occurs when all or nearly all units that have one feature also have some other feature. This means that even when the individual units may appear protected based on other rules, a previously unknown attribute of a unit may be disclosed based on the attributes of the group. Group disclosure risk should be assessed when any cell contains more than 90% of the total number of units in its row or column.
This rule applies to frequency tables. Whether group disclosure requires treatment depends on the sensitivity and nature of the output.
See Data downloads for examples and options for treatment.
Minimum contributors for quantiles
Quantiles and other relative ranks must be based on a minimum number of contributors depending on the precision. Underlying unweighted counts should be provided when reporting quantiles in the outputs. For information on required contributors for quantiles, see the table below:
Quantile | Minimum contributors |
---|---|
Medians ( 0.50 ) | 10 |
Quartiles ( 0.25, 0.5, 0.75 ) | 20 |
Quintiles ( 0.2, 0.4, 0.6, 0.8 ) | 25 |
Deciles ( 0.1, 0.2, 0.3 ... 0.9 ) | 50 |
Vigintiles ( 0.05, 0.1, 0.15 ... 0.95 ) | 100 |
Percentiles ( 0.01, 0.02 ... 0.99 ) | 500 |
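
The thresholds in the table above can be encoded as a simple lookup. This is a sketch only; the dictionary keys and function names are illustrative choices, and the counts passed in are assumed to be unweighted.

```python
from typing import Optional

# Minimum unweighted contributors per quantile type, copied from the table above.
MIN_CONTRIBUTORS = {
    "median": 10,
    "quartile": 20,
    "quintile": 25,
    "decile": 50,
    "vigintile": 100,
    "percentile": 500,
}


def quantile_allowed(kind: str, n_unweighted: int) -> bool:
    """Return True if `n_unweighted` contributors are enough for this quantile type."""
    return n_unweighted >= MIN_CONTRIBUTORS[kind]


def finest_allowed(n_unweighted: int) -> Optional[str]:
    """Return the most precise quantile type supported by the count, or None."""
    allowed = [k for k, minimum in MIN_CONTRIBUTORS.items() if n_unweighted >= minimum]
    return max(allowed, key=MIN_CONTRIBUTORS.get) if allowed else None


print(quantile_allowed("decile", 49))
print(finest_allowed(120))
```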
Minimums and maximums are generally unsafe to output. The following percentiles are safe options if the minimum contributors rule is satisfied:
- 1st and 99th percentiles
- 5th and 95th percentiles
- 10th and 90th percentiles
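A possible way to report a safe range in place of the minimum and maximum is sketched below using Python's `statistics.quantiles`. The mapping of each percentile pair to a contributor threshold (500, 100 and 50) is an assumption inferred from the quantile table, and `safe_range` is a hypothetical helper.

```python
import statistics
from typing import Optional

# (minimum contributors, number of quantile groups, label)
# Assumption: the 1st/99th pair needs the percentile threshold, the 5th/95th
# pair the vigintile threshold, and the 10th/90th pair the decile threshold.
RANGE_OPTIONS = [
    (500, 100, "1st-99th"),
    (100, 20, "5th-95th"),
    (50, 10, "10th-90th"),
]


def safe_range(values: list[float]) -> Optional[tuple[str, float, float]]:
    """Return (label, low, high) for the widest percentile pair the
    unweighted count supports, or None if there are too few contributors."""
    n = len(values)
    for min_n, groups, label in RANGE_OPTIONS:
        if n >= min_n:
            cuts = statistics.quantiles(values, n=groups)
            return label, cuts[0], cuts[-1]
    return None


print(safe_range([float(v) for v in range(60)]))
print(safe_range([float(v) for v in range(40)]))
```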
See Data downloads for examples and options for treatment.
Degrees of freedom
Models and regressions are generally safe to output. However, overfitted models can pose a disclosure risk. All models and regressions must have a minimum of 10 degrees of freedom and evidence that this has been met should be provided.
The degrees of freedom are calculated by subtracting the number of parameters and other model restrictions from the total number of observations that contribute to the model.
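The calculation above amounts to a single subtraction, sketched here with hypothetical function names. What counts as a parameter or restriction depends on the model, so the inputs are assumed to have been worked out by the analyst.

```python
def degrees_of_freedom(n_obs: int, n_params: int, n_restrictions: int = 0) -> int:
    """Observations contributing to the model, minus parameters and restrictions."""
    return n_obs - n_params - n_restrictions


def meets_df_rule(
    n_obs: int, n_params: int, n_restrictions: int = 0, minimum: int = 10
) -> bool:
    """Return True if the model has at least `minimum` degrees of freedom."""
    return degrees_of_freedom(n_obs, n_params, n_restrictions) >= minimum


# 25 observations, 12 estimated parameters (including the intercept), 1 restriction.
print(degrees_of_freedom(25, 12, 1))
print(meets_df_rule(25, 12, 1))
```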
See Data downloads for examples and options for treatment.
Model-specific rules
There are additional rules for specific model types.
For ordinary least squares regressions, the R-squared should be lower than 0.9. If the R-squared is higher than this, the constant may need to be suppressed to prevent predictions. This requirement does not apply to other models such as fixed effects or two-stage regressions.
Additionally, an ordinary least squares regression with a continuous dependent variable and only categorical independent variables will approximate the tabular means. Adding a continuous independent variable, or suppressing the intercept, reduces the disclosure risk. Otherwise, apply the rule of 10 and the dominance rule.
For survival curves, each step change in the survival curve should represent at least 10 data subjects.
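One reading of the survival curve rule is sketched below: each distinct event time produces a step, and the number of subjects with an event at that time is taken as the number the step represents. This interpretation and the helper name `small_survival_steps` are assumptions, not a statement of how the ABS assesses curves.

```python
from collections import Counter


def small_survival_steps(event_times: list[float], minimum: int = 10) -> list[float]:
    """Return the event times whose step is based on fewer than `minimum` subjects.

    `event_times` holds one entry per subject who experienced the event.
    """
    subjects_per_step = Counter(event_times)
    return sorted(t for t, n in subjects_per_step.items() if n < minimum)


# Event times in months, one entry per subject.
times = [1] * 12 + [2] * 4 + [3] * 10
print(small_survival_steps(times))
```

Steps that are flagged could be avoided by grouping event times into wider intervals before plotting.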
Correlation coefficients should be calculated based on a minimum of 10 contributors.
Gini coefficients are usually safe to output, but must be based on a minimum of 10 contributors.
For classification and regression trees, any underlying unweighted counts must meet the rule of 10.
For other models, please provide evidence that no estimates or parameters are derived from fewer than 10 underlying contributors and explain why the output is non-disclosive.
See Data downloads for examples and options for treatment.
Chart clearance
All graphs, plots and other charts are subject to the output rules that apply to the underlying output type. The data used in the chart must be provided, accompanied by any relevant supporting evidence that it meets output rules.
Charts that plot characteristics of individual units or groups of fewer than 10 units will not be cleared.
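For a histogram, the underlying counts can be tabulated and screened before the chart is drawn. The sketch below assumes half-open bins of the form [lower, upper) and treats empty bins as acceptable; both are assumptions, and the function names are hypothetical.

```python
from bisect import bisect_right


def histogram_counts(values: list[float], edges: list[float]) -> list[int]:
    """Count values per bin, where bin i covers edges[i] <= v < edges[i + 1].

    Values outside the outermost edges are ignored.
    """
    counts = [0] * (len(edges) - 1)
    for v in values:
        if edges[0] <= v < edges[-1]:
            counts[bisect_right(edges, v) - 1] += 1
    return counts


def small_bins(values: list[float], edges: list[float], minimum: int = 10) -> list[int]:
    """Return indexes of non-empty bins with fewer than `minimum` units."""
    counts = histogram_counts(values, edges)
    return [i for i, c in enumerate(counts) if 0 < c < minimum]


values = [5] * 12 + [15] * 3 + [25] * 10
edges = [0, 10, 20, 30]
print(histogram_counts(values, edges))
print(small_bins(values, edges))
```

The resulting table of counts can also serve as the chart data that must accompany the clearance request.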
See Data downloads for examples and options for treatment.