At the Data Processing Centre it's full steam ahead
In an otherwise unremarkable office building in the heart of the Melbourne CBD, over 800 people are currently hard at work producing the best quality Census data ever.
The Data Processing Centre, or DPC, has been humming with activity since August 2006, when the first trickle of Census forms returned by the public quickly became a flood. It was all hands on deck to check the initial rush of forms, and even senior management rolled up their sleeves to help with 'Precapture', the process of checking and preparing forms before they are scanned.
The DPC is housed in what was formerly the headquarters of a major bank, and we've found the massive bank vaults in the basement ideal for storing Census forms securely. With over 10 million forms, keeping track of their movement from storage to processing and back again is a challenge. To manage this, a barcode scanning system was installed to track the flow of forms between the many different areas of the building.
To prepare for scanning, the forms first had to be sliced and diced. Each Census form is made up of 18 bound and stapled pages, so it had to be guillotined into single sheets before being sent to the scanning floor for the next stage of processing.
Scanning
Working split shifts, the scanning operators have processed more than 10 million forms and well over 100 million individual sheets in just a few short months. The scanning machines read both sides of a page at once and instantly create images. These images can be cut down into individual questions, or image 'snippets', removing any further need to move the forms physically around the building. Once scanned, the forms are returned to storage and are unlikely to be touched again until the end of processing, when they are pulped and recycled under strict ABS supervision.
The responses on the forms are captured using an Optical Mark Recognition (OMR) system, which detects the marks respondents have made in tick-box questions. Handwritten responses are recognised and processed using Intelligent Character Recognition (ICR) technology, which can cope with most of the variation in handwriting between respondents. In the few instances where the system cannot read the handwriting automatically, responses go through a process of 'repair', in which operators make manual corrections.
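To illustrate the basic idea behind mark recognition, the following Python sketch treats a scanned tick box as a grid of grey levels (0 for black, 255 for white) and counts a box as marked when a large enough share of its pixels is dark. The thresholds, the function names `is_marked` and `read_question`, and the image representation are all hypothetical choices for this example; the DPC's actual OMR system is not described here and is likely to be considerably more sophisticated.

```python
# Hypothetical thresholds chosen for illustration only.
DARK_LEVEL = 128       # grey level (0 = black, 255 = white) below which a pixel counts as ink
FILL_FRACTION = 0.15   # share of inked pixels needed to treat a box as marked


def is_marked(box: list[list[int]]) -> bool:
    """Return True if enough pixels in a tick-box image are dark to count as a mark."""
    pixels = [p for row in box for p in row]
    dark = sum(1 for p in pixels if p < DARK_LEVEL)
    return dark / len(pixels) >= FILL_FRACTION


def read_question(boxes: dict[str, list[list[int]]]) -> list[str]:
    """Return the labels of every tick box on a question that appears to be marked."""
    return [label for label, box in boxes.items() if is_marked(box)]


# A blank 10x10 box and one containing a diagonal cross (20 of 100 pixels inked).
blank = [[255] * 10 for _ in range(10)]
cross = [[0 if r == c or r + c == 9 else 255 for c in range(10)] for r in range(10)]
print(read_question({"Male": blank, "Female": cross}))  # ['Female']
```

A fixed darkness threshold keeps the sketch simple; a production system would also need to cope with stray marks, erasures and scanner variation.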
eCensus
If a respondent used the eCensus to fill in their form online, precapture, guillotining, scanning and text recognition were not needed, which has generated major savings in both time and money. As more people choose the eCensus option in future Censuses, results could be released earlier.
Coding
While scanning and recognition are taking place, other staff are hard at work ensuring there are no inconsistencies between the dwellings and the persons counted. For example, suppose a household has four people: two are recorded on the household form, one asks for a separate personal form for privacy reasons and the fourth uses the eCensus. Their information must be drawn together into one dwelling record. This process ensures that each person is counted only once and that all members of a household are attributed to the same dwelling.
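As a rough illustration of drawing several forms into one dwelling record, the following sketch groups responses by a dwelling identifier and keeps a single entry per person. The `PersonResponse` and `DwellingRecord` types, the `consolidate` function and the rule that a matching name and date of birth means the same person are all hypothetical assumptions made for this example, not the ABS's matching method. The names in the usage example are invented.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class PersonResponse:
    dwelling_id: str   # hypothetical identifier linking each form to a dwelling
    name: str
    birth_date: str    # ISO date string, e.g. "1970-05-14"
    source: str        # "household form", "personal form" or "eCensus"


@dataclass
class DwellingRecord:
    dwelling_id: str
    persons: dict[tuple[str, str], PersonResponse] = field(default_factory=dict)


def consolidate(responses: list[PersonResponse]) -> dict[str, DwellingRecord]:
    """Group responses by dwelling, keeping one entry per person."""
    dwellings: dict[str, DwellingRecord] = {}
    for r in responses:
        if r.dwelling_id not in dwellings:
            dwellings[r.dwelling_id] = DwellingRecord(r.dwelling_id)
        # Hypothetical duplicate test: same normalised name and date of birth.
        key = (r.name.strip().lower(), r.birth_date)
        # Keep the first response seen for a person; later duplicates are ignored.
        dwellings[r.dwelling_id].persons.setdefault(key, r)
    return dwellings


forms = [
    PersonResponse("D001", "Ann Lee", "1970-05-14", "household form"),
    PersonResponse("D001", "Bob Lee", "1968-11-02", "household form"),
    PersonResponse("D001", "Cara Lee", "1995-03-30", "personal form"),
    PersonResponse("D001", "Dan Lee", "1998-07-19", "eCensus"),
    PersonResponse("D001", "dan lee ", "1998-07-19", "household form"),  # same person again
]
print(len(consolidate(forms)["D001"].persons))  # 4
```

Keying each person on normalised details means a person who appears on more than one form is counted once, while everyone sharing a dwelling identifier ends up in the same record.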
The majority of staff at the DPC work as coders. Responses to Census questions are 'coded' to a particular category within a classification, a framework that groups similar responses together. For example, the country of birth responses 'New Guinea', 'PNG' and 'Papua Niugini' would all be coded to the category for Papua New Guinea.
Up to 90% of responses are coded automatically, while the remainder are sent to an operator to resolve.
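The coding step can be pictured as a lookup in a coding index: each response is normalised, matched against known variants, and anything the index cannot match is set aside for an operator. This is a minimal sketch assuming a simple dictionary index; `COUNTRY_INDEX`, `normalise` and `auto_code` are hypothetical names, and the ABS's automatic coding system is not described in this article.

```python
# Hypothetical coding index: normalised response text -> classification category.
COUNTRY_INDEX = {
    "papua new guinea": "Papua New Guinea",
    "new guinea": "Papua New Guinea",
    "png": "Papua New Guinea",
    "papua niugini": "Papua New Guinea",
    "australia": "Australia",
}


def normalise(text: str) -> str:
    """Lower-case, drop punctuation and collapse whitespace so spelling variants match."""
    kept = "".join(ch for ch in text.lower() if ch.isalnum() or ch.isspace())
    return " ".join(kept.split())


def auto_code(responses: list[str]) -> tuple[list[tuple[str, str]], list[str]]:
    """Code responses the index recognises; return the rest for an operator to resolve."""
    coded: list[tuple[str, str]] = []
    for_operator: list[str] = []
    for response in responses:
        category = COUNTRY_INDEX.get(normalise(response))
        if category is None:
            for_operator.append(response)
        else:
            coded.append((response, category))
    return coded, for_operator


coded, for_operator = auto_code(["P.N.G.", "Papua Niugini", "Sth Sea Islands"])
print(coded)         # [('P.N.G.', 'Papua New Guinea'), ('Papua Niugini', 'Papua New Guinea')]
print(for_operator)  # ['Sth Sea Islands']
```

The split between automatic and manual coding falls out naturally: the richer the index of known variants, the smaller the share of responses left for operators.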
Coding of the basic demographic variables (sex, age, marital status and religion) and of address (both where you usually live and where you were on Census night) is virtually complete, with staff keeping a close eye on data quality. Coding of the more complex questions, such as occupation, industry and educational qualifications, is now underway, along with the coding of address data to Mesh Blocks, a new micro-level of geography.
Time Capsule Project
For people who answered "yes" to the Census Time Capsule option, their name-identified Census information is now being microfilmed and will be kept securely for 99 years by the National Archives of Australia. This will be a huge bonus for historians and genealogists alike. As an extra security measure, special temporary darkrooms will be built at the DPC to process the microfilms. Processing the film on site removes additional steps from the chain of custody.
Keeping an 800-strong temporary workforce happy, healthy and productive takes a lot of support, including recruitment, occupational health and safety, staff support, payroll, general administration and a huge effort from IT staff to keep 800 computers running. With only a few short months to go before release, you can be assured that DPC staff are busy getting the data ready for the community.