How to Fill In Your Census Form without Lockheed Martin Profiting (long version)

Blog by PN

(Updated as at 18-03-2011)

US Arms Manufacturer Lockheed Martin has the contract for the 2011 UK Census in March this year.

The US arms manufacturer Lockheed Martin makes Trident nuclear missiles, cluster bombs and fighter jets and is involved in data processing for the CIA and FBI. It has provided private contract interrogators for the Abu Ghraib prison in Iraq and Guantanamo Bay. Lockheed Martin has the UK Government contract to collect and process the data for the 2011 census in March. (Observer, 20 February 2011)

If you do not complete the census form and answer all the questions (except “religion”), or return this information on line, you could be fined £1000 and get a criminal record. The Green Party has, after some real soul searching, decided not to promote a boycott of the 2011 census after all because that could lead to further funding problems for local authorities. The census data are used to determine the financial needs of councils on the basis of the population data for their area.

WHAT YOU CAN DO

Lockheed Martin is in it for the money. A principled stance by you to boycott the census will not hurt them, could provide the British Government with £1000 of your money and will make life harder for local authorities. The rational approach would be to take part in the census but make your return as expensive as possible for Lockheed Martin to process. Make sure that processing your return costs Lockheed Martin more than they allowed for in their tender. Don’t let them make a profit from your census return but do help to provide the data your council needs for its Government grants.

If you don’t send in your form, Lockheed Martin will still get its money and just make a higher profit for less work.

This year, for the first time, you can make your census return on line. Do not do this, for an on-line return is the cheapest and easiest option for Lockheed Martin to process.

The value of Lockheed Martin’s 2011 census contract seems to be about £150 million. See the ONS (Office for National Statistics) press release on web page:

https://2011mc.census.gov.uk/index.php?module=documents&action=view&id=14

The census form consists of 32 pages. The contract includes the processing of about 39 million census forms. This works out at approximately £4 per census return. This figure includes all overheads and Lockheed Martin’s profit margin, so the company will have priced the direct processing cost per form at a lower figure. To make money out of such a contract, the handling and processing of the forms will have to be a high speed and highly automated operation. Every minute spent on a form beyond what Lockheed Martin has budgeted for will reduce their profit on the contract. It is realistic to assume that this extra cost to Lockheed Martin would be in the region of £1 per minute of extra time spent on your form if all the overheads are taken into account.

Let’s assume that they plan, using their high speed computerised scanning and data capture technology, to process a form in, say, 5 minutes from receipt at their processing centre up to finished data capture. If your form is going to take, say, at least 15 minutes because it is a little awkward to deal with (possibly longer if supervisory level staff have to resolve queries and problems), then you will have reduced Lockheed Martin’s profit by approximately £10, if not more. You can make it extremely time consuming by very simple means.
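The arithmetic above can be checked with a few lines of Python. This is only a rough sketch: the £150 million contract value, the 39 million forms, the 5 and 15 minute timings and the £1 per minute rate are the estimates and assumptions used in this blog, not published Lockheed Martin figures, and the function names are made up for illustration.

```python
# Rough estimates taken from the text above; none are official figures.
CONTRACT_VALUE_GBP = 150_000_000   # approximate contract value
FORMS_EXPECTED = 39_000_000        # approximate number of census forms
EXTRA_COST_PER_MINUTE_GBP = 1.0    # assumed cost of each extra minute


def income_per_form(contract_value=CONTRACT_VALUE_GBP, forms=FORMS_EXPECTED):
    """Average contract income per returned form, in pounds."""
    return contract_value / forms


def extra_cost(budgeted_minutes, actual_minutes, rate=EXTRA_COST_PER_MINUTE_GBP):
    """Cost of the time spent beyond the budgeted processing time."""
    overrun = max(0, actual_minutes - budgeted_minutes)
    return overrun * rate


if __name__ == "__main__":
    print(f"Income per form: about £{income_per_form():.2f}")
    print(f"Extra cost of a 15-minute form: about £{extra_cost(5, 15):.2f}")
```

On these assumptions, a single awkward form costs more than twice the average income it brings in.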

THE CENSUS PROCEDURE

Some time in March you will receive a census form in the post – probably addressed to “the householder” or “the occupier”, which someone in your household is obliged to complete. (Remember: Don’t do it on line). This must be done after the census day of 27 March. The “census day” is meant to be a snapshot of the entire population on that particular day. The form must be returned by post as “soon as possible” after 27 March. The Government website says “If you have not returned your questionnaire by 6 April, a census collector may call after that time (possibly around the end of April) to offer you any help”. There is no particular deadline for returning the form.

Do not provide convenient contact details when filling in your census form or on any other piece of paper relating to the census. After all, nobody can force you to possess a telephone or email. Paper correspondence is much more expensive. Alternatively, accidentally change a digit of your telephone number and ditto for an email address. Everybody makes minor clerical errors, that’s just human nature.

It is obviously not helpful to make use of the “census helpline” phone number on the front page of the form (call centres are horrible to deal with) or the “Text relay”.

Any queries, requests for “Individual questionnaires” or additional household questionnaires, etc. are best addressed in writing (but not including your phone number!) to:

FREEPOST 2011 Census, Processing Centre, UK (i.e. the Lockheed Martin processing centre)

And if they are too slow in replying, get the matter chased up by writing to:

Glen Watson
Census Director
2011 Census
Office for National Statistics (ONS)
Government Buildings
Cardiff Road
Newport
South Wales

Inexplicably, the census form omitted to provide this address.

For other useful details, see the Government’s helpful Census 2011 Information website: www.2011.census.uk

The legislation which sets out the precise information requirements for England and Wales is found in “The Census (England and Wales) Order 2009” on:

http://www.legislation.gov.uk/uksi/2009/3210/contents/maden

The primary legislation is the Census Act 1920 (can easily be found on line), which also contains the bit about penalties for not complying.

YOUR PERSONAL DETAILS

There is no database which lists everybody alive and where he or she lives in the UK. The databases which exist (electoral roll, television licensing, Inland Revenue, National Insurance, DHSS, NHS, DVLA, etc.) are not comprehensive and are all in incompatible formats. There is a reasonably comprehensive database of all postal addresses, with post codes, but this data base contains no people information at all. To capture the whole country, the census will no doubt have to rely mainly on the postal addresses data base. Checking whether information people provide on their forms is accurate or true could be done on the basis of some small random samples. It would, however, be extremely expensive and an administrative and logistical nightmare to carry out such checks on a big scale, and such an enterprise would probably violate data protection legislation if various other data bases are used. It is a moving target, for whilst you are collecting the information, it changes all the time through births and deaths, people moving house, and so on. Cross checking against these diverse other data bases is by no means easy, for if there is a discrepancy, it may not be obvious which of the databases you are comparing contains the error.

This is why the Government is honest when it calls the census “a snapshot” taken on one particular day, 27th March. It is the best which can be done using the census method without a comprehensive and continuous updating structure. A census is, for example, in general not an effective procedure for a permanent people registration system such as identity cards. Its usefulness as a statistical survey of the country will only last for a few years. A census of this type is not a “big brother” project.

DATA PROTECTION

It is reasonable to assume that your personal data on census forms will be safe. The Government would run into the most enormous problems if they were not. If you are not convinced, you could use the traditional method to track down misuse of your personal data by making a very small change to your name. E.g. (accidentally) change one letter, or add or delete one. If you make this particular change on no other document, you will know the source of the data protection failure and can take the necessary action.

Some people think that the data are not safe because Lockheed Martin is a US company and the “Patriot Act” applies. Whatever the UK Government assurances on this point, the practical fact is that this data base will be of limited interest to the US Government because a) it is guaranteed to be inaccurate for all sorts of reasons (hardly any checking will take place – that would be far too expensive and time consuming), b) the detailed people information will get out of date quickly and will not be updated and c) those people who are of real interest to state agencies will easily evade being recorded by the census anyway – this is a really easy thing to do, e.g. by making convincing looking fake entries which are 99.999% sure to remain unchecked.

MAILING THE CENSUS FORMS

The census form will arrive through the post and is to be returned through the post. A post-free return envelope will be enclosed. This envelope will have a window which is intended to match a large bar code printed on the form. The Royal Mail will scan these bar codes, without opening the envelopes, and will then forward them to Lockheed Martin’s “Data Capture Centre” for processing. If, for whatever reason, a form’s bar code cannot be read by the Royal Mail, it will have no option but to forward that form also to the Data Capture Centre, for that will be the only place where the envelopes can be opened under proper data protection safeguards.

Each page of a census form also has its own unique identifying bar code (on one side of each page) and a page-number barcode on every page. To avoid confusion, the term “outer bar code” will be used for the bar code to be scanned by the Royal Mail, and “inner bar codes” for the other ones.

This aspect of the operation is described in the downloadable newsletter “Census Talk no. 4” (with a good picture) on web page:

http://www.ons.gov.uk/census/2011-census/news-and-events/census-talk

Some interesting trial results of this system are found on “Census Talk – Special Issue” on the same web page.

CHECKING THE RETURN OF FORMS

The returns scanned by the Royal Mail will be forwarded to the QT (Questionnaire Response Tracking System) and matched to the mailing addresses list. This way, the census collectors’ management will know which forms have not (yet) been returned. (This information will be wrong for those forms which the Royal Mail could not scan, of course, unless the Data Capture Centre acts promptly on those forms).

Non-returns (or assumed non-returns) will be followed up selectively by the census collectors, prioritising their efforts on geographical areas where the rates of return are low. This is done for socio-statistical reasons. Without such prioritization, they could obviously also waste a lot of time on scattered unoccupied houses and flats, which occur randomly in all areas.

This procedure is described in greater detail in “Census Talk – Special Issue”, downloadable from the same web page:

http://www.ons.gov.uk/census/2011-census/news-and-events/census-talk

It could happen that the outer bar code cannot be scanned because:

- The form was wrongly inserted in the envelope;

- A different envelope has been used;

- The outer bar code has been covered before the form was put in the envelope;

- Some or all of the outer bar code’s white spaces were filled in with black pen or otherwise obliterated.

HOW LOCKHEED MARTIN PROCESSES THE CENSUS FORMS

Lockheed Martin will rely as much as possible on computer software, called “Data Capture and Coding System” (DCCS), which will scan your form and automatically enter the data. At this stage the “inner barcodes” are vital. You can see a picture of one of the machines doing this in action on Google Images (search for “2011 census form”). You can also see pictures of scanning at work on Lockheed Martin’s own web-page:

http://www.lockheedmartin.com/products/DRIS/index.html

(Processing a U.S. census form in this case. Note the bar code at the bottom of the census form!)

and also in “Census Talk no.5” downloadable from web-page:

http://www.ons.gov.uk/census/2011-census/news-and-events/census-talk

The sequence of operations in the Data Capture Centre is described in detail in the following sources:

http://www.lockheedmartin.co.uk/news/archive/40.html

http://www.lockheedmartin.co.uk/news/archive/41.html

and in “Census Talk no. 5”, downloadable from

http://www.ons.gov.uk/census/2011-census/news-and-events/census-talk

I urge you to study these 3 sources carefully.

The Lockheed Martin data capture process will have the following stages, each of which has its own weak points.

(0. Scanning of “outer bar codes” or otherwise registering receipt of the form, when this has not been done by the Royal Mail)

1. Opening the envelope and preparing the form for scanning;

2. Scanning the form into the computer database; (The computer “reads” the form)

3. The computer software assigns meaning to the scanned information, i.e. it decides which of the information which it reads is acceptable as census data and which is to be rejected in the form of an error message.

Any information which, somehow has not been read or generates error messages on the computer is keyed in manually.

4. Dealing manually with queries arising from incomplete or ambiguous scanning information.

5. Downloading the information into databases for statistical and other reports. Recording the form on microfiche and/or filing the paper form for storage (for at least 100 years);

Stage 1: Physical preparation

The form is taken out of its envelope. The spine is sliced off (it is in booklet form). It is checked for anything which might obstruct scanning, in 2 respects: a) visual obstructions to the scanner and b) factors which might make the paper feed mechanisms go temperamental (like in normal photocopiers).

a) could be things like post-it notes, loose bits of paper and other detritus, stains, obviously unreadable barcodes, etc.

b) could be of the form of additional staples, tears, folds, creases, spots of stickiness such as a marmalade spillage or a fragment of bluetack, improvised repairs of torn sheets with sellotape, additional pieces of paper glued to the side, etc.

The forms have become a pile of loose sheets, ready for scanning, except those for which it is already obvious that scanning will be unsuccessful.

At this stage, scanning means only: passing through the scanner so that some computer “reading” can take place. Whether or not meaningful information can actually be successfully transferred from the form to the computer depends also on several other factors.

Scanning will take place at a rate of 15,000 double sheets per hour. Manual keying of data is vastly slower and much more expensive.

Stage 2: Computer scanning of the information on the form

The bold little inner bar codes on each page are page numbers for the computer to scan. The fainter “wavy” barcode on one side of each page is the form’s unique identifier. Through the combination of these two inner barcodes, the pages of the form can be read in completely random order, even if different forms are all mixed up. Without these inner barcodes scanning will probably be entirely impossible, unless there is a facility on the scanning machine to manually type in any un-obliterated number codes printed beneath the barcodes (as in a supermarket), but that would be very slow and cumbersome. An attempt could conceivably also be made in such cases to make use of the “Personal Internet Access Code” printed on the front page (better obliterate that too?).

Without form identification and page number identification, scanning makes no sense, for there would be nowhere for the information to go to.
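The role of the two inner bar codes can be pictured with a small Python sketch. It is purely illustrative: the real DCCS software is not public, and the tuple layout, the function name and the sample form numbers are invented here. It shows why pages fed in any order can still be sorted back into forms, and why a page with no readable codes has nowhere to go.

```python
from collections import defaultdict


def reassemble(scanned_pages):
    """Group scanned pages by form identifier and order them by page number.

    Each scanned page is a tuple (form_id, page_number, content). A value of
    None for form_id or page_number stands for a bar code that could not be
    read. Returns (forms, unplaced), where unplaced holds such pages.
    """
    forms = defaultdict(dict)
    unplaced = []
    for form_id, page_number, content in scanned_pages:
        if form_id is None or page_number is None:
            unplaced.append(content)   # would need manual identification
        else:
            forms[form_id][page_number] = content
    ordered = {
        form_id: [pages[n] for n in sorted(pages)]
        for form_id, pages in forms.items()
    }
    return ordered, unplaced


if __name__ == "__main__":
    # A hypothetical mixed-up pile of loose sheets from two forms.
    pile = [
        ("F2", 2, "F2 page 2"),
        ("F1", 1, "F1 page 1"),
        (None, 3, "page with unreadable identifier"),
        ("F2", 1, "F2 page 1"),
        ("F1", 2, "F1 page 2"),
    ]
    forms, unplaced = reassemble(pile)
    print(forms)
    print(unplaced)
```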

(Look for such stuff on every page). Bar codes can be rendered ineffective by neatly filling in some or all of the white gaps between the bars with a black pen or entirely covering them with stickers – do not use post-it notes for they are easily removed. Do not allow any complete horizontal strip (however narrow) of the complete barcode to remain. (Many people “blacked in” or obliterated bar codes to great effect on Poll Tax forms in 1989-1991 and greatly increased their processing costs). Make sure you don’t miss any other codes and serial numbers. They, and other codes of symbols, numbers, etc., are best entirely obliterated with black pen or stickers. Don’t miss any!

You sometimes hear that barcodes can be made unreadable by rubbing a candle over them. I think that this is an urban myth (they can be read through transparent plastic!). I have been unable to find any technical reference to the candle wax method.

Lockheed Martin has software to scan handwritten entries and ticked boxes on your form at very high speed (first extensively used in the US census of 2000 and then in the UK census of 2001). Anything which obstructs the automatic scanning of the information and involves the need for a human intervention obviously considerably increases Lockheed Martin’s processing costs. Luckily, Lockheed Martin and the Office for National Statistics (ONS) have provided some helpful descriptions of their computer scanning system for census forms in non-technical language on the internet links given above. I urge you to read these.

Look again closely at Lockheed Martin’s own picture on web page:

http://www.lockheedmartin.com/products/DRIS/index.html

It would appear that what is actually shown on this picture is the “stage 2” process of form identification scanning, using a BAR CODE on the form.

If form identification using scanning of bar codes or other codes fails at the processing centre, the form will have to be checked in manually on a computer screen against its ADDRESS entry of the mailing data base. This will make the whole identification process of that form an order of magnitude more time consuming. The postcodes in the mailing addresses would be very important in such cases a) to speed up the address search and b) to decide which one is the correct address option if the street name occurs several times in a town or city – especially large places like London. (People who believe that a particular letter or digit of their post code printed on their form is not the right one, should obliterate it firmly and write the one they think it should be instead – don’t miss any such post codes; there could be more than one.)

Stage 2: computer scanning of information entered on the form.

Lockheed Martin’s software can read ticked boxes and both lower case and upper case letters where each letter is written in its own little box. It looks for writing in places on the forms where you are expected to write.

The bits to be filled in by you will be white on the form and the bits which the computer is not meant to read are coloured. (Anything which you write or tick in the coloured part of the page cannot be read by the computer and will have to be keyed in by hand).

A few examples of the form’s pages can be found on Google “images”. Search e.g. “uk census 2011 form”

If (God forbid!) you wrote something down all wrong, you could either cross it out firmly, and write the information somewhere else with a helpful arrow to the place where it should have been written, or you could glue, sellotape or staple another piece of paper in the approximate place on top of the erroneous entry and write the correct information on it. In either case, the computer’s scanner will not be able to read the information and will refer it back to a human being to deal with.

The same applies to box ticking. There will be many boxes to tick. It is so easy to tick the wrong boxes in all the excitement. It is best to firmly cross it all out and write in the margin, or wherever there is some space, something like: “Sorry, it should have been this one”, with an arrow pointing in the approximate direction.

The text, to be printed by you in little boxes, one for each character, will be read by so-called “Optical Character Recognition” software (you can look this up on Wikipedia and follow links). The software cannot read joined up writing. (Writing which ignores the boxes, and/or which is joined up, cannot be read by the software).

The software reads each written character – each one in its own little box – and will then decide, using a statistical analysis, which letter or numerical digit is the best fit to the handwritten shape. It will have a kind of “pictures dictionary” of all the various ways people hand-write characters and numbers and will compare what it sees on the form with the dictionary at various levels of confidence. Lockheed Martin is very proud of the sophistication of its “optical character recognition” software and the extremely wide diversity of people’s writing it can read, but that could also be its weakness. (If you draw little pictures, random shapes or fantasy symbols or characters from other scripts in various unused boxes, the software may well try to read them and try to guess which letters or numbers have the closest resemblance to your little scribbles. It might produce unpredictable and rather odd prose as a result).
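The “best fit at a level of confidence” idea can be shown with a toy Python example. It is emphatically not Lockheed Martin’s method, which is far more sophisticated and not public. It is a crude nearest-template match on invented 3x3 bitmaps, with a made-up confidence threshold of 0.8, and all names in it are hypothetical. The point is only that every shape gets compared with the dictionary, and anything scoring below the threshold is not accepted automatically.

```python
# Hypothetical "pictures dictionary": each character is a 3x3 bitmap written
# row by row, where "1" is ink and "0" is blank paper.
TEMPLATES = {
    "I": "010" "010" "010",
    "L": "100" "100" "111",
    "T": "111" "010" "010",
}


def similarity(shape, template):
    """Fraction of cells in which the written shape matches the template."""
    matches = sum(1 for a, b in zip(shape, template) if a == b)
    return matches / len(template)


def recognise(shape, threshold=0.8):
    """Return (character, score) for the best fit, or (None, score) when the
    best score is below the threshold and a human would have to look at it."""
    best_char, best_score = max(
        ((char, similarity(shape, tmpl)) for char, tmpl in TEMPLATES.items()),
        key=lambda pair: pair[1],
    )
    if best_score >= threshold:
        return best_char, best_score
    return None, best_score


if __name__ == "__main__":
    print(recognise("100100110"))   # a slightly sloppy "L"
    print(recognise("101010101"))   # a scribble that fits nothing well
```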

The software will probably have an in-built spell check and correction facility to provide alternatives to misspelled words (like normal word processing software). (It is likely to have problems if you leave out the spaces between words, especially when combined with spelling mistakes).

The lack of vertical symmetry in both the census pages’ barcodes and in pictures seen of the reading machines suggests strongly that the software cannot read “upside down” and pages will have to be fed in all the same direction.

Writing text upside down (i.e. rotate the form upside down when you write entries) is likely to be extremely effective, with the added advantage that it would also substantially slow down subsequent manual keying in, for the operator would be confused and would have to work from “bottom to top” for text, but from top to bottom for ticked boxes.

The web-link below is rather technical but gives a good insight in the economics of scanning in paper census questionnaires, and the trade-off between speed and accuracy:

http://www.documentimagingreport.com/Forms_processing_mystery.1534.0.html

Stage 3: assigning meaning to the information

Everything which the scanning software cannot read will produce a “no data” or “error” message and will require time-consuming manual attention.

The software will also detect a) “contradictions” and b) “uncodable text”. Uncodable text is text which does not match any pre-set words in a “coding dictionary”. The purpose of this software feature is to prevent logical nonsense from being downloaded into the census database. It will in the first case refuse to accept them and in the second case require a re-definition to be keyed in manually.

A “contradiction” occurs for example if 2 contradictory boxes are ticked, or 2 boxes are ticked when the form instructs you to tick only one box.

a) “Contradiction” examples are: Tick both boxes “Male” and “Female” (adding, if you wish, words like “undecided” or “it all depends”, etc. wherever you find space to write, to show that you are taking the question seriously and don’t just tick any old boxes), or interchange day and month in the date of birth boxes (the 12th day of the 28th month). Similarly, tick contradictory boxes for religion, occupation, nationality, ethnic identity, etc. All such cases will generate an error message and will need to be examined by a human processor, who will have to decide what (if anything at all) can be keyed in manually to represent your answer to the question.
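A contradiction check of this kind is easy to picture in code. The Python below is a guess at the general idea, not the actual census software: the function names are invented, and the only rules assumed are the two mentioned above (one box where only one is allowed, and a date of birth that must be a real date).

```python
from datetime import date


def check_single_choice(ticked_boxes):
    """Return the answer if exactly one box is ticked, otherwise None."""
    if len(ticked_boxes) == 1:
        return ticked_boxes[0]
    return None   # no tick or several ticks: refer to a human


def check_date_of_birth(day, month, year):
    """Return a date if day, month and year form a real date, otherwise None."""
    try:
        return date(year, month, day)
    except ValueError:
        return None   # e.g. there is no 28th month


if __name__ == "__main__":
    print(check_single_choice(["Male"]))             # accepted
    print(check_single_choice(["Male", "Female"]))   # rejected
    print(check_date_of_birth(28, 12, 1970))         # accepted
    print(check_date_of_birth(12, 28, 1970))         # rejected
```

Every rejected answer in such a scheme becomes an item for manual attention.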

b) The case of “uncodable text”. There are two types of text in the software: “free text” and “coding text”. “Free text” would for example be the entries “first names” and “surname”. It is not possible to list in advance all the possible first names and surnames people could have in the UK. Just about every arbitrary piece of writing (i.e. every conceivable string of symbols, however random, as long as the computer can read them) will therefore be accepted as valid data in such a case.

“Coding text” is text for which the computer is programmed to assign every word it recognises as being in a pre-defined “dictionary”, to one particular choice of a limited number of categories. The census form will, as much as possible, encourage you to use words which occur in such pre-defined dictionaries. If the word(s) you use is/are not in the pre-set “dictionary”, a human operator will have to look at it and then have to decide, as a human decision, to which category your word(s) should be assigned. This “coding text” system will be used if you could not possibly have a ticking box for every option (such as occupation, nationality, etc), but the word(s) you write down as an answer to the question are expected to be meaningful to the type of question asked.

For example, in question 22 if you are German, and enter “German” as the answer to the question “nationality”, it will be scanned and coded to “Germany”. If you write instead “Bundesrepublik Deutschland” or ditto “Republica e Shqipërisë” if you are Albanian, or “The Realm of her Gracious Majesty Queen Elizabeth II” whilst ticking the “other nationality” box in question 15, such words and expressions will not be recognised as “coding text” and a human analyst will have to decide what they mean.
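The “coding text” principle boils down to a dictionary lookup. Here is a minimal Python sketch under stated assumptions: the dictionary entries, the category names and the function names are all made up for illustration, and the real coding dictionaries will be vastly larger and cleverer. Anything not found in the dictionary ends up in a queue for a human analyst.

```python
# Hypothetical coding dictionary: recognised words mapped to a category.
NATIONALITY_CODES = {
    "german": "Germany",
    "germany": "Germany",
    "albanian": "Albania",
    "albania": "Albania",
    "british": "United Kingdom",
}


def code_answer(text, dictionary=NATIONALITY_CODES):
    """Return the category for a written answer, or None if it is uncodable."""
    key = " ".join(text.lower().split())   # normalise case and spacing
    return dictionary.get(key)


def process(answers):
    """Split answers into automatically coded ones and a manual queue."""
    coded, manual_queue = [], []
    for answer in answers:
        category = code_answer(answer)
        if category is None:
            manual_queue.append(answer)    # a human has to decide
        else:
            coded.append((answer, category))
    return coded, manual_queue


if __name__ == "__main__":
    coded, manual_queue = process(["German", "Bundesrepublik Deutschland"])
    print(coded)
    print(manual_queue)
```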

All this is best explained by quoting Lockheed Martin’s own description of the “Occupations” section of the census in its method description. Each census answer must in that case be coded to one of some 350 job codes in the “UK Standard Occupational Coding Index”:

“The system uses sophisticated software trained with thousands of examples of correctly coded responses to automatically recognise approximately 70% of the responses. Unrecognised responses are sent to highly trained operators to code. Coding is a difficult and expensive process (Emphasis added) but our automatic coding software combined with highly productive user tools makes it possible to code all of our data accurately with a small number of operators”.

For example, the words “doctor”, “general practitioner”, “surgeon”, “Dentist” etc, will all be assigned by the computer to a code which could perhaps be “medical/professional”. The software may well be sophisticated enough to recognise “tree surgeon” as belonging in a different occupational category, but what about the antiquated job title of “barber-surgeon”: would that be a hair dresser or medical practitioner?

In cases where the census form obviously wants to put you in some category or other by writing down one or more words instead of, or in addition to, box ticking, time consuming human decisions must be taken and manually keyed in whenever “coding text” is written in unusual or unexpected terminology. There are many ways to do this. You could be an “oral surgeon” or write down “I repair people’s teeth”. Neither answer will be recognised by the software as “dentist”; or you could simply add random words: “(salad cream) office (snorkelling) manager”. Only a human processor can analyse responses of this kind and attempt to code them to an occupational category. This technique can be applied to all kinds of “coding” questions on the census form. Remember: “Coding is a difficult and expensive process”, says Lockheed Martin.

Religion: a mixed “box ticking” and “coding text” example:

This is optional (all other questions must be answered by law). Because of the computer scanning of text, it is not really helpful to decide that you are a Jedi Knight or that you worship the Flying Spaghetti Monster and nobody else is really bothered about it. Instead, tick for example a couple of contradictory religion boxes (e.g. “Jewish” and “Sikh”) and add something like “undecided” as coding text. The census designers will no doubt have forgotten to create a category for this eventuality but they will have to put down something. (Except, possibly, if you use additionally the “no religion” option in your selection of several boxes and the census has cleverly provided for the coding category “agnostic”). None of this matters in any way in real life, but it all takes time to process manually, costs Lockheed Martin money and provides paid employment. The same logic applies to other “category” questions. TIME = MONEY and EVERY LITTLE HELPS.

Questions which you find intrusive or which violate your privacy

It is not likely that people at the processing centre (who are only doing a job because they need the money) will be very interested in your principles or feelings about this (and if they are, they cannot do much about it). Refusing to answer such questions could, in principle, cost you £1000 and will make no difference whatsoever to Lockheed Martin. It will be more effective to tick a few random boxes and write some random stuff in the text sections, then cross it all out again, and write something like “I don’t understand this. Please explain.” This will take up time to deal with in the processing centre. You cannot be fined for not understanding a question or for being confused by it, and you have made the effort. It is hard to imagine that the Lockheed Martin processing centre would act on this, for it would be very expensive and an administrative nightmare to try to get back to people about such responses – especially if they did not provide convenient contact details (see above).

Stage 4: manually dealing with any answers which could not be read, or given meaning, by the computer.

It is likely that a significant proportion of forms will in any case need some manual attention, because Optical Character Recognition is not 100% reliable, and there will be coding queries. At this manual stage 4, the “inner barcodes” are again vital to enable the manual data clerk to register the form quickly and link it to entries already made. Without those bar codes, a manual address search would first have to be made.

Stage 5: downloading the data, microfiching the forms and/or filing of the paper forms. This stage could, in principle, be influenced by Stage 6 below.

Possible Stage 6: Following up after you have sent the form.

It is easy to make a mistake or even to forget to answer a question – we are all human after all. No problem: just write to the processing centre (Addressed to “Census Processing Centre” in whatever place name you remember from the form) to tell them to put it right on your form. A considerable amount of clerical work could be involved.

It all depends at which of the above five stages your form is when your follow up letter arrives at the processing centre. If it is, for example, sent very soon after the form is sent off – if not at virtually the same time – the form could well be not even at stage 1 (the form is in stacks of mail not even opened) and it would be hard to find your form to add your letter to it. Someone would have to keep track of your letter and monitor when your form turns up to be processed. At each subsequent stage of the form your letter would have a different effect on the clerical work involved.

Since the information on the forms must be transferred to the computer with at least 98% accuracy, and Lockheed Martin must be able to demonstrate that they are achieving this accuracy, it is probably necessary for the processing centre to find your form to staple your letter to it, even if the information is already on the computer. If you supply a missing answer, keep a copy of your letter so that you can prove that you made a real effort to comply with your legal obligation to answer all questions.

NO NEED FOR OVERKILL

Life is short and there are more rewarding things to do with your time. You only need to choose a few of all those suggestions above to make your intervention an effective one.

No doubt, only a small minority of census forms will be “prepared” using the methods suggested above, but those forms will be randomly distributed amongst all the others (if the outer envelope carries no signs). Such randomness increases their effectiveness, for they unexpectedly interrupt the flow of the operation in its various stages.