How to get 7.5m records on two CDs

Updated on 21 November 2007

By Channel 4 News

Names, addresses, National Insurance numbers, bank details and more. Just how do you get all that on to two standard compact discs?

A viewer writes in response to the child benefit data leak:

"A CD holds about 700MB, and even if only 7.5 million people are affected ... that's only about 190 characters per record - barely enough to hold the name, address, National Insurance number, date of birth, relationship, bank details etc - even if it's compressed."
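The viewer's arithmetic is easy to check. Below is a minimal sketch that assumes two 700MB discs and counts 1MB as 1,000,000 bytes. If 1MB is counted as 1,048,576 bytes instead, the figure rises to about 196 bytes. In plain ASCII text one character takes one byte, so bytes and characters are interchangeable here.

```python
# Rough capacity check for the viewer's estimate.
# Assumptions: two discs of 700MB each, and 1MB = 1,000,000 bytes.
CD_CAPACITY_BYTES = 700 * 1_000_000
NUM_DISCS = 2
NUM_RECORDS = 7_500_000


def bytes_per_record(capacity_bytes: int, discs: int, records: int) -> float:
    """Average space available per record if the discs are filled evenly."""
    return capacity_bytes * discs / records


per_record = bytes_per_record(CD_CAPACITY_BYTES, NUM_DISCS, NUM_RECORDS)
print(f"about {per_record:.0f} bytes per record")  # about 187 bytes per record
```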

It's a lot of data, but according to a data storage expert Channel 4 News online spoke to, it is quite possible to store it all on two standard compact discs holding between 700 and 750MB each.

And, yes, compression is the answer.

"Most data of that type can be saved as a .CSV file [comma-separated values file format] or something similar, which makes it easily importable and exportable," says Peter Groucutt, managing director of Databarracks.
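To show what a CSV file looks like, here is a short sketch that uses Python's standard csv module. Every field name and value is invented for illustration and does not describe the layout of the real child benefit records. An address that contains a comma is wrapped in quotes automatically, so it stays a single field.

```python
import csv
import io

# Hypothetical record layout, invented for illustration only.
FIELDS = ["name", "address", "ni_number", "date_of_birth", "sort_code", "account"]

records = [
    ["Jane Smith", "1 High Street, Anytown", "QQ123456C", "1970-01-01", "00-00-00", "12345678"],
    ["John Smith", "1 High Street, Anytown", "QQ654321C", "1972-02-02", "00-00-00", "87654321"],
]


def to_csv(fields, rows) -> str:
    """Serialise a header row plus data rows as comma-separated text."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(fields)
    writer.writerows(rows)
    return buf.getvalue()


text = to_csv(FIELDS, records)
print(text)
```

Because CSV is plain text, almost any spreadsheet or database can read it back in. That is the "importable and exportable" quality described above.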

"Data resides in a digital format. It's made up of lots of ones and zeros but also contains a lot of space. By applying a simple algorithm - or mathematical equation - you can take a lot of repeated and unnecessary data out of the file."
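Python's standard zlib module gives a minimal illustration of removing repeated data. It implements DEFLATE, the method used in most zip files. The rows below are hypothetical and deliberately repetitive. Real personal data, such as names and account numbers, varies far more and would compress less well.

```python
import zlib

# Hypothetical CSV rows: the address and sort code repeat on every line.
rows = [f'Person {i},"1 High Street, Anytown",00-00-00\n' for i in range(10_000)]
raw = "".join(rows).encode("ascii")

# DEFLATE replaces repeated byte sequences with short back-references
# and then Huffman-codes the result. Level 9 is the strongest setting.
compressed = zlib.compress(raw, 9)

ratio = len(compressed) / len(raw)
print(f"{len(raw):,} bytes -> {len(compressed):,} bytes ({ratio:.1%} of original)")

# Compression is lossless: decompressing returns the exact original bytes.
assert zlib.decompress(compressed) == raw
```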

"You'll find a compressed file can shrink by as much as 90 per cent of its original size. So you could squeeze a 1GB file into 100MB of space on a disc. Or, to put it another way, the entire contents of the Encyclopædia Britannica could easily fit on one or two CDs."
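The 1GB-to-100MB example implies a 90 per cent reduction. The sketch below assumes that best-case figure holds for the whole dataset, which real data may not achieve. On that assumption, the original records could average well over 1,800 bytes each and still fit on two 700MB CDs.

```python
# Assumption: compression removes 90 per cent of the data, as in the
# 1GB -> 100MB example. Real ratios depend heavily on the data.
REDUCTION = 0.90
GB = 1_000_000_000
MB = 1_000_000


def compressed_size(original_bytes: float, reduction: float) -> float:
    """Size remaining after removing the given fraction of the data."""
    return original_bytes * (1 - reduction)


def max_raw_record_size(capacity_bytes: float, records: int, reduction: float) -> float:
    """Largest average uncompressed record that still fits once compressed."""
    return capacity_bytes / records / (1 - reduction)


print(round(compressed_size(1 * GB, REDUCTION) / MB))                 # 100 (MB)
print(round(max_raw_record_size(2 * 700 * MB, 7_500_000, REDUCTION)))  # 1867 (bytes)
```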

Compression tools are commonly available on personal computer operating systems, such as Windows XP.

All of which assumes that the discs in question are CDs.

We could equally be talking about DVDs, which hold nearly 5GB of data. Higher-capacity optical discs, meanwhile, can store as much as 60GB. In either case, compression would be unnecessary.
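A quick comparison shows why. The sketch below calculates the space per record for 7.5 million records on a single disc of each kind. It assumes 1GB equals 1,000,000,000 bytes and takes 4.7GB as the capacity of a single-layer DVD, the standard figure behind "nearly 5GB". The 60GB figure is the one quoted above.

```python
# Space per record for 7.5 million records on one disc of each kind.
# Assumptions: 1GB = 1,000,000,000 bytes; single-layer DVD = 4.7GB.
NUM_RECORDS = 7_500_000
CAPACITIES_GB = {"CD": 0.7, "single-layer DVD": 4.7, "60GB disc": 60}


def bytes_per_record(capacity_gb: float, records: int = NUM_RECORDS) -> float:
    """Average bytes available per record on a disc of the given size."""
    return capacity_gb * 1_000_000_000 / records


for medium, gb in CAPACITIES_GB.items():
    print(f"{medium}: about {bytes_per_record(gb):,.0f} bytes per record")
```

A single DVD offers more than three times the roughly 187 bytes per record available on two CDs, which is enough room for most of the fields in a typical record without compressing anything.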

Disc-ography: a list of terms

Compression: Compression is the reduction in size of data in order to save space or transmission time.
Zipping: Zipping is the act of packaging a set of files into a single file or archive that is called a zip file. Usually, the files in a zip file are compressed so that they take up less space in storage or take less time to send to someone.
Algorithm: An algorithm is a procedure or formula for solving a problem. The word derives from the name of the mathematician, Mohammed ibn-Musa al-Khwarizmi, who was part of the royal court in Baghdad and who lived from about 780 to 850. Al-Khwarizmi's work is the likely source for the word algebra as well.
Source: whatis.com
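Zipping, as defined above, can be sketched with Python's standard zipfile module. The example packages two files into one compressed archive in memory and then lists each file's original and compressed size. The file names and contents are hypothetical.

```python
import io
import zipfile

# Hypothetical file names and contents, invented for illustration.
files = {
    "claimants_part1.csv": "name,town\n" + "Jane Smith,Anytown\n" * 1000,
    "claimants_part2.csv": "name,town\n" + "John Smith,Anytown\n" * 1000,
}


def zip_files(contents: dict) -> bytes:
    """Package several text files into a single DEFLATE-compressed zip archive."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for name, text in contents.items():
            zf.writestr(name, text)
    return buf.getvalue()


archive = zip_files(files)

# List what went into the archive: name, original size, compressed size.
with zipfile.ZipFile(io.BytesIO(archive)) as zf:
    for info in zf.infolist():
        print(info.filename, info.file_size, info.compress_size)
```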

Channel 4 is not responsible for the content of external websites.

Channel 4 © 2009.