Last Modified: 21 Nov 2007
By: Channel 4 News

Names, addresses, National Insurance numbers, bank details and more. Just how do you get all that on to two standard compact discs?

A viewer writes in response to the child benefit data leak:

"A CD holds about 700MB, and even if only 7.5 million people are affected ... that's only about 190 characters per record - barely enough to hold the name, address, National Insurance number, date of birth, relationship, bank details etc - even if it's compressed."
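The viewer's arithmetic can be checked with a rough sketch. It assumes two discs of 700MB each, 7.5 million records, and one byte per character with no compression. The function name and the two readings of "MB" (decimal and binary) are illustrative choices, not figures from the original report.

```python
# Back-of-envelope check of the viewer's "about 190 characters per record".
# Assumptions: two discs, 700MB each, 7.5 million records, one byte per character.
CD_CAPACITY_MB = 700
NUM_DISCS = 2
NUM_RECORDS = 7_500_000


def bytes_per_record(capacity_mb, discs, records, bytes_per_mb=1_000_000):
    """Return the uncompressed space available for each record, in bytes."""
    return capacity_mb * bytes_per_mb * discs / records


# "MB" read as one million bytes.
decimal_estimate = bytes_per_record(CD_CAPACITY_MB, NUM_DISCS, NUM_RECORDS)

# "MB" read as 1024 * 1024 bytes.
binary_estimate = bytes_per_record(
    CD_CAPACITY_MB, NUM_DISCS, NUM_RECORDS, bytes_per_mb=1024 * 1024
)

print(round(decimal_estimate), round(binary_estimate))  # prints: 187 196
```

Either reading lands close to the viewer's figure of about 190 characters, which is why uncompressed storage looks tight.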

It's a lot of data, but a data storage expert told Channel 4 News online that it is quite possible to store it all on two standard compact discs holding between 700 and 750MB each.

And, yes, compression is the answer.

"Most data of that type can be saved as a .CSV file [comma-separated values file format] or something similar, which makes it easily importable and exportable," says Peter Groucutt, managing director of Databarracks.

"Data resides in a digital format. It's made up of lots of ones and zeros but also contains a lot of space. By applying a simple algorithm - or mathematical equation - you can take a lot of repeated and unnecessary data out of the file."

"You'll find a compressed file goes down [by] as much as 90 per cent of its original size. So you could squeeze a 1GB file into 100MB of space on a disc. Or put it another way, the entire contents of the Encyclopædia Britannica could easily fit on one or two CDs."
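The effect of compression on CSV-style data can be illustrated with a minimal sketch using Python's standard zlib module. The field names, surnames and towns below are invented for illustration and bear no relation to the real records. The saving on synthetic data like this will not necessarily match the 90 per cent figure quoted, since the result depends heavily on how repetitive the data is.

```python
import csv
import io
import random
import zlib

# Synthetic, made-up records: every name and field here is hypothetical.
random.seed(0)
SURNAMES = ["Smith", "Jones", "Taylor", "Brown", "Williams"]
TOWNS = ["Leeds", "Bristol", "Cardiff", "Glasgow", "Belfast"]


def make_csv(num_rows):
    """Build a CSV file in memory and return it as UTF-8 bytes."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["surname", "town", "reference", "date_of_birth"])
    for _ in range(num_rows):
        year = random.randint(1950, 1995)
        month = random.randint(1, 12)
        day = random.randint(1, 28)
        writer.writerow([
            random.choice(SURNAMES),
            random.choice(TOWNS),
            f"{random.randrange(10**8):08d}",
            f"{year}-{month:02d}-{day:02d}",
        ])
    return buffer.getvalue().encode("utf-8")


raw = make_csv(10_000)
packed = zlib.compress(raw, 9)  # 9 is the highest standard compression level
saving = 1 - len(packed) / len(raw)

print(f"{len(raw)} bytes -> {len(packed)} bytes ({saving:.0%} smaller)")

# Compression of this kind is lossless: the original comes back intact.
restored = zlib.decompress(packed)
```

Repeated values such as surnames and towns compress well, while the random reference numbers compress far less, so real-world savings vary from file to file.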

Compression tools are commonly available on personal computer operating systems, such as Windows XP.

All of which assumes that the discs in question are CDs.

We could equally be talking about DVDs, which hold nearly 5GB of data each. Higher-capacity optical discs, meanwhile, can store as much as 60GB. In which event, compression would be unnecessary.

Disc-ography: a list of terms

Compression: Compression is the reduction in size of data in order to save space or transmission time.
Zipping: Zipping is the act of packaging a set of files into a single file or archive that is called a zip file. Usually, the files in a zip file are compressed so that they take up less space in storage or take less time to send to someone.
Algorithm: An algorithm is a procedure or formula for solving a problem. The word derives from the name of the mathematician, Mohammed ibn-Musa al-Khwarizmi, who was part of the royal court in Baghdad and who lived from about 780 to 850. Al-Khwarizmi's work is the likely source for the word algebra as well.
Source: whatis.com
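Zipping, as defined in the list of terms, can be sketched with Python's standard zipfile module. The file names and contents are made up for illustration, and the archive is built in memory rather than written to disc.

```python
import io
import zipfile

# Two hypothetical files with highly repetitive contents.
files = {
    "records_part1.csv": "name,town\n" + "Smith,Leeds\n" * 1000,
    "records_part2.csv": "name,town\n" + "Jones,Cardiff\n" * 1000,
}

# Package both files into a single compressed archive held in memory.
archive = io.BytesIO()
with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED) as zf:
    for name, text in files.items():
        zf.writestr(name, text)

# Reopen the archive to list its contents and compare sizes.
with zipfile.ZipFile(archive) as zf:
    names = zf.namelist()
    original_size = sum(info.file_size for info in zf.infolist())
    stored_size = sum(info.compress_size for info in zf.infolist())

print(names, original_size, stored_size)
```

The archive holds both files under their original names, and the compressed size is a small fraction of the original because each file repeats the same line many times.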