The Paper Data Storage Option

As programmers, we regularly work with text encodings. But there's another sort of encoding at work here, one we process so often and so rapidly that it's invisible to us, and we forget about it. I'm talking about visual encoding -- translating the visual glyphs of the alphabet you're reading right now. The alphabet is no different than any other optical machine readable input, except the machines are us.


This is a companion discussion topic for the original blog entry at: http://www.codinghorror.com/blog/2009/07/the-paper-data-storage-option.html

Paper backups… there’s an app for that.

Frist!

I wish someone would make a whole-disk (like disaster recovery) version of PaperBack…

I would love to say, “See that ream of paper… yeah, that’s my backup.”

Or, in actuality, having a single PDF that contained everything would be supremely awesome.

The reason you don’t use paper is because the acid in the paper makes everything disappear (relatively) quickly.

There’s a coding error on the IBM JCL punched card: there should be no space between "SLINK," and "TESTPGM=".

That’s pretty cool information. Thanks Jeff! I never knew you could store so much on one of those image maps. Crazy!

This assumes either the software or the computer system is useable in a hundred years…

I’ve been wondering how efficient paper-based backup could be. That’s pretty impressive. ~Half a megabyte of uncompressed data is nothing to sneeze at. You could squeeze a whole lot of 7zip compressed plain-text documents into that space; easily an entire book on a single page. With formatting things get larger, but c’mon. An entire book on a single page.
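A rough way to try the "book on a page" estimate yourself. This is only a sketch: it assumes the ~500 KB per page figure mentioned above, and it uses Python's built-in lzma module, which implements the same LZMA family of compression that 7-Zip uses but writes .xz data rather than a .7z archive. The names PAGE_CAPACITY and pages_for_text are made up for illustration.

```python
import lzma

# Assumed capacity of one printed page, in bytes (the rough figure from this thread).
PAGE_CAPACITY = 500_000


def pages_for_text(text: str, capacity: int = PAGE_CAPACITY) -> int:
    """Estimate how many pages a plain-text document needs after LZMA compression."""
    compressed = lzma.compress(text.encode("utf-8"))
    # Ceiling division, with a minimum of one page even for tiny inputs.
    return max(1, -(-len(compressed) // capacity))
```

Feed it the text of a real book to get a meaningful number. Repetitive sample text compresses far better than real prose, so it will understate the page count.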

The problem of course is making sure it’s still readable in a couple hundred years, but if you use decent paper and slap on a few cover pages or so documenting your algorithms I don’t see why it would be an issue.

Paper that lasts a century is easy enough to make or buy, but it’s definitely not the common photocopy paper you use by the ream every day. That stuff is acidic and the ink is also fragile, so it’s a race between the paper eating the ink and the ink decaying on its own. Give it 20 years and you won’t be able to move it without it falling apart either.

So as long as you’re using a high quality acid-free archive quality paper, your laser printer is in good condition (especially the seal roller) and you store the paper in a way that minimises exposure to air and moisture (making a thick stack is good), you should be fine for 100 years.

Personally, I’d be printing the coding scheme in human-readable text on every 100th page, and making very sure that each page is independently readable. So backing up your huge pr0n collection… not so much.
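One way to get "each page independently readable" might look like the sketch below. It is not how PaperBack or any other tool actually lays out its pages; the header layout (page index, page count, payload length, CRC32) and the function names are assumptions for illustration. The idea is that every page carries enough metadata to be checked and placed on its own, in any order.

```python
import struct
import zlib

# Hypothetical per-page header: page index, page count, payload length, CRC32.
HEADER = struct.Struct(">IIII")


def make_pages(data: bytes, payload_size: int) -> list[bytes]:
    """Split data into pages that each carry their own position and checksum."""
    chunks = [data[i:i + payload_size] for i in range(0, len(data), payload_size)] or [b""]
    return [
        HEADER.pack(index, len(chunks), len(chunk), zlib.crc32(chunk)) + chunk
        for index, chunk in enumerate(chunks)
    ]


def read_page(page: bytes) -> tuple[int, int, bytes]:
    """Decode one page by itself and verify its checksum."""
    index, count, length, crc = HEADER.unpack_from(page)
    payload = page[HEADER.size:HEADER.size + length]
    if len(payload) != length or zlib.crc32(payload) != crc:
        raise ValueError(f"page {index} is damaged")
    return index, count, payload


def reassemble(pages) -> bytes:
    """Rebuild the original data from pages supplied in any order."""
    parts = {}
    count = None
    for page in pages:
        index, count, payload = read_page(page)
        parts[index] = payload
    if count is None:
        raise ValueError("no pages supplied")
    missing = [i for i in range(count) if i not in parts]
    if missing:
        raise ValueError(f"missing pages: {missing}")
    return b"".join(parts[i] for i in range(count))
```

A checksum like this only detects damage. Recovering a damaged page would need real error correction on top.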

The post reminds me of the opening chapters of Code by Charles Petzold (http://www.amazon.co.uk/Code-Language-DV-Undefined-Charles-Petzold/dp/0735611319/ref=sr_1_1?ie=UTF8&s=books&qid=1249079981&sr=8-1)

Well, the intro was pretty cool. Human OCR is pretty awesome.

Storing data on paper… All kinds of problems: 1. Expensive. 2. Volatile (Paper that lasts needs to be bound, which adds more to #1). 3. Storage Space (as in actual room for the paper). 4. Security Problems.

Also, what terrible storage space! 500k per piece of paper? Let’s see, to store 1GB I would need… 2,000 sheets of paper. To back up my server, I need, 2 MILLION SHEETS OF PAPER (at least).
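For anyone checking the math: the sheet counts above work out if you assume decimal units and roughly 500 KB per sheet, and 2 million sheets corresponds to about 1 TB. A quick sketch (the function name and the constants are just for illustration):

```python
def sheets_needed(data_bytes: int, bytes_per_sheet: int = 500_000) -> int:
    """Number of sheets needed to hold data_bytes, rounded up."""
    # Ceiling division using integers only, to avoid floating-point rounding.
    return -(-data_bytes // bytes_per_sheet)


GB = 1_000_000_000  # decimal gigabyte
TB = 1_000 * GB     # decimal terabyte

print(sheets_needed(GB))  # 2000
print(sheets_needed(TB))  # 2000000
```

At 500 sheets per ream, the 1 TB case comes to 4,000 reams.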

Although, that may be fun for some spy type stuff.

Actually, the longevity is likely better than you’d think. My company just went through 27 years of project archive boxes to reduce their volume. The oldest boxes had numerous program listings printed by dot matrix on “ordinary” grade line printer paper, and had been stored in “ordinary” grade banker’s boxes. There was some fading and some yellowing, but by and large everything that came out of the oldest boxes was about as legible as it had been when stored.

The biggest issue is longevity of the file format itself. It’s probably best to store things in lowest-common-denominator formats. Plain text where possible, for instance. Otherwise, formats that are widely documented, such as JPEG and PDF, are probably safe enough. For a less well-known format (such as that collection of digital negatives in Nikon raw files) it might be a good idea to include a source kit for a program that can read the file.

Unless I’m being daft, a page would hold less than half a MB?

“Or, in actuality, having a single PDF that contained everything would be supremely awesome.”

no comment.

Oops. The rats ate my backup.

Treekiller!!

This reminds me of the Rosetta Project run by the Long Now Foundation.

When it came time to make a long lasting archive of all of their data about human languages, they ended up going with essentially high precision bronze-age technology. Words engraved in metal.

They see your PaperBack and raise you StoneBack.

More on the Rosetta Project here if you are into that kind of thing.
http://rosettaproject.org/disk/concept/

Twibright Labs, which developed the RONJA (Reasonable Optical Near Joint Access) open-source system for open-space optical networking, has created a similarly open-source system called OPTAR for storing data on paper. The system stores 200KB on an A4 page, with error correction. They provide the following reasons for the technology:

  • Long-life storage. They point out that microfiche panels (which could be used to store quite a bit of data via an imagesetter) have an estimated life of 500 years in air conditioning, much longer than common data storage formats.
  • Legal requirements. The law requires that certain kinds of records be kept on paper (for example, notary journals and financial reports per Sarbanes-Oxley). OPTAR satisfies those requirements while storing data in a directly machine-readable format.
  • Inclusion of digital information on printed materials. The example of ringtones printed on paper is given, where a cellular phone camera could read the data.

Archival life and legal compliance seem the most compelling reasons, particularly the applications in satisfying corporate compliance law without sacrificing computer readability of paper records that can serve as backup.

They also mention usage in IP-over-avian-carrier implementations, per RFC1149 and RFC2549.

I like big butts and I cannot lie!