The Paper Data Storage Option

This is a bad idea. Have you ever printed a stack of paper and come back to it 3 years later? All the ink has started sticking to the page above and when you pull it apart it looks pretty bad. I can’t imagine doing this!

Ah, paper. The perfect and longest lasting storage medium!

Well, I certainly know that Adobe and Monotype keep all their fonts on punched paper tape (with instructions on how to read the tape written in pencil at the top)! Highly efficient, and you can easily build a paper tape reader with nothing more than basic workshop skills. Knowing the speed and reading method (always simple), it is possible to reconstruct the binary file.

I also know that in the ’90s they stored all the source code for applications such as Photoshop and PageMaker on paper tape as well (don’t know if this is still the case).

Makes a lot of sense. I have some inherited punched paper tape data from 40 years ago and it’s still in perfect condition. I have the reader (mechanically very fast, if a bit noisy as it rattles through) and it still works. An RS-232 port provides the output. I can read the data into a PC and write the raw data into a file on disk, or process it further directly as a data stream.
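As a rough illustration of how simple the reading method can be, here is a minimal Python sketch that rebuilds bytes from a text rendering of a tape. The row format (`o` for a hole, `.` for no hole, eight data channels per row, most significant bit first) and the function name `decode_tape` are assumptions made up for this example; a real tape also has a sprocket track and its own channel order.

```python
def decode_tape(rows):
    """Turn text rows such as '.o..o...' into bytes (a hole counts as a 1 bit)."""
    data = bytearray()
    for row in rows:
        row = row.strip()
        if not row:
            continue  # skip blank leader/trailer lines
        if len(row) != 8 or set(row) - {"o", "."}:
            raise ValueError(f"unexpected row: {row!r}")
        value = 0
        for ch in row:
            # Shift in one bit per channel, most significant bit first.
            value = (value << 1) | (ch == "o")
        data.append(value)
    return bytes(data)


# Hypothetical two-row tape, written out by hand for the example.
tape = [
    ".o..o...",  # 0x48
    ".o..o..o",  # 0x49
]
print(decode_tape(tape))
```

The same loop would work on rows arriving from a serial port instead of a list, as long as the channel order is known.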

There’s a lot more of this kind of thing going on still than you might realise.

I think the more important question is: what information would I like to store for 50 or 100 years? Programs that are 20 years old are already out of date and basically useless. The important programs will be left running and will therefore remain accessible.

“Modern DVDs produced with super cyanide (Tayto’s) or metal-stabilized Cyanine (TDK’s) have lifespans of approximately 70 years.”

According to that Rosetta Project site, paper can last for thousands of years.

“Similarly, the paperback solution faces a high media longevity, but no promise of media support. In fact, it presents the huge problem of being also dependent on the encoder/decoder availability in 100 years, probably demanding a legacy system just for the purposes of restore.” Not a problem to sneeze at, for sure. But the source code is available, and can be bundled with the data. NASA recently made available the assembler source code for its Apollo guidance computers. If you set your mind to it, you could easily write an emulator to run it, or hand-port it to some modern language.

“And then there is also the huge archiving problem it presents. Let’s not be shy here, you would need 8,000 A4 pages to match the capacity of a single 4GB DVD. And then you then need to ask, how long will it take to scan 8,000 pages so I can have my restore?”
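The numbers in that quote can be checked with a few lines of Python. The 8,000 pages and the 4 GB figure come from the quote; the per-page scan time is a made-up assumption, purely for illustration.

```python
DVD_BYTES = 4 * 10**9    # "4GB DVD" from the quote (decimal gigabytes assumed)
PAGES = 8_000            # page count from the quote
SECONDS_PER_PAGE = 30    # hypothetical flatbed scan time, not a measured value

bytes_per_page = DVD_BYTES / PAGES
scan_hours = PAGES * SECONDS_PER_PAGE / 3600

print(f"implied capacity: {bytes_per_page / 1000:.0f} kB per page")
print(f"restore scan time: {scan_hours:.1f} hours")
```

Under those assumptions the quote implies about 500 kB per page, and a full restore would take close to three days of continuous scanning.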

Do you need your porn collection in 1000 years? I’m perhaps being overly modest, but I think I would struggle finding even 500MBs of stuff that I would want to last for that long. It would likely be source code, writing, and possibly drawings/paintings that are stored on paper in original form anyway. It’s not that much of an issue for me to personally curate and maintain my own data while I’m still alive, but I don’t expect anyone to care after I’m dead.

In any case, I don’t think libraries will be discarding their microfiche machine in favor of DVDs any time soon. Well, not the clever ones anyway.

Not sure what your argument is, Breton. If it is making a case for paper backup, I have news for you. We invented computers.

Well, I’m not sure I really have much of an argument. Merely an open question: How much about our current history will still be around in 30, 50, 100, 500, or 1000 years? Are we entering a new dark ages? The longevity of the media we’re currently storing our personal histories on is unproven, because they have only existed for less than a century. We simply don’t know which parts of it will last. However, we already know from experience which parts certainly WON’T last: Magnetic records. We have found the diaries of people who lived in the year 500. In 1500 years’ time, will there be any evidence that you ever even existed? What will the people of the future be able to discover about you? How would they do it? How much effort would they need to go to in order to decode it?

I can’t recall anyone referring to CDC Display Code as “CDC” - it was always “Display Code.” Other 6-bit encodings like BCL (“Burroughs Common Language”) generally did go by their acronyms though.

There is still a TON of stuff around in EBCDIC though. No, not your Facebook ramblings but stuff like your bank accounts, driving record, criminal history, … and EBCDIC is far from an IBM-mainframe-only encoding, but people don’t seem to know it. Heck, Windows still supports EBCDIC, at least for transcoding purposes, via Kernel32 no less (see MultiByteToWideChar with code page 37 for one case).
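For anyone who has never touched EBCDIC, transcoding is a one-liner in many environments. This sketch uses Python's built-in `cp037` codec, which corresponds to the code page 37 mentioned above; it is only an illustration and not the Windows `MultiByteToWideChar` call itself.

```python
text = "HELLO"

# Encode to EBCDIC (code page 37) and back again.
ebcdic = text.encode("cp037")
print(ebcdic.hex())             # byte values differ from ASCII
print(ebcdic.decode("cp037"))   # round-trips to the original text
```

The point is simply that the byte values differ from ASCII while the text survives the round trip.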

ASCII was considered an interchange format, not something native for normal internal use.

“Well, I’m not sure I really have much of an argument. Merely an open question: How much about our current history will still be around in 30, 50, 100, 500, or 1000 years? Are we entering a new dark ages? The longevity of the media we’re currently storing our personal histories on is unproven, because they have only existed for less than a century. We simply don’t know which parts of it will last. However, we already know from experience which parts certainly WON’T last: Magnetic records. We have found the diaries of people who lived in the year 500. In 1500 years’ time, will there be any evidence that you ever even existed? What will the people of the future be able to discover about you? How would they do it? How much effort would they need to go to in order to decode it?”

Interesting questions indeed. I don’t think anyone has an answer. However, the “computer revolution”, if you’ll allow me the term, is not much different from the printing press revolution, when a new information storage medium was invented. I’d say it is safe to assume that as we continue down this path we will constantly rework our current technology, bringing new and better ways to archive information. We do not, however, tend to go back, especially because information density keeps growing and old methods become incompatible with that growth. It’s hard to beat stone carving when it comes to longevity, and it’s conceivable that a robot and a piece of software could laser-carve more information on a piece of limestone the size of my Post-it than this PaperBack thing could on 10,000 A4 pages. Still… we don’t see many of those around.

Storage media for long-term backup purposes are meant to be kept in protected environments, which usually extend their life far beyond what the announced material properties suggest under normal conditions. In very good conditions, information on a DVD could last centuries. But even disregarding that possibility, it is widely accepted and common practice that when new, better technology is invented and there is a port for that technology, all backups are converted to the new media. Much like paper has been converted to digital format.

Not the other way around, eh.

“But even disregarding that possibility, it is widely accepted and common practice that when new, better technology is invented and there is a port for that technology, all backups are converted to the new media. Much like paper has been converted to digital format.”

You mean it’s a widely accepted belief that this happens? I think this is a vast overstatement of the reality of the situation. For instance, there are piles of music available on vinyl records that are not available on CD, and never will be, because they’re not popular enough at this moment of transition. What if we change our minds later, and decide that the music actually is interesting? Will it be too late to save it? Many of the records that will be lost will be rather interesting historical records of music in a particular time, as it was performed then.

There are piles and piles of films made in the 20th century that require serious funding and effort to preserve. So much so that a film must be demonstrated to have genuine historical and cultural value before it can be selected for preservation. We still haven’t recovered the entirety of the movie “Metropolis”. There are sections missing.

I think the ease with which you are able to copy a file on a computer has the effect of distorting your perspective about how difficult it actually is to preserve what’s important. How difficult it is to even decide what’s important. Some of the most tragic losses have happened because it didn’t seem important at the time.

Oh damn, I haven’t even mentioned the effect of DRM, or hard drive crashes. We’re already hearing about people losing valuable family photos to technical mishaps. You might comment that it’s their fault for not backing up, but it is just much harder to crash a paper photo album. They don’t have a tendency to spontaneously combust every 3-4 years like hard drives do.

We are talking about backups, not preservation of originals. Backups are the only way to guarantee the preservation of information far beyond the point where the original gets lost. As we speak, a European consortium of museums is undertaking a project (I believe called Europe 2.0) to back up their entire contents in digital format.

PaperBack is almost certainly an intentional joke. Look at the pictures on the original article by Jeff. The last picture is a step backwards from the first. It’s irony at its best.

Is any of this getting through to you? What good are backups if they don’t last as long as the originals? We can spend thousands, perhaps millions of dollars to create a digital archive that is nothing but dust in 50 years, or needs another million dollar effort. Are magnetic disks really the best archival medium we can think of? Have we really thought the problem through?

Paper might sound like a joke to someone with digital myopia, but I’m not laughing.

“Is any of this getting through to you? What good are backups if they don’t last as long as the originals?”

I don’t think you understand the purpose of a digital backup, nor are you familiar with its standard procedures.

Sorry, but I need to be somewhere else.

PaperBack also sounds like the only encoding mechanism that can be severely compromised by fly poop, dust, fingerprints etc.

Most of the encoding techniques are hardware based and just feed the encoded information to software to use. A software based solution requires that the software continue to be maintained over the many many OS changes etc.

I’ll bet the original barcode scanners still work, and you could probably get original punch-card readers on eBay, or even build your own. You can EASILY read 100-year-old documents.

Getting software that is even 10 years old to work can often be a challenge.

You might want to calculate that into the value of a software based solution.

Oh, and a page full of static might be harder to recognize as valuable information later on.

Funny quote from PaperBack site…

“Actual version is for Windows only, but it’s free and open source, and there is nothing that prevents you from porting PaperBack to Linux or Mac, and the chances are good that it still will work under Windows XXXP or Trillenium Edition.”

XXXP, Trillenium Edition, nice.

I just scanned and restored a file that I saved to paper at my office this afternoon. It works like a charm, at least at 200 dpi.

One advantage of optical storage at this scale as opposed to a CD/DVD is that it obviously contains information. The dot pattern is a modulated gray that under only a little magnification reveals structure. It would be obvious to an archeologist that there was information to be recovered.

That said, storage of a Rosetta stone in the form of source code as plain text on paper (PaperBack is under GPL so it’s possible, and in C++ so it’s likely to be well enough known to have existed) would certainly be a good idea.

People are talking about archiving on DVD here but I’ve got several high quality DVDs from a couple years ago that already can’t be read due to problems. No way I’d trust DVDs for anything long term. To me they’re a crap storage system that we’ll be lucky to leave behind.

I like big butts and I cannot lie!

LOL, I was waiting for someone to notice this. Thank you Wade-O!

“This is a bad idea. Have you ever printed a stack of paper and come back to it 3 years later? All the ink has started sticking to the page above and when you pull it apart it looks pretty bad. I can’t imagine doing this!”

This problem was solved decades (if not centuries) ago. Air is what deteriorates paper, and if you prevent air from reaching it, the paper is preserved. Today we laminate, but for many people simple sheet protectors are good enough.

I think, right now, this is more practical for the “future civilization” use case for long-term archives of data. For it to be useful to a future civilization, we’d have to provide instructions for encoding/decoding and preserve them along with it, and assume this future civilization has certain technologies. The only things I can think of that might be worth preserving for future generations involve text passages, and we already have standards for expressing ideas using text, so it may not be worth storing barcodes at all. Still, it seems like it’d be easy to put a few thousand sheets of paper and a few dozen pages describing how to decode them in an airtight, fire-resistant box. Compare this to storing a couple of DVDs in an environmentally-controlled container along with instructions for how to build a DVD drive. Of course, for short-term backups of things we back up frequently, like payroll data, it seems ridiculous to rely on a technology that would have such a long lead time on restoration. DVD is probably not going to be unusable in the next decade or so, so if that’s the lifetime you need, there’s no need to look for another solution.

Then there is the part about restoring: what use is a backup if you cannot restore it? They say to use just a regular scanner, but based on my experience with OCR, the error rate for this would be rather high.
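One common way to cope with a noisy scan is to store a checksum with each block of data, so that damaged blocks are at least detected. The sketch below uses CRC-32 from Python's standard library; the block layout and the helper names `pack_block` and `check_block` are invented for this example and are not taken from PaperBack's actual format.

```python
import zlib


def pack_block(payload: bytes) -> bytes:
    """Append a 4-byte CRC-32 to the payload (hypothetical block layout)."""
    crc = zlib.crc32(payload)
    return payload + crc.to_bytes(4, "big")


def check_block(block: bytes) -> bool:
    """Return True if the stored CRC-32 matches the payload."""
    payload, stored = block[:-4], block[-4:]
    return zlib.crc32(payload).to_bytes(4, "big") == stored


block = pack_block(b"data printed as dots")
# Simulate a speck of dust flipping one bit during the scan.
damaged = bytes([block[0] ^ 0x01]) + block[1:]
print(check_block(block), check_block(damaged))
```

Detection alone does not repair anything; recovering a damaged block would need redundancy on top of this, such as an error-correcting code.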