Preserving The Internet... and Everything Else

codinghorror · April 2, 2012, 12:00am

In Preserving Our Digital Pre-History I nominated Jason Scott to be our generation's digital historian in residence. It looks like a few people must have agreed with me, because in March 2011, he officially became an archivist at the Internet Archive.

This is a companion discussion topic for the original blog entry at: http://www.codinghorror.com/blog/2012/04/preserving-the-internet-and-everything-else.html

Victorvogelpoel · April 2, 2012, 12:00am

Whoa! Having good fun reading the old 1984 ZX-Spectrum Crash magazine from the Internet Archive! http://archive.org/search.php?query=collection%3Acrash-magazine&sort=-date

Woliveirajr · April 2, 2012, 12:00am

Archive.org is one of the best sites I know from the internet.

Nevertheless, it raises a very good discussion: this way the internet never forgets… and there’s some debate about the trouble we run into when something we do can (and will) always be remembered.

Don’t you have the right to have your actions forgotten someday?

K_Lawrence · April 2, 2012, 12:00am

Victorvogelpoel,

You’ve totally distracted me from this blog post now lol

PaulC · April 2, 2012, 12:00am

It’s a great resource, but it can be disappointing. So often it only collects say 1 or 2 pages from a site (repeatedly over time) and misses the rest (also repeatedly over time), even when the content is simple HTML and image links and should pose no obstacle (that is, not Flash, etc).

Huy_DINH · April 2, 2012, 12:00am

Format shifting and changes of reader imply some loss of culture.
We saw it with VHS to DVD (not every movie will be transferred), and we will see it with DVD to Blu-ray, and then with full digital distribution.
The web has the same issue with the loss of rendering engines (a page rendered with Netscape will never look the same in modern browsers): all those Internet Explorer quirks will fade, and plugin content is nearly impossible to read (with the rise of WebGL, for example, VRML seems to have disappeared).
However, I like being able to see my now-defunct page from back when I wanted to be a 3D graphics artist instead of the developer I am today (I’m not giving out the link; I was really bad at that time).

D1142 · April 2, 2012, 12:00am

On a somewhat related, but totally unrelated, note, where can we find details about their hardware specs to accomplish this impossible task?

Df_jones · April 2, 2012, 12:00am

What about archiving software? I’m not sure if anyone has taken this task up, making copies of either source code or binaries.

Sigivald · April 2, 2012, 12:00am

Interesting that they use retail packaged USB3 externals.

Is there some reasoning behind that?

Jason_Scott · April 2, 2012, 12:00am

The ASCII.TEXTFILES.COM weblog is currently down for the count due to a hardware failure. I appreciate the irony too. Machine will be back “later”.

The retail packaged USB3 externals are because the usual supplier of disk drives is subject to the same extortionate prices due to the Thailand floods affecting a lot of drive purchases, but bulk buys of the USB3 externals are, believe it or not, currently cheaper. That will change and I’m sure the Internet Archive will move back to the more intuitive drives when the price comes down.

MarkH · April 2, 2012, 12:00am

Never in my life have I seen that many individually wrapped drives. In fact I don’t think I’ve ever seen a hard drive wrapped, ever, in anything except a static bag.

Haven’t heard of OEM orders? I’m sure for quantities that large they’d oblige!

Jason_Scott · April 2, 2012, 12:00am

To restate, Mr. Henderson: the current situation with drive costs, due to the shortage caused by the Thailand floods, is that the price of OEM drives has skyrocketed, often tripling or worse, and the ability to order any OEM drives at all has been severely cut back. After studying the matter, the Internet Archive found that external drives are currently cheaper than OEM drives, and it is currently using piles of these drives for the needs of the archive (an average of three drives a day have to be RMA’d). When the economic/supply issue is fixed, I’m sure the Archive will return to the methods and approaches you are more familiar with.

Abel_Tesfay · April 2, 2012, 12:00am

That is just awesome… I am not sure if it is back in Georgia, but I have a 101 shareware games CD from the 90s somewhere. I might send that in if I can find it somewhere in my spindles.

It scares me to think how much data is “created” on the internet every day. Is there a pipeline big enough to shovel all that to the Internet Archive, and how much could it possibly cost to handle the download of all that information? My mind is spinning just thinking about it.
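A rough feel for the pipeline question can come from back-of-envelope arithmetic. The sketch below is only an illustration: the daily volume and link speed are made-up placeholder figures, not numbers published by the Internet Archive, and `transfer_hours` is a hypothetical helper.

```python
def transfer_hours(data_terabytes: float, link_gbps: float) -> float:
    """Hours needed to move data_terabytes over a link of link_gbps.

    Uses decimal units (1 TB = 10**12 bytes, 1 Gbps = 10**9 bits/s) and
    ignores protocol overhead, so a real transfer would take longer.
    """
    bits = data_terabytes * 10**12 * 8
    seconds = bits / (link_gbps * 10**9)
    return seconds / 3600


# Hypothetical figures: 10 TB of new data per day over a 1 Gbps link.
hours = transfer_hours(10, 1)
print(f"{hours:.1f} hours")  # prints "22.2 hours"
```

With those placeholder numbers, a single 1 Gbps link would spend about 22 of every 24 hours just keeping up, which is why the answer depends so heavily on what the real daily volume is.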

MiguelF · April 3, 2012, 12:00am

I love the Internet Archive. Besides what Jeff said, it once saved my ass from a lawsuit for plagiarism: I was accused of copying an article from a magazine (an article that I had written), but the Internet Archive helped me prove that the content was on my website before the magazine even existed. The tables were turned and I got the upper hand. In the end, the lawsuit never materialized.
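For anyone wanting to check a capture date programmatically, here is a minimal sketch using the Wayback Machine availability endpoint. The comment above does not say how the proof was actually obtained, so this is an assumption about one possible approach; the helper names (`availability_url`, `closest_snapshot_time`, `predates`, `fetch`) are hypothetical, and the sample response is hand-written in the shape that endpoint returns.

```python
import json
from datetime import datetime
from urllib.parse import urlencode
from urllib.request import urlopen

AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def availability_url(page_url: str, timestamp: str) -> str:
    """Build a query for the snapshot closest to timestamp (e.g. YYYYMMDD)."""
    query = urlencode({"url": page_url, "timestamp": timestamp})
    return AVAILABILITY_ENDPOINT + "?" + query


def closest_snapshot_time(response: dict):
    """Return the capture time of the closest snapshot, or None if absent."""
    closest = response.get("archived_snapshots", {}).get("closest")
    if not closest or not closest.get("available"):
        return None
    return datetime.strptime(closest["timestamp"], "%Y%m%d%H%M%S")


def predates(response: dict, cutoff: datetime) -> bool:
    """True if the closest snapshot was captured before cutoff."""
    captured = closest_snapshot_time(response)
    return captured is not None and captured < cutoff


def fetch(page_url: str, timestamp: str) -> dict:
    """Perform the live request (needs network access; not called below)."""
    with urlopen(availability_url(page_url, timestamp)) as reply:
        return json.load(reply)


# Hand-written sample in the shape of an availability response.
sample = {
    "archived_snapshots": {
        "closest": {
            "available": True,
            "url": "http://web.archive.org/web/20060101064348/http://www.example.com:80/",
            "timestamp": "20060101064348",
            "status": "200",
        }
    }
}
print(predates(sample, datetime(2010, 1, 1)))  # prints True
```

Note that this endpoint returns the snapshot closest to the requested timestamp, not necessarily the earliest one, so asking for a date well before the cutoff is the safer way to use it.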

Dleppik · April 3, 2012, 12:00am

I have a hypothesis that the Internet Archive will make my personal web pages accessible hundreds of years from now. I’m testing this with letters I’m writing to my descendants. The one to my grandchildren is here: http://www.leppik.net/david/7gen/1_OtherGrandchildren.html and a little context is here: http://www.leppik.net/david/blog/?p=292

One of the issues is writing a program that will run correctly for the first time 50 to 200 years from now. That’s so that my letter is scrambled to discourage casual reading by unintended recipients. The intended recipients might not be technically savvy, so uncompiled C is out. My guess is that today’s ECMAScript will still run in 200 years. My reasoning is at: http://www.leppik.net/david/blog/?p=208

Dleppik · April 3, 2012, 12:00am

I should add, by way of context, that my children are still small enough for me to carry, so the letter to my grandchildren is to as-yet-completely-hypothetical grandchildren, so it shouldn’t get read for another 40 years.

I’m pretty sure today’s ECMAScript will still be readable in 40 years. After all, it took over 10 years for browser makers to fully implement CSS.

Johnm1 · April 3, 2012, 12:00am

The Internet Archive is an incredible resource. It’s a shame there isn’t a search facility because I’d love to search out all the Corewar / Programming Game material that has long since disappeared from the net.

Marylenelittlec · April 4, 2012, 12:00am

Wow, I always dreamed of going there myself one day!
I use the Internet Archive daily, especially for genealogy purposes. So many websites about family trees were created ten years ago, and most of them are “all gone” (from Google and Bing, of course), but they existed, and so did the information they contained!
I have a question: when I was in library school, we had a full-week class on web archiving, digital archives and preservation of hardware (disk drives, diskettes, etc.). We heard of a museum of “hardware” in the US, where they keep, restore and preserve old computers, OSes, software and diskettes (like that report your secretary mom typed using WordPerfect 3.1 back in the day). Is it tied to the Internet Archive program? I went through my old school notes this morning and can’t find it anywhere… I think it was in Texas.

MatthewB · April 5, 2012, 12:00am

I like to think that this web page will be available for the next 10790283070806014188970 years:

http://stackoverflow.com/questions/1705008/simple-proof-that-guid-is-not-unique
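For the curious, that year count seems consistent with the time needed to enumerate every 128-bit value at one billion values per second. That reading is an assumption about where the figure comes from, and the constants and the `years_to_enumerate` helper below are illustrative names only.

```python
GUID_BITS = 128                     # a GUID is a 128-bit value
CHECKS_PER_SECOND = 10**9           # assumed rate: one value per nanosecond
SECONDS_PER_YEAR = 365 * 24 * 3600  # non-leap year


def years_to_enumerate(bits: int, per_second: int) -> int:
    """Whole years needed to visit every value of a bits-wide identifier."""
    return 2**bits // (per_second * SECONDS_PER_YEAR)


years = years_to_enumerate(GUID_BITS, CHECKS_PER_SECOND)
print(years)  # a 23-digit number of years, on the order of 10**22
```

Python integers have arbitrary precision, so the division is exact rather than a floating-point estimate.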

MartinH · April 5, 2012, 12:00am

Wow, I really didn’t know that the Internet Archive is archiving so much more than ‘only’ (haha) the internet. There is some really cool stuff out there.

Thanks for pointing that out!