The Trouble with PDFs

ShawnW · January 3, 2008, 12:00am

For me, PDF is for when I need one of two things: it needs to print the way it looks, or I need to make sure that no one changes it (e.g. contracts). Otherwise PDF isn’t useful at all…

That said, the form-filling API is wonderful if you need standardized forms (e.g. IRS.gov).
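One long-standing way to fill a PDF form’s fields from code is Adobe’s Forms Data Format (FDF), a small companion file that a viewer such as Acrobat can import to populate named fields. The sketch below only builds that file; the field names and the target file name `form.pdf` are hypothetical, and whether a particular viewer imports it depends on the viewer.

```python
def pdf_literal(s: str) -> str:
    """Escape a string for use as a PDF/FDF literal string: (...)."""
    return "(" + s.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)") + ")"


def build_fdf(fields: dict[str, str], pdf_name: str) -> bytes:
    """Build a minimal FDF file that fills named fields in pdf_name.

    Assumes Latin-1 text; real form field names come from the PDF itself.
    """
    entries = " ".join(
        f"<< /T {pdf_literal(name)} /V {pdf_literal(value)} >>"
        for name, value in fields.items()
    )
    body = (
        "%FDF-1.2\n"
        "1 0 obj\n"
        f"<< /FDF << /F {pdf_literal(pdf_name)} /Fields [ {entries} ] >> >>\n"
        "endobj\n"
        "trailer\n"
        "<< /Root 1 0 R >>\n"
        "%%EOF\n"
    )
    return body.encode("latin-1")


if __name__ == "__main__":
    # Hypothetical field names for illustration only.
    print(build_fdf({"taxpayer_name": "Jane Doe", "year": "2007"}, "form.pdf").decode())
```

Because the data lives apart from the form, the same blank form can be filled many times without touching its layout.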

hmm3 · January 3, 2008, 12:00am

I used to see PDFs as the speed bump of the Web. In many ways, though, they’re also like an unexpected cul-de-sac.

I detest them. In most cases, even printed, you end up with a pile of inaccessible information, though that seems to be more a product of poor organization combined with the nature of paper documents.

Generally I’d rather they just go away, for many of the reasons already given by others.

gogole · January 3, 2008, 12:00am

@Graham

I was being a bit ignorant about the page-flipping thing. Thanks for the correction, though. Come to think of it, PDFs being immutable makes the format reliable for archiving documents.

Justin · January 3, 2008, 12:00am

PDF is a document file format. It definitely wasn’t made to be an interactive web presentation format. In the same way, HTML is not nearly sophisticated enough for professional publishing, and it shouldn’t be, because it’s not a publishing format. In fact, HTML is a super-simplified application of SGML (a full-fledged document markup language). Comparing PDF and HTML is like comparing apples and oranges (or MIDI and WAV files).

Before I was a programmer I worked for Xerox, scanning in thousands of manuals and documents for different companies. The PDF format is perfect for that. When they’d ask for 50 copies of manual xyz, I would pull it out of the archives and have it printing in seconds. PDFs can embed all kinds of metadata and printer-specific information, which makes production printing from PDFs possible. I agree that most web content should NOT be PDF, but I don’t think HTML is a good substitute for a good document format.
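To make the metadata point concrete: document-level metadata such as title and author traditionally lives in an /Info dictionary referenced from the file’s trailer. The sketch below hand-builds a blank one-page PDF with such a dictionary, including the byte-offset cross-reference table every PDF needs. It assumes Latin-1 text and a US Letter page, and it does not show printer-specific data such as output intents.

```python
def pdf_literal(s: str) -> bytes:
    """Escape a string as a PDF literal string; assumes Latin-1 text."""
    escaped = s.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
    return b"(" + escaped.encode("latin-1") + b")"


def minimal_pdf_with_info(title: str, author: str) -> bytes:
    """Build a blank one-page PDF whose /Info dictionary carries metadata."""
    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] /Resources << >> >>",
        b"<< /Title " + pdf_literal(title) + b" /Author " + pdf_literal(author) + b" >>",
    ]
    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for number, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset of "N 0 obj", recorded in the xref table
        out += b"%d 0 obj\n" % number + body + b"\nendobj\n"

    xref_pos = len(out)
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"  # every xref entry is exactly 20 bytes
    for offset in offsets:
        out += b"%010d 00000 n \n" % offset
    out += b"trailer\n<< /Size %d /Root 1 0 R /Info 4 0 R >>\n" % (len(objects) + 1)
    out += b"startxref\n%d\n" % xref_pos
    out += b"%%EOF\n"
    return bytes(out)


if __name__ == "__main__":
    with open("manual.pdf", "wb") as f:
        f.write(minimal_pdf_with_info("Manual XYZ", "Print Shop"))
```

Real tools also support XMP metadata streams; the /Info dictionary is simply the oldest and smallest place to put it.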

ProfessorT · January 3, 2008, 12:00am

Jeff,

I agree with you entirely if you frame your argument to say that if we’re just giving consumers information, HTML/CSS is a better choice because even if there are minor formatting issues, they still get all of the information. But you forget the whole point of PDFs: to have a platform-independent method of guaranteeing that your document will render the SAME EXACT WAY on all computers.

This was a HUGE deal not too many years ago with all of these *nixes and Macs and PCs running around. In a way, it still is a big issue. Remember that PDF is a publisher’s tool, a layout editor’s tool. The point of PDF is to preserve the presentation. Does the general populace need an exact representation of your printed brochure for your tourist attraction? No; they shouldn’t need it on your website unless you have incompetent web developers. But if the brochure you just laid out needs to go to your boss for approval before it’s printed for your building’s lobby, then hell yeah, PDF makes sense.

Also, consider this: with the advent of digital signatures, PDF gives businesses a secure way to review proposals, add comments, sign when approved, and note what needs to be noted without giving everyone in the process the means to edit the document to their liking. Can this be accomplished with a website? I dare say no.

This is the old Mac vs. PC argument: at the end of the day, both OSes are just tools. The tasks you perform most often will dictate which platform you run. Personal preference comes into play at the intersection of common activities such as surfing the web. But editing video on any kind of professional basis? You’re killing me if you say you prefer Windows Movie Maker to Final Cut Pro.

Rick_Cabral · January 3, 2008, 12:00am

Being in the content management business, I see PDFs frequently violate the DRY principle from an information architecture standpoint: they are maintained separately from the rest of the content. Cutting-edge firms are beginning to use document management/content management systems that can output multiple formats from the same data, but we’re YEARS away from that being standard practice.
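The single-source idea can be sketched as one content structure with several renderers, so an edit is made once and every output picks it up. The `Section` type and the two renderers below are hypothetical illustrations, not any particular CMS’s API; a print or PDF renderer would be one more function over the same data.

```python
from dataclasses import dataclass
from html import escape


@dataclass
class Section:
    heading: str
    body: str


def to_html(title: str, sections: list[Section]) -> str:
    """Render the shared content as an HTML fragment for the website."""
    parts = [f"<h1>{escape(title)}</h1>"]
    for s in sections:
        parts.append(f"<h2>{escape(s.heading)}</h2>\n<p>{escape(s.body)}</p>")
    return "\n".join(parts)


def to_text(title: str, sections: list[Section]) -> str:
    """Render the same content as plain text, e.g. for an email."""
    lines = [title, "=" * len(title)]
    for s in sections:
        lines += ["", s.heading, "-" * len(s.heading), s.body]
    return "\n".join(lines)


if __name__ == "__main__":
    brochure = [Section("Hours", "Mon-Fri, 9 to 5"), Section("Tickets", "Adults $10 & kids free")]
    print(to_html("Lobby Brochure", brochure))
    print(to_text("Lobby Brochure", brochure))
```

Changing the opening hours in `brochure` updates every output at once, which is exactly what a separately maintained PDF cannot do.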

Because PDFs carry a high creation overhead, they are born outside the normal content flow of an organization. Even though executives and designers love them, they are the neglected stepchild of the communications department, which generally prefers web and email communications.

Personally, I see a PDF and I think: This is not the latest information, it’s not going to be accurate, and it’s not going to be detailed enough to provide me the information I want.

Rick_Cabral · January 3, 2008, 12:00am

I also find that organizations that emphasize PDFs as a communications medium are sensitive to the high cost of print design, and are attempting to squeeze more value out of a print project.

MiRAGe · January 3, 2008, 12:00am

Jeff, I don’t agree with your piece at all, but I have trouble explaining why…

First of all, PDF was never meant (nor is it commonly used) as a replacement for HTML. If you use it that way, I’d say you are using it for all the wrong reasons. Second, Adobe provides customers (or should I say consumers) with a lot more than just Acrobat to show their content on the web! (I like to point to Flex, since it is so closely related to PDF and to Adobe itself, and because it is a ‘new’ and hip technology.)

Also, like most of Adobe’s products, Acrobat is cross-platform (as far as it goes), which is a good thing, considering that HTML renders differently on every browser on every system on every machine in every country in every language at every resolution…

My point is: PDF for the web is extra. It was never meant for that. It’s a publishing thing. Try shipping an HTML page to your publisher!

rustyvz · January 3, 2008, 12:00am

Also, PDF gives content creators control over their content while still allowing it to be passed around.

Take this page, for instance. For someone else to read it, I have to either give them the URL, print it out, or save it as something like an MHT file.

But how do they KNOW that the content is the original? Could I have saved the file locally, edited it to change the point of view expressed, and then given the copy to them?
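A basic answer to “is this the original?” is a cryptographic digest: the author publishes the SHA-256 hash of the file, and the recipient recomputes it on the copy they received. The sketch below uses Python’s `hashlib`; it assumes the published digest reaches the reader through a channel the forwarder can’t alter. PDF digital signatures go further by signing such a digest with the author’s private key, which this sketch does not do.

```python
import hashlib
import hmac


def digest(data: bytes) -> str:
    """Return the SHA-256 digest of a document as lowercase hex."""
    return hashlib.sha256(data).hexdigest()


def is_unmodified(data: bytes, published_hex: str) -> bool:
    """Compare a received copy against the digest the author published."""
    # compare_digest avoids leaking timing information during the comparison
    return hmac.compare_digest(digest(data), published_hex.strip().lower())


if __name__ == "__main__":
    original = b"My original point of view."
    published = digest(original)
    print(is_unmodified(original, published))                      # True
    print(is_unmodified(b"My edited point of view.", published))   # False
```

Any single changed byte produces a completely different digest, so a locally edited copy fails the check.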

PDF allows restrictions that you cannot get with HTML or MHT. You can:

lock a file with a password
prevent content from being copied
prevent printing the document
… and so on …
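Under the hood, restrictions like “no printing” and “no copying” are stored as bits of a single integer, the /P entry of the file’s encryption dictionary (ISO 32000-1, standard security handler). The sketch below decodes such a value; the bit positions follow the spec’s user access permissions table, and the descriptive names are my own shorthand. Enforcement depends on the reader software honoring these flags.

```python
# 1-based bit positions of the user access permissions in a PDF /P value.
# Bits 9-12 apply to security handler revision 3 and later.
PERMISSION_BITS = {
    3: "print",
    4: "modify contents",
    5: "copy or extract text and graphics",
    6: "add or modify annotations, fill in forms",
    9: "fill in existing form fields",
    10: "extract text and graphics for accessibility",
    11: "assemble document (insert, rotate, delete pages)",
    12: "print at high quality",  # only meaningful when bit 3 is also set
}


def decode_permissions(p: int) -> dict[str, bool]:
    """Map a /P value (a signed 32-bit integer) to granted/denied flags."""
    return {name: bool(p & (1 << (bit - 1))) for bit, name in PERMISSION_BITS.items()}


if __name__ == "__main__":
    # -24: all bits set except bits 1-2 (always 0), 3 (print) and 5 (copy)
    flags = decode_permissions(-24)
    print([name for name, granted in flags.items() if not granted])
    # ['print', 'copy or extract text and graphics']
```

A value of -4 grants everything, while -3904 denies every permission listed above.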

Nasty eBook publishers know these tricks. So do ‘whitepaper’ authors, who lock their whitepapers against modification so their statements can’t be altered.

Oh, and MHT (built into IE and available as an add-on in Firefox) saves only the CURRENT page. A PDF is more like a Word document: it can have many pages, a table of contents, and a vast array of other features that don’t depend on the capabilities (or quirks) of browsers.

That being said, writing code to do automatic layout in PDF format, where positions can be expressed in hundredths of a mm, is a pain in the… you know…
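Part of that pain is that PDF’s default user space unit is the point (1/72 inch) and its origin is the bottom-left corner of the page, so layout code written in millimetres from the top has to convert and flip every coordinate. A minimal sketch, assuming the default unit and a fixed line spacing (leading); the helper names are hypothetical.

```python
MM_PER_INCH = 25.4
PT_PER_INCH = 72.0  # default PDF user space unit: 1/72 inch (one point)


def mm_to_pt(mm: float) -> float:
    """Convert millimetres to PDF points."""
    return mm * PT_PER_INCH / MM_PER_INCH


def line_baselines(page_height_mm: float, top_margin_mm: float,
                   leading_pt: float, n_lines: int) -> list[float]:
    """Return the y coordinate (in points) of each text line's baseline.

    PDF's origin is the bottom-left corner, so y decreases going down the page.
    Simplification: the first baseline sits exactly at the top margin,
    ignoring the font's ascent.
    """
    first = mm_to_pt(page_height_mm - top_margin_mm)
    return [first - i * leading_pt for i in range(n_lines)]


if __name__ == "__main__":
    print(round(mm_to_pt(210), 2), round(mm_to_pt(297), 2))  # A4 in points: 595.28 841.89
    print(line_baselines(297, 20, 14, 3))
```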

Chris_Chubb · January 3, 2008, 12:00am

I think the PDF format would be a lot more useful if Adobe stripped out the 99% of the engine you don’t need when loading a document, or at least made it load on demand. Waiting 15 seconds for the viewer to load every time just plain sucks.

And have you tried maintaining many PDF files when the same change has to go into all of them? If PDFs had an “iframe”- or “include”-type construct, you could make them much more useful.
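Lacking such a construct in the PDFs themselves, one workaround is to keep the source text in fragments and expand includes before generating each PDF, so a shared block is edited once. The `{{include:name}}` syntax and the `expand` helper below are hypothetical, chosen only for illustration; the sketch expands includes recursively and refuses include cycles.

```python
import re

# Hypothetical include directive: {{include:fragment-name}}
INCLUDE = re.compile(r"\{\{include:([\w-]+)\}\}")


def expand(name: str, fragments: dict[str, str], _stack: tuple[str, ...] = ()) -> str:
    """Return fragment `name` with every include directive replaced, recursively."""
    if name in _stack:
        raise ValueError("include cycle: " + " -> ".join(_stack + (name,)))
    text = fragments[name]  # raises KeyError if the fragment does not exist
    return INCLUDE.sub(lambda m: expand(m.group(1), fragments, _stack + (name,)), text)


if __name__ == "__main__":
    fragments = {
        "footer": "Acme Corp, 123 Main St.",
        "contract": "Terms and conditions...\n{{include:footer}}",
        "brochure": "Visit us!\n{{include:footer}}",
    }
    for doc in ("contract", "brochure"):
        print(expand(doc, fragments))
```

Changing `footer` once updates every document that includes it the next time the PDFs are generated.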

JeremyA · January 3, 2008, 12:00am

I tend to revile Adobe entirely because of Acrobat. It is entirely too damn slow for the web as it is today, and the promotional offers Adobe pummels us with and the clumsy, giant software updates irk me to no end. I only download PDFs and review them outside my browser; otherwise I can’t switch tabs in Firefox without waiting for one to load, by which time I have entirely lost my patience, and then who knows, I might start killing processes almost at random. I’m wise to it by now: it is always Acrobat lagging my browser, so I do my best to avoid PDFs in my personal surfing.

I see your point, and I’m sure Adobe wishes we would all use PDF instead of HTML. Still, I think fillable secure forms, white papers, and scanned documents are served well by the PDF format; we just need other software with which to view, interact with, and edit them. I know there are open source alternatives, but I have yet to try them. Interesting read, thanks all.

TheL · January 3, 2008, 12:00am

Chris Chubb: for fast loading times, use Foxit PDF Reader.

Dave · January 3, 2008, 12:00am

Having read through a pile of further comments, I’ll emphasize my previous comment even more: the majority of perceived PDF suckage is due to its awful implementation under Windows. If it were handled the way Apple does it (and again, I’m saying this as a Windows user and not an OS X fanboy), the perception of PDFs would be far less negative.

mjmcinto · January 3, 2008, 12:00am

I think you’re missing one key point about PDFs: they’re savable. If I create something, I can make it into a PDF and save it offline. I can easily see two cases where this would be better than an HTML version:

Sending it to other people who may not otherwise have access to it (say it is located on a company intranet, and while the person I’m sending it to has a business reason to see that information, they don’t have access to the intranet).
A document that is to be used as a reference. I may want to save it so I can use it when I’m not connected to the internet (while I know you can view HTML offline, I’m assuming an HTML version would be displayed on a website, so no connection would mean no ability to view it). It also means the user can keep a copy and not have to worry about losing access if the site removes it.

Just my $.02

CraigB · January 3, 2008, 12:00am

While I agree with Jeff about the inconvenience, and often overkill, of the PDF format, I also agree with Stephen Schmidt about PDF’s value for packaging. Just try saving almost any web page from the internet and sending it to someone via email (and I don’t mean a link to the content). PDF is very good for document management this way. However, I think Internet Explorer’s “Web Archive” format, as mentioned by Graham Stewart, is better for most content currently packaged as PDF. RFC 2557 (a.k.a. MIME HTML), stuck at “Proposed Standard” status since 1999, seems to be the missing functionality that should be standardized in all modern web browsers.

http://tools.ietf.org/html/rfc2557
http://en.wikipedia.org/wiki/MHTML
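An MHTML file is just a MIME multipart/related message: the HTML page is the first part, and each image or stylesheet is another part whose Content-Location header matches the URL the page refers to. A minimal sketch using Python’s standard `email` package; it assumes the caller already has the page and its resources in hand (no fetching), and whether a given browser opens the result depends on that browser’s MHTML support.

```python
from email import encoders, message_from_bytes
from email.mime.base import MIMEBase
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def build_mhtml(page_url: str, html: str,
                resources: dict[str, tuple[bytes, str, str]]) -> bytes:
    """Package an HTML page and its resources as one multipart/related file.

    resources maps each resource URL to (data, maintype, subtype).
    """
    root = MIMEMultipart("related", type="text/html")
    root["Subject"] = "Saved page"
    page = MIMEText(html, "html", "utf-8")
    page["Content-Location"] = page_url  # the first part is the root document
    root.attach(page)
    for url, (data, maintype, subtype) in resources.items():
        part = MIMEBase(maintype, subtype)
        part.set_payload(data)
        encoders.encode_base64(part)
        part["Content-Location"] = url  # lets references in the page resolve to this part
        root.attach(part)
    return root.as_bytes()


if __name__ == "__main__":
    raw = build_mhtml(
        "http://example.com/page.html",
        '<p>Hello</p><img src="http://example.com/logo.png">',
        {"http://example.com/logo.png": (b"\x89PNG\r\n", "image", "png")},
    )
    for part in message_from_bytes(raw).walk():
        print(part.get_content_type(), part["Content-Location"])
```

The whole page travels as one file, which is what makes it attachable to an email in the same way a PDF is.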

Andy_C · January 3, 2008, 12:00am

I’m with Jeff on this one. PDF has many virtues, but friendliness to the web is not one of them. Too damn slow, even with Foxit Reader.

Not_Quite · January 3, 2008, 12:00am

I think the biggest thing to remember here is that PDFs generally serve as a convenient medium for distribution through multiple channels. Sure, someone could send a link to a website, but in most cases it’s more effective to have an actual file, specifically in a business environment. I don’t think people intend to USE PDF as a replacement for HTML; it just comes across that way. Plus, it’s much easier to convert text from another “medium” and just throw it up on the web than to create an HTML document from it. Not to mention I can scan and print to PDF: it’s quick and dirty, but efficient.

Stephen · January 3, 2008, 12:00am

The old Acrobat from ~2000/2001 wasn’t bad, but Adobe reached a point where adding every possible feature was more important than being small, speedy, and not annoying. I don’t want to use PDF anymore, and I avoid PDF links like the plague.

Printing a web page from a browser is one of the worst experiences I’ve gone through as a web developer. The features that make browsers good at displaying any content anywhere work against the precision needed for printing forms.

Rob_Funk · January 3, 2008, 12:00am

First of all, I agree with what many people have said – use the right tool for the job, PDF is ideal for documents to be printed, PDF unifies a document in a way that HTML can’t, Windows PDF integration sucks compared to Mac and Linux (specifically KDE and probably Gnome), Adobe’s PDF reader is increasingly big and annoying, etc.

But many have mentioned “immutability” of PDF, and there I must partially disagree. While Adobe certainly promotes that idea, it’s not entirely true. A PDF can be loaded into a PDF editor, or converted into something that can be loaded into a vector graphics editor. A password-protected document can still be copied as a file, as well as screenshotted (is that a verb?). A PDF reader doesn’t have to obey a “don’t print this” directive. And as I recall it’s even become possible to create a PDF with arbitrary content that still passes an MD5 digital signature validation.

Certainly it’s hard to violate the protections (to varying degrees depending on the specific protection you’re depending on), but to my knowledge the only PDF protection that can’t be broken (yet) is the password protection to read it. Most of the protections people rely on in PDF won’t stop someone determined to get through them.

BrendanM · January 3, 2008, 12:00am

Page numbers. Try reading the XML specs when printed. Without hyperlinks, page numbers are essential.

This is easy to solve. Two ways:

force page-breaks; OR
hyperlink serialization: hyperlinks, when printed, are annotated with the page number of the target if it’s in the same document. We could add a switch to enable or disable this behaviour, so it applies to the table of contents but not to every term.
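The hyperlink-serialization idea can be sketched in two passes: first record which page each anchor lands on, then replace every in-document link with its text plus “(p. N)”, with a switch limiting annotation to chosen pages such as the contents page. This assumes pagination has already happened and uses simplified HTML-like markup matched with regular expressions rather than a real HTML parser; the function name and the “(p. N)” format are hypothetical.

```python
import re
from typing import Optional

LINK = re.compile(r'<a href="#([^"]+)">(.*?)</a>')  # in-document links only
ANCHOR = re.compile(r'\bid="([^"]+)"')


def serialize_links(pages: list[str],
                    annotate_pages: Optional[set[int]] = None) -> list[str]:
    """Flatten in-document links for print, optionally adding "(p. N)".

    pages: already-paginated HTML-like fragments, one per printed page.
    annotate_pages: 1-based pages whose links get page numbers (e.g. {1}
    for a contents page); None annotates links on every page.
    """
    # Pass 1: record the page on which each anchor first appears.
    anchor_page: dict[str, int] = {}
    for number, page in enumerate(pages, start=1):
        for anchor in ANCHOR.findall(page):
            anchor_page.setdefault(anchor, number)

    # Pass 2: replace each link with its text, plus the target page if wanted.
    result = []
    for number, page in enumerate(pages, start=1):
        annotate = annotate_pages is None or number in annotate_pages

        def replace(m):
            target, text = m.group(1), m.group(2)
            if annotate and target in anchor_page:
                return f"{text} (p. {anchor_page[target]})"
            return text

        result.append(LINK.sub(replace, page))
    return result


if __name__ == "__main__":
    pages = ['<h1>Contents</h1><a href="#intro">Intro</a> <a href="#terms">Terms</a>',
             '<h2 id="intro">Intro</h2> see <a href="#terms">terms</a>',
             '<h2 id="terms">Terms</h2>']
    for printed in serialize_links(pages, annotate_pages={1}):
        print(printed)
```

With `annotate_pages={1}` only the contents page gets page numbers, while ordinary term links in the body print as plain text.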