The Trouble with PDFs

I agree with the reply about making a statement:-

I want my document content to be visible to non-MS-Word users, even though I may have created the document using the software.

Also, I download and keep copies of PDF documents (think IBM ‘Redbooks’) for reference and think they are really good for this. Using HTML directories, or even .mht archives, is not so good when I want to look through the document.

On that thought, many of the open source docs directories are awful to read through in comparison to a well-indexed PDF document.

Also agree that PDF should not be used as a generic form of web content, only where it is applicable…

While PDF is overkill for web design and many presentation types, it is absolutely essential when the presentation must be exact.

My system produces PDF insurance documents which have been reviewed by the legal department. The reason for the legal department’s review is simple: these documents are legally binding and as such must be exact. I can’t risk errant interpretation of CSS creating “unexpected consequences”, so we create the PDF documents that provide exact replicas of the paper documents the clients (and their clients) expect and in fact require.

So while I agree that PDF is overkill for many purposes, it has significant purposes left. (Another is pixel-perfect typesetting for 3000 DPI imagesetters, a task I had with a prior client.)

What he calls idiosyncrasies are actually advantages. HTML is a markup language, not a layout language. It’s intentionally different from PDF.

I can reflow an HTML page to fit the size of my iPhone’s screen. I can’t do that with a PDF, at least not easily.

Well, I read from A to Z on this page.

No one has to open a PDF reader to view a PDF file that is linked from HTML. Use Opera and the PDF will open in the same window like any other link.

I use PDF every day ’cause I work in prepress. And, believe me, PDF rules when compared to .ps, .tiff, .jpg, etc. And with Acrobat Pro and some plug-ins I don’t need any other software to edit a PDF file, change colors, or make any similar interventions on the file.

In graphic design and prepress, PDF rules and is indeed the standard. As far as the web is concerned, I don’t know, and don’t want to know.

Keith and all others that don’t understand PDF,

PDF is for PRINT MEDIA.

HTML is for SCREEN MEDIA.

Trying to apply a format to a domain it was never intended for will never work 100%. This is why whining about reading PDFs on a computer screen because they don’t auto-flow like web pages misses the point entirely.

Do you understand yet?

I hate PDF. It always takes too long to load, and sometimes doesn’t even work.

Also, I find it funny how everyone is pro-HTML, but you’re not allowed to have HTML in your comment.

I like reading PDF documents. And the PDF format is very easy to understand, so we can make a simple PDF creator quickly.
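To give a feel for how small a "simple PDF creator" can be, here is a minimal sketch in Python. It is an illustration, not anyone's production code: the function name `build_minimal_pdf`, the US Letter page size, and the choice of Helvetica at 24 points are all assumptions made for the example. It writes a header, five objects, a cross-reference table, and a trailer, which is about the least a one-page text PDF needs.

```python
def build_minimal_pdf(text: str) -> bytes:
    """Return the bytes of a one-page PDF showing `text` (hypothetical helper)."""
    # Backslashes and parentheses are special inside a PDF literal string.
    escaped = text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
    # Content stream: select font F1 at 24 pt, move to (72, 720), draw the text.
    stream = f"BT /F1 24 Tf 72 720 Td ({escaped}) Tj ET".encode("latin-1")

    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        # 612 x 792 points is US Letter; an assumption for this sketch.
        b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] "
        b"/Contents 4 0 R /Resources << /Font << /F1 5 0 R >> >> >>",
        b"<< /Length %d >>\nstream\n" % len(stream) + stream + b"\nendstream",
        b"<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica >>",
    ]

    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for number, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset where this object starts
        out += b"%d 0 obj\n" % number + body + b"\nendobj\n"

    # Cross-reference table: one fixed-width 20-byte entry per object.
    xref_start = len(out)
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"
    for offset in offsets:
        out += b"%010d 00000 n \n" % offset
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % (len(objects) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_start
    return bytes(out)
```

Writing the result to disk with `open("hello.pdf", "wb").write(build_minimal_pdf("Hello"))` should give a file that ordinary viewers open. Anything beyond plain Latin-1 text (embedded fonts, images, compression) takes considerably more work than this.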

Did you ask Kevin Kelly why he used PDF?

Let’s compare apples to apples, or HTML/CSS duo with PDF. PDFs have a huge advantage when it comes to producing a document that will look the same on screen and on paper. You can use any software to create the text/layout and then simply “print” to the PDF driver and you are done.

With HTML/CSS, you have to descend into the snake pit to fit a round peg into a square hole. This is so much of a distraction for me that I refuse to do it. I’d rather focus on producing content than fight CSS. I am using two freely available PDF drivers (Primo PDF and PDF995), both of which work like a charm.

Printing and line breaks are another matter. Even this blog does not print well. Have you tried it? It is a mess. Somehow, ads and crapware on pages always print fully on the page and the content I really want gets cut off.

You can also take a look at the crappy new layout of MSDN Magazine (msdnmag.com). What a friggin disaster. When you click on “print”, your printout won’t include snippets of code because there is, apparently, a bug in the CSS stylesheet. They are working on it.

I completely agree with your assessment. PDFs suck royally.
My biggest complaint is that they are hidden time bombs. I uninstalled the reader because I accidentally clicked on a 10 MB PDF… after 10 minutes of loading (probably grossly exaggerated) I simply gave up and decided it wasn’t worth it.
It’s just like Flash in my opinion. If it’s in Flash, then it’s not worth the hassle to get. If I have to install a program to view it, forget it. HTML can do it all at no one’s expense.

I know this post is ancient, but I just stumbled across it, and there is one big advantage of PDF that nobody has really talked about.

Scalable images!

As an electrical engineer, I access technical information in the form of drawings every day. I’ve yet to see anything that works as well as PDF/Postscript for drawings.

Images in HTML pages are almost always bitmaps. They’re either small, fast, and totally useless for details, or huge, slow, and only barely adequate for detail. I know there are some scalable image formats out there, but PDF is still the easiest and most portable way to put vector images online.
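The "small and useless or huge and slow" trade-off comes down to arithmetic: a bitmap's raw size grows with the square of its resolution. A rough sketch in Python, where the 11x17 inch sheet, the 24-bit color depth, and the function name `raw_bitmap_bytes` are illustrative assumptions, and the figures are uncompressed sizes (real PNG or JPEG files will be smaller):

```python
def raw_bitmap_bytes(width_in: float, height_in: float, dpi: int,
                     bits_per_pixel: int) -> int:
    """Uncompressed size in bytes of a bitmap covering the given sheet."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px * bits_per_pixel // 8


# Doubling the resolution quadruples the pixel count, and so the raw size.
for dpi in (72, 150, 300, 600):
    size = raw_bitmap_bytes(11, 17, dpi, 24)
    print(f"{dpi:>4} dpi: {size / 1e6:8.1f} MB uncompressed")
```

At 300 dpi that sheet is 3300 x 5100 pixels, roughly 50 MB before compression. A vector drawing stores the lines themselves, so its size depends on how much is drawn rather than on how far you zoom in.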

An example from my own website: http://jmkasunich.dyndns.org/pics/spindle-NMTB-30.pdf is only 33Kbytes, but can be zoomed and examined in extreme detail without getting pixelated.

The drawing above is really very simple compared to most engineering drawings - I only used it because all the better examples I have at my fingertips are my employer’s proprietary information. Most of the drawings I create and use are printed at 11x17, and would be difficult to read even on paper at 8.5x11. On screen? Forget it. You can get an overview, or you can zoom in, but it is impossible to do both at the same time. Paper is so much better. Maybe my opinion will change when we have 20" diagonal monitors with 300 dpi resolution, but I doubt it. I can flip through a 10 page schematic far faster than the on-screen version.

The above example is a case where the only thing in the PDF is an image, and I’m sure there are scalable image formats that would work (but are probably not as widely supported as PDF). But in many cases, multiple scalable images are combined with text. Datasheets like http://www.analog.com/static/imported-files/data_sheets/AD7190.pdf are a good example. If Analog Devices started publishing that datasheet in HTML instead of PDF, I’d be pissed, and I wouldn’t be the only one.

As a person working A LOT with PDF, this really was a great discussion to read through… :)

For those of you who don’t know much about the roots of the PDF format and its connection to PostScript, I recommend reading through the PDF/PostScript resources below:
http://www.prepressure.com/postscript/basics/history/4
http://en.wikipedia.org/wiki/Portable_Document_Format#PostScript
http://www.inkguides.com/history-of-postscript.asp

I think PDF files are a standard that is readable regardless of the system or the source format of the text. Just put anything into a PDF, like a Word document, a web document, or anything else, and it can be viewed like a good old paper document. But the same content can be generated as an HTML web page too. The web page is part of the web site though, not a distinct document. I think not everything needs a distinct document, because lots of content can be read directly from a web page.

PDFs are invaluable for material where well defined page numbers are necessary for reference, citation, or discussion purposes. Academic and legal documents are two examples of where this is necessary. The ‘how to read’ document you linked above is most probably a case of this. HTML+CSS can’t yet produce documents with page numbers that remain the same regardless of how a document is viewed, be it on screen or on paper. Until HTML+CSS can do this, PDF has a valid role even for documents where all other formatting is within the realm of standard HTML+CSS.

We entertained the notion – very briefly – of using HTML as the layout mechanism for our financial reports. Unfortunately reliable page breaking, column alignment over spanning pages, footnotes, and simply not being able to use 100% of the paper consistently and reliably trashed that notion.

And since they’re financial reports, you really can’t fuck around with “implementation defined” rendering of layout. No sir. There can’t be alternate versions floating around because someone used Opera and columns 11-15 didn’t print, or Mozilla 1.5 and the minus signs are hidden behind a table edge, but IE’s rendering looked fine.

These reports are intended to be printed, but also optionally viewed online. PDF viewers are universally crap, and Adobe’s more so than anyone else’s except maybe Ghostview. But PDF is the right tool for some jobs.

Foxit is fast, but it is terrible at rendering text (I’m looking at a 21" Trinitron). Apart from the loading times I was quite fond of Adobe Readers 7 and 8. They kept changing the user interface for no reason though.

Preview.app is where it’s at for PDFs. Although I haven’t quite got used to the Leopard version yet compared to the Tiger one, where I actually quite liked the fact the drawer hung off the side of the window.

I think the problem is that most people just use PDFs because they can, not for good technical reasons. In fact my big gripe is that people often use more than one column of text to a page, which is very annoying to read on a computer screen because you scroll down to read the first column, scroll back to the top of the page, then scroll back down again to read the second column. If PDF readers could reflow text it would be fantastic.

GOOD PDF USES…

When one needs only to convey textual information, then yes, HTML should probably be used. But there are many things that PDF can do that HTML simply cannot.

Mathematics and symbol-heavy information. Sure, MathML may one day help solve this, but it’s not practical to use yet. Example paper by Einstein:
http://www.fourmilab.ch/etexts/einstein/specrel/specrel.pdf

Display of graphically-rich information. PDF is not just for controlling layout and fonts, it is a full graphics engine ideal for those needing to convey very dense visual information where not everything should be in strict lines and columns of text.
Example for airport meteorological forecasting:
http://reportlab.org/docs/provencio.pdf

Accurate reproduction of graphical information. This may also include things that need exact reproduction, color calibration, etc. so as not to distort information. Example from Edward Tufte:
http://www.itee.adfa.edu.au/coursework/ZITE8140/tufte.pdf

Huge document sizes. Unlike HTML and other markup languages, PDF is not a serialized format. It is random-access. Thus it is ideal for handling very large documents of many thousands of pages in size. Properly written PDF software does not need to hold much of a PDF file in memory at any one time, nor does it need to read from the first byte and work toward the end. Example is the 9-11 commission report ~600 pages long (with proper browser integration you should be able to jump to and view any page before the whole document has even finished downloading):
http://www.9-11commission.gov/report/911Report.pdf
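The random-access point can be made concrete. A reader finds the `startxref` pointer in the last few bytes of the file, jumps to the cross-reference table, and from there knows the byte offset of every object without scanning the body. The Python sketch below shows that lookup under simplifying assumptions: a classic cross-reference table with a single section, no incremental updates, and no cross-reference streams. The function name `read_xref_offsets` is made up for the example.

```python
import io
import re


def read_xref_offsets(f) -> dict[int, int]:
    """Map object number -> byte offset, reading only the tail and the table.

    `f` is any seekable binary file object. Real-world files can be messier
    than the simple layout assumed here.
    """
    # 1. The pointer to the xref table lives in the last bytes of the file.
    f.seek(0, io.SEEK_END)
    size = f.tell()
    f.seek(max(0, size - 1024))
    tail = f.read()
    match = re.search(rb"startxref\s+(\d+)\s+%%EOF", tail)
    if match is None:
        raise ValueError("no startxref marker found near end of file")

    # 2. Jump straight to the table; nothing in between is read.
    f.seek(int(match.group(1)))
    if f.readline().strip() != b"xref":
        raise ValueError("expected a classic xref table")
    first, count = (int(n) for n in f.readline().split())

    # 3. Each entry is "offset generation n|f"; keep the in-use ("n") ones.
    offsets = {}
    for number in range(first, first + count):
        offset, _generation, kind = f.readline().split()
        if kind == b"n":
            offsets[number] = int(offset)
    return offsets
```

With the offsets in hand, a viewer can `seek` directly to whichever object it needs. Over a network the same two jumps could in principle be made with HTTP range requests; viewing a page before the download finishes also depends on the file being linearized, which this sketch does not cover.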

Legal publications. Anything that needs to be heavily cross-referenced, including by page numbers, or must be preserved in its pristine “published” format for legal or long-term archival reasons is ideal in PDF. That most of the US Government publishes its official legal material in PDF is a good thing. An excerpt from the Federal Register:
http://edocket.access.gpo.gov/2008/pdf/07-6280.pdf

Vector artwork, especially for maps and cartography where precision is important. SVG can almost compete here, but it’s still not as ubiquitous as PDF. Example, maps of the US congressional districts for the Palm Beach Florida area:
http://nationalatlas.gov/printable/images/pdf/congdist/FL22_110.pdf

And perhaps many other uses uniquely suited to PDF.

That being said, in general if it can be done with HTML, then it probably should be. PDF itself isn’t bad. But common uses of it can be, and in particular, certain software implementing PDF (ahem, Acrobat) can also be bad. But don’t confuse that with PDF in general.

Is it just me or do you also think Adobe interfaces generally suck?

Commenting on a year-old post is almost always sure to yield no reply-gratification, but I’d like to relay an experience…

Here in Denmark, the government has laid out some goals for how the lesser governmental bodies might leverage technology, and most have failed miserably…

Case in point is my complaint to the municipality a few years ago, that the current email handling was hopelessly outdated, and that the insistence on publishing some vital documents in the Word (-something) format was in direct contradiction of the goals.

Two weeks later, an email arrives in my inbox, signed by a mayor’s aide, with no text but an attachment in an old Word format that basically states that they try their best. Said Word file had no fewer than 4 macros that OpenOffice complained about (and poorly written ones at that).

I cannot accept that public officials send word processor documents instead of presentation documents, if they even find an attachment necessary. But to ship a Word document that relies on poorly written macros, in response to a complaint about not complying with open standards, just blows my mind.

Clearly, for any change to be made, we have to acknowledge that the people working for the powers that be are, in all likelihood, just ordinary people who have no clue whatsoever with regard to anything computer-based…