The Trouble with PDFs

C’mon Jeff, PDFs are OK. Give up on Windows Vista’s crap PDF support and move on to OS X; PDFs here are “just another file”. :)

A PDF is better when you want to make sure that the other party receives the exact file. I haven’t had issues with PDFs for ages, but… I run OS X.

OK, this was just a “troll” ;)

HTML printing s*** big time. Happy New Year!

I’m working on a web application and I’m using PDF to print some dynamically generated information. Using PDF gave me some very quick wins during development, but now I’m feeling the pain of trying to get more out of the PDF document and I’m wishing I had stuck to HTML.

Having said that, the end user is more comfortable saving a PDF document on their PC than an HTML file, and on this basis alone, I use PDF.

I was far from being blown away by the structure of the PDF document or by working with it programmatically. This comes down to the tools that are available and the fact that it is yet another technology to get my brain around.

I’ve never liked PDF, mainly because Adobe are so greedy with it. For the longest time you couldn’t make PDFs or edit them for free. The Acrobat Reader tool has always annoyed me: it’s slow, the interface takes a bit of getting used to, and it lacks all kinds of things you normally get in text viewers/editors, like a caret or the ability to view the document in a form other than pages… like continuous text.

Personally, if it’s going on the web, I am more likely to generate a PDF with anything awkward in it (maths symbols, funny diagrams, etc.), then screenshot it, cut it out, save it as a .png/.jpg and include that in an HTML document… I can appreciate the layout arguments if you intend the document for high-quality print… but that’s not what the internet is for in general; it’s for providing information in easy-to-handle formats. What I like is a nice webpage that fills my screen on its own and doesn’t waste my space with excess toolbars and by rendering everything on top of rendered paper…

When you view a PDF on screen, you’re using a computer to view a document that was really designed for viewing on paper.

When you try and print an HTML document, you’re using paper to view a document designed for viewing by computer. I have NEVER seen a non-trivial HTML document that’s designed to print well.

Unsurprisingly, computers do a better job of emulating paper than vice-versa…

Oh, and the worst thing about PDFs? Acrobat Reader…

And the best? Foxit!

PDFs have a lot of advantages. The most common is production. Say what you will about it, but when I create a Word file for others to read, it’s sent as either a PDF or an RTF. I could create the HTML equivalent, but why go through the hassle of copy-pasting, recreating stylesheets, converting embedded images to links, and on and on?

You’re right that PDFs aren’t meant to replace websites. And they shouldn’t. But they excel at being archives for static information. One thing I wish the Wayback Machine and Internet Archive did was archive websites as PDFs. Usually you’ll get the text, but the images and stylesheets are most likely busted.

PDFs and HTML serve different purposes. IANAL, but legally a PDF will stand up in court as a valid document, a web page that anyone can change in a moment will not.

I do agree that Adobe Acrobat is junk. But I found better alternatives in Foxit and SumatraPDF.

I find the concept of page layout to be wasted if you are viewing it on a monitor, which is 99% of the time for me.

On a monitor, a PDF is completely suboptimal because you are effectively viewing a zoomed image rather than selectable wrapped text with screen-optimized fonts. You either find yourself looking at a page half-panned off the screen or “greek text” or both.

Even when printing, the PDF is often useless (no matter how carefully laid out a Foolscap page is, it will not print on A4 with satisfactory results).*

Sure, there are legitimate uses for PDF where it is useful, but web pages are not one of them. Although, I admit this is a bit like complaining that a skateboard is not a good cheesegrater: the real problem is that PDF is commonly misused in completely inappropriate contexts.

* In Australia, Foolscap/Letter are rarely used and A4 is considered standard. However, most PDFs encountered tend to be on US websites.

Much of the impetus to use PDFs comes from the idea that I, as content creator, should have total control over how you, a mere reader, see my screen/page. We see the same idea in the use of CSS and other fancy tricks in HTML pages.

This is old-style thinking, which makes plenty of sense in the world of print publishing. But it totally ignores those readers

  • using a different screen resolution than the creator
  • with vision problems
  • with text-based displays
  • etc., etc.

Content SHOULD be able to adapt itself to different environments, and that’s what basic HTML does so well. The fact that browser output to plain old paper was and is so lousy is a problem of the browsers, not HTML.

PDFs don’t reflow text.
Usually I can’t comfortably read them on an iPhone.
I can’t read them in Opera Mini at all.
Sometimes, even on a 19" monitor, I have to zoom in so far to get readable text that I need to scroll horizontally.
With HTML+CSS I can break a stiff layout to suit my needs. I don’t need your (the designer’s) layout.

Really? Then why is this “packaged” as a PDF?

http://www.si.umich.edu/~pne/PDF/howtoread.pdf

How does this “packaging” help me, the reader?

  1. Embedded font
  2. Whole document including images is one file. 1 document = 1 file. A “document” that is a directory of html file(s) and images feels unwieldy.
  3. The text in the images is anti-aliased according to my preferences. If the diagrams in this document were inline html images, I would be at the mercy of whatever AA settings the author used when creating the image.
  4. Zooming retains layout.
  5. Annotation/comments (again all within the one file), side-by-side viewing, quick rotation of the page, other viewing-based features…

And these are just the benefits for the simple document you linked.

I work on software that can (among other printing targets) directly generate PDF documents. And I gotta say: I hate PDFs.

The key advantage PDF has over HTML is that it retains the ‘look and feel’ of a printed document. But at the same time, PDF files cannot be consistently printed accurately, because the PDF specification itself does not allow the document author to control how the PDF is printed, only what it looks like when viewed on-screen.

In short, the ONLY thing that PDF documents are good for is book-emulation, which is better served by just using HTML.

I think the main problem with HTML/CSS is the lack of a proper archive/book format. How do you supply your manual/specs to customers who may want to read it offline?
An entire tree of files isn’t very neat, not to mention the fact that IE will pop up scary-looking security warnings about local files.
Microsoft provides “Web Archive” (*.mht) format to represent a page as a single file, but this is still only half a solution and is not supported in all browsers.

As for HTML not printing well:

CSS2 does actually offer some support for printed (aka “paged”) media:
@page, page-break-before, page-break-after, orphans, widows, etc.
See http://www.w3.org/TR/REC-CSS2/page.html

Unfortunately, CSS2 only became a W3C Recommendation in 1998. Browser suppliers have only had ten years, so support isn’t very good yet. Perhaps by 2018 we’ll be able to use it.

For now, it would be good if web developers would at least provide a media="print" stylesheet that makes some effort to turn off menus, adverts and other elements that are pointless on the printed page.

And no, providing a link to a “Print Friendly” version is NOT a better solution.
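As a rough illustration of the kind of media="print" stylesheet suggested above, here is a minimal sketch built from the CSS2 paged-media features already mentioned (@page, page-break-after, orphans, widows). The selectors #nav, .ads and .sidebar are hypothetical placeholders for whatever elements hold a site’s menus and adverts, and browser support for the paged-media properties varies.

```css
/* Print-only rules; on-screen rendering is unaffected.
   #nav, .ads and .sidebar are hypothetical selectors: substitute whatever
   elements hold your menus, adverts and other screen-only furniture. */
@media print {
  #nav, .ads, .sidebar { display: none; }

  /* Plain black-on-white text in a print-friendly font */
  body { color: #000; background: #fff; font: 11pt/1.4 Georgia, serif; }

  /* Try to keep headings with the text that follows them */
  h1, h2, h3 { page-break-after: avoid; }

  /* Avoid stranding single lines at the top or bottom of a page */
  p { orphans: 3; widows: 3; }

  /* Show link targets, since nobody can click on paper */
  a[href]:after { content: " (" attr(href) ")"; }
}

/* Page box margins, using the CSS2 @page rule */
@page { margin: 2cm; }
```

The same rules can instead live in a separate file (say, a hypothetical print.css) pulled in with <link rel="stylesheet" media="print" href="print.css">, which is exactly the media="print" route described; either way the reader just hits Print, with no separate “Print Friendly” page.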

Yeah, this is pretty much the very old “page designers” vs. “web designers” argument. The web is the web, a book is a book. Just because we use metaphors to describe things in computers doesn’t mean that our applications would actually be better if they looked and worked exactly like the metaphor.

I find web pages much, much easier to use than PDFs, because web pages are the web. I find PDFs handy to…print. Because they’re print media, really.

You can put a cat in the oven, but that don’t make it a biscuit!

-Max

…so HTML and CSS are so well supported in Outlook?? … CRY

People tend to copy-paste documents into Outlook, because everybody is using Outlook, right? I hope people will stop pirating Office so we can get some real open standards. Is everybody using Office only because millions of people use it without paying for it?

PDF is great, but I never understood the need for a PDF reader integrated into the browser. PDF files are, and should be, handled as documents: something you download to read or print.

HTML is a markup language handled by web browsers. HTML is good for simple documents and simple applications (talking layout + UI). HTML is great, and with scripts and stylesheets it can do wonders. But since it is so badly handled, I think it is a good idea to keep its usage simple.

For me, PDFs are great for things which are really, actually going to be printed on paper, especially in standard formats like booklets, etc. But that’s about all they’re good for. PDF is also a good format to release something which was already a book, I’d say, as the thought will already have gone into the design and layout work.

I dread web content being delivered in PDF format, especially as (when I’m not on the Mac, which has great PDF support built in) it normally means using some form of terrible creepware/bloatware/crashware.

And yes, I’m looking sternly in Adobe’s direction here, because their bloody PDF reader keeps on trying to download products I don’t want whenever it updates itself in a vain attempt to stop itself from crashing my browser every five minutes, or clinging on to its process for dear life such that I normally have to terminate it from the Task Manager at the end of the day if I’ve so much as glanced at a PDF document on the web in the morning…

“A strange, otherworldly out-of-browser experience.”

Yes, a curious domain where your work looks exactly how you want it to look, and not as rendered by this version of that browser with those settings and those plugins and cookie options and whatever else. Who wants to go there?

I hate scrolling top-down through all the pages of a PDF; it totally takes away the feeling of reading a document. CHMs do a better job than PDFs.

I hate PDF format files. PDF is totally anti-web. And I avoid sites offering information in this format.

I hate PDFs for reports. Some departments in my corp use PDF for showing reports so that their charts will look nice, but this only leads to trouble when you need to get the data back out: because the formatting isn’t easy to convert into anything else, there’s lots of entering data by hand.

Please write articles that don’t suck.