PDFs: We need to talk.
How to spot and counter the most common excuses for publishing PDFs on the web.
But first, a little history…
PDF gained popularity during the digital printing boom of the late 1990s and early 2000s. It made it easy to export and compress a high-quality QuarkXPress or InDesign file into a print-on-demand-ready file that avoided the need for traditional offset printing.
This boom coincided with the dotcom bubble. Y2K bugs, overvalued internet-based companies, and file-trading websites are the first things most people recall from this era. But another less well-known pursuit was to have its own undesirable and arguably longer-lasting impact.
As companies and public sector organisations accepted that the web not only wasn’t a fad, but could make or save them serious money, they decided it was a good idea to publish every bit of content they had. And fast.
How did they achieve this? In many cases, by rounding up all digital copies of their booklets, factsheets, reports, and brochures, then getting their webmaster (remember them?) to publish them to their website.
If the website’s users were really lucky, these publications were ‘optimised’ for the web. That is, the blank pages left in booklets by print pagination were removed, multi-panel brochures were split and arranged into a logical sequence of pages, and images were compressed to reduce file sizes.
“We’re online. Job done!”
Except, to use an annoyingly popular saying of the time, NOT!!!
Problems with publishing PDFs on the web
Accessibility
Many people with disabilities need assistive technologies to access information. For example, many blind or vision-impaired people use braille to read printed text.
With content accessed on a computer, blind and vision-impaired users often need electronic assistive technologies. The most famous of these is the screen reader, which converts on-screen text into speech or refreshable braille.
Screen readers work best with semantically structured content:
“Semantic HTML is the use of HTML markup to reinforce the semantics, or meaning, of the information in webpages and web applications rather than merely to define its presentation or look. Semantic HTML is processed by traditional web browsers as well as by many other user agents.” — Wikipedia
PDFs? Well, you can and should make them more accessible. But it takes a lot more work to structure a PDF. First, you need to structure the source content, such as the InDesign or Word document; then you need to tag the exported PDF using Adobe Acrobat Professional.
Without proper tagging, a screen reader will at best read a PDF as one unstructured stream of text. At worst, it will say something like ‘alert: empty document’. But even thoroughly structured and tagged PDFs are not fully supported by some mobile screen readers (Adobe is reportedly working to address this).
All those publications published at the turn of the century? Through a lack of awareness, skills, time, money, or technology, almost none were created accessibly. Sadly, not much has changed since.
In order to meet their accessibility obligations, many organisations now create copies of their PDFs as structured Word documents. This helps the many screen reader users who prefer that format, but for many organisations the extra effort is still not feasible.
Mobile user experience
Just because PDFs are designed on screen doesn’t mean they are good for reading on screen.
According to Statista, in 2018, 52 per cent of all website traffic worldwide was generated through mobile phones. In Africa and Asia, mobile’s share of traffic is more than 60 per cent. Organisations are slowly accepting this reality and redesigning their websites to be mobile responsive. This is great, except… for all of their PDF content.
PDFs cannot be made mobile responsive. Asking your already impatient users — especially those with only 3G coverage and a small data plan — to download an 8MB, two-column PDF that requires them to constantly pinch, pull, scroll up, and scroll down just to read the one chunk of content they actually need that’s buried on page 164, is like telling them to suck eggs.
For example, compare the HTML and PDF versions of the UK Autumn Budget 2017 when viewed on mobile. In the following image, the left panel is an iPhone 8 screenshot of a section of content in responsive HTML. The right panel shows the PDF version of the same content. To make it readable, I had to pinch-zoom the PDF until the text was the same size as in the HTML version. Doing so hid at least a third of the content on the screen. To read each sentence, I needed to zoom in and out continually.
Most users won’t bother, and they’ll probably exit. Do you want to risk losing them?
Content design
Content design is producing content in the format that best meets a data-validated user need. That format could be anything from simple text, to video, audio, a diagram, a calculator, or progressive disclosure — also known as a content wizard, decision tree, or smart answers.
It is possible to do most of this in PDF but, as with tagging, few people know how or bother to do it. Most PDFs end up containing static text and images.
Open data and APIs
Making PDF content chunkable and reusable by other applications via an API is possible but, as with content design, most digital teams won’t bother when it is so much quicker and easier to do with HTML.
Search engine optimisation
Accessibility expert Karl Groves points out that web accessibility doesn’t lead to better search engine optimisation (SEO). He’s right. There is very little direct impact. But what are the impacts of publishing content in PDF — accessible or otherwise?
If we look at Moz’s 2018 SEO checklist, we can identify several issues for content published in PDF:
- Have the most authoritative person create content that will serve the searcher’s goal. Remember all those mobile users? Ask yourself how a PDF “serves the searcher’s goal and solves their task” and does it “better than anyone else does it on page one”.
- Use rich snippets and schema markup to enhance content. This is HTML they’re talking about. Have fun trying this in a PDF. Note: Google Developers has a whole section on how you can enable your content to display more richly in search results.
- Optimise for page speed. Let’s revisit the UK Autumn Budget 2017. I ran a Pingdom full-page speed test from Melbourne, Australia, on the HTML and PDF versions. The HTML page loaded in 0.69 seconds. The PDF document loaded in 3.77 seconds. That’s more than five times slower. Google’s research showed that more than half of mobile site visits are abandoned if pages take longer than three seconds to load.
- Who will help amplify this and why? If we accept the previous three points, how is a PDF going to earn you the links and shares that benefit your search rankings? And since very few people know how, or bother, to add hyperlinks to their PDFs, they don’t offer much reciprocal traffic to potential inbound linkers.
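To make the rich snippets point concrete: schema markup is commonly embedded in an HTML page as a JSON-LD `<script>` element using the schema.org vocabulary, a hook that, as far as I know, search engines don’t read from PDFs. The Python sketch below builds one for a generic `Article`. The function name and the headline, date, and author values are hypothetical placeholders; which properties Google actually uses for rich results is set out in its own structured data documentation.

```python
import json


def article_jsonld(headline: str, date_published: str, author_name: str) -> str:
    """Return a schema.org Article as a JSON-LD <script> element for a page's HTML.

    All argument values are hypothetical placeholders supplied by the caller.
    """
    data = {
        "@context": "https://schema.org",
        "@type": "Article",
        "headline": headline,
        "datePublished": date_published,  # ISO 8601 date string
        "author": {"@type": "Person", "name": author_name},
    }
    # JSON-LD is plain JSON wrapped in a script tag with this MIME type.
    return (
        '<script type="application/ld+json">\n'
        + json.dumps(data, indent=2)
        + "\n</script>"
    )


if __name__ == "__main__":
    print(article_jsonld("Example report", "2018-01-01", "Jane Example"))
```

The printed element would sit in the page’s HTML; Google’s Rich Results Test can then check whether it qualifies for enhanced display.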
Performance analysis, analytics, and insights
It’s not possible to place tracking code for popular analytics tools, such as Google Analytics and Hotjar, within PDFs.
Sure, with the help of a tag manager or custom code, Google Analytics can tell you how many times a PDF was downloaded from your website. After that, you’re on your own. Unlike HTML pages, in a PDF you can’t track:
- time on page
- exit rate
- links clicked
- scroll depth
- next page path
- heat maps
- in-page feedback.
Good luck working out if anyone actually read your PDF or found it useful.
Printing
Because PDFs usually contain more visual branding, such as headers and cover pages, printing them uses a lot more of your users’ hideously expensive ink or toner.
Conversely, modern website print style sheets output almost nothing but content. This makes web pages much more cost-effective for people who need to print your information, who are often the people least able to afford the extra ink.
Spotting and countering excuses for PDFs
Once made aware of the problems with publishing PDFs to the web, most reasonable colleagues, stakeholders, and clients are willing to create HTML content before or instead of PDFs. But there will always be a few who are harder to convince.
Here is a countdown of the five most common excuses I’ve encountered for publishing PDFs to the web. Most are simply uninformed assumptions, but it can still be tough to change people’s minds. The following points should help you turn them around.
Excuse 5: “We need to upload it quickly”
The assumption here is that creating web pages takes longer than uploading a PDF (or Word) document. This is true, but the extra time is only needed if the content was created without collaboration with, or the prior knowledge of, the web/digital/central content team. In that case, web users’ needs probably weren’t discussed. Instead, someone just PDF’d a Word doc.
In this scenario, you might just have to publish the document to meet the stakeholder’s deadline, then start creating an HTML version to replace (or at least supplement) it. But don’t lose this opportunity to push for a more collaborative approach to content creation.
Your website governance and workflows should require that the web/digital/central content team is consulted early in the content creation process. Even better, use content workshops and pair writing to build a shared understanding of user and business needs, and ultimately speed up the process. I’ve seen this work well. Over time, most stakeholders get used to the process and appreciate the benefits.
Excuse 4: “We need to maintain our brand”
This one stems from the assumption that only a publication can truly encapsulate a brand through content. “If we remove our PDFs from our website, we’ll be left with nothing but ‘boring’ web pages.”
I counter this by pointing out that a brand is the sum of users’ and customers’ experiences with the organisation across all channels, not logos and decorative artefacts. A brand is best enhanced by providing content that allows users to complete a task quickly and easily. Rarely is this achieved by making them download a PDF.
And today, designers’ creativity is rarely constrained by web browsers: web developer Diana Smith recently created ‘paintings’ entirely from CSS and HTML!
Excuse 3: “But it’s a long document”
This assumes that we need to provide PDFs to download and print because users don’t scroll down web pages or read long-form content on-screen. Neither assumption is true.
There is plenty of evidence that people are happy to scroll, and touchscreens (yep, mobile) naturally enable easy scrolling.
Pew Research Center found that long-form articles get about twice the engaged time of short-form articles, and about the same number of visitors.
As for ‘drier’ content, such as reports, there is now a wealth of great examples of long reports created in HTML. The UK Government managed to turn the entire 86 pages of the aforementioned UK Autumn Budget 2017 into a single HTML page.
An even better example is the UK Driver and Vehicle Standards Agency (DVSA)’s 2017–22 strategy. This brilliant piece of content design incorporates video, data that you can toggle between chart and table formats, and key stats highlighted attractively throughout. All this in an accessible, mobile-responsive format that’s far more engaging than any PDF could ever be.
Oh, and anyone who does want to print the page can do so with confidence. Its host site, GOV.UK, has an excellent print style sheet.
Excuse 2: “People prefer PDFs”
“People prefer PDFs” is often code for “I prefer PDFs”. At this point, I could serve a “you are not the user” with a side dish of eye roll, but instead I dish up some data. Data doesn’t always trump opinion, but it can help.
In 2016–17, users of my employer’s website viewed pages in the renting section more than 1.7 million times. We’re required to provide much of this content in accessible PDF and Word documents, as well. These were downloaded 9,740 times. That’s 178 web page views for every document download. For our retirement villages section, which has a mostly older audience that is presumed to prefer PDFs, the ratio is a still salient 112 page views for every download.
So, take a look at your analytics and see just how popular or unpopular your PDFs really are. If you only have certain content available in PDF, create an HTML version and track which one gets more traffic.
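If your analytics tool can export page views and download counts by section, a few lines of Python will give you the same kind of ratio as the figures above. This is a minimal sketch that assumes a hypothetical CSV export with `section`, `type` (`page` or `download`) and `count` columns; the function name is illustrative and the sample numbers are invented, not real data.

```python
import csv
import io
from collections import defaultdict


def ratios_by_section(rows):
    """Return {section: page views per document download}.

    `rows` is an iterable of dicts with the hypothetical keys
    'section', 'type' ('page' or 'download') and 'count'.
    Sections with no recorded downloads are skipped to avoid dividing by zero.
    """
    views = defaultdict(int)
    downloads = defaultdict(int)
    for row in rows:
        count = int(row["count"])
        if row["type"] == "page":
            views[row["section"]] += count
        elif row["type"] == "download":
            downloads[row["section"]] += count
    return {
        section: views[section] / downloads[section]
        for section in views
        if downloads[section] > 0
    }


# Invented sample data; substitute your own analytics export.
SAMPLE = """section,type,count
renting,page,5000
renting,download,40
villages,page,3000
villages,download,30
"""

if __name__ == "__main__":
    rows = csv.DictReader(io.StringIO(SAMPLE))
    for section, ratio in sorted(ratios_by_section(rows).items()):
        # Prints "renting: 125 ..." then "villages: 100 ..."
        print(f"{section}: {ratio:.0f} page views per download")
```

Summing by section, rather than comparing individual pages with individual files, keeps the comparison fair when one PDF duplicates content spread across several web pages.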
Excuse 1: “We need to make it secure”
Stakeholders, having laboured over their content for potentially months or years, understandably place a high value on its ownership and integrity. PDF-ing the content is seen as a way of protecting it against misuse.
Also, most people who don’t know how to edit a document once it’s been converted to PDF assume no-one else can either. The same goes for unlocking password-protected documents.
In this case, show them the results of a Google search for ‘edit a PDF’ or ‘unlock a PDF’. It’ll contain scores of websites with both paid and unpaid methods of achieving these aims.
The best way to maintain the content’s integrity is to publish it in HTML. This can’t stop anyone from making a copy and modifying it for dubious reasons, but the website will remain the source of truth to which any suspicious content can be compared.
Please do remind your content owners of their accessibility obligations. No format can be protected from misappropriation without also being made inaccessible to assistive technology. There is no point in doing all the hard work to tag a PDF correctly only to inadvertently block screen reader users. They’ll be the only ones you do keep out.
Conclusion
Publishing PDFs on the web provides little benefit to your organisation or its users. They’re not mobile responsive, and they don’t lend themselves easily to content design, APIs, search engine optimisation, analytics, or cheap printing.
Most importantly, making them accessible takes more effort than most organisations can feasibly commit.
Unless you have a legal requirement or a user need — validated by user research — for publishing them to your website, PDFs are best left to the print world.