Fundamentals of PDF Optimisation for Search

The words PDF and optimisation rarely appear side by side in a webmaster’s news feed, which more often than not is churning out articles on other search engine optimisation practices such as link building and social media engagement.

However, the truth is that PDF optimisation should not be overlooked as a factor in SEO. A lot of webmasters do overlook it, so take advantage of what your competitors aren’t doing by optimising all of your website’s portable documents. Even if you don’t have any yet, this overview of PDF optimisation will guide you through how to create documents that both your users and the search engines will consider valuable.

First Things First

Adobe explains a PDF as being the global standard for trusted electronic documents and forms. Many businesses use PDFs between employees, clients and customers for a variety of reasons. They may include sales sheets, technical briefs, methodologies or manuals but there is really no limitation to what information can be displayed in a PDF.

Obviously, some of these documents are sensitive and should remain confidential to the business. However, where it is appropriate to share information with others, it should be put online and optimised to rank in the SERPs.

When asked which non-HTML files it indexes most often, the search engine states:

“PDF formatted files are the most popular after HTML files. PostScript and Microsoft Word files are also fairly common.”

Sacrifice Images Over Text

Making your PDF look all nice and pretty is great for people reading the document, but is it that exciting in Google’s eyes? Many people make the mistake of creating and exporting their PDFs from a graphics program and wonder why they can’t see them ranking where they want them to. It’s critical to create and keep your PDFs strictly text-based, so that search engines can establish what the content is about.

“Exporting from a text-based program should allow you to highlight the text from a reader”

Google has the ability to convert PDF files to HTML, and the “View as HTML” link that appears next to search results allows searchers to view the document in an overly simplistic, dull and limited manner. However, as of October last year, Google rolled out the “Quick View” link, which is based on the same technology available in Google Docs and Gmail and displays the PDF with formatting intact. Note: This does NOT mean Google can now read and crawl image-based PDFs; it can just display them.

“Google’s Quick View functionality”

Optimising PDF Properties

Properties to a PDF are what meta tags are to a web page. Populating fields in the properties menu option will provide search engines with a much better understanding of what the document is about. The Title property is a priority to complete, as this will be displayed as the title of your search result. Failure to do so will have Google looking at the content of your document and automatically generating a title you may be less than impressed with. Not only do these look a little off, but chances are searchers won’t be drawn to click on them.

Although you don’t have 100% control over meta descriptions, Acrobat features a handy little tool called TouchUp that allows you to re-arrange the ‘reading order’ of your document’s pages. This tells a search engine which ‘important’ page to pull text from first when populating the meta description, as opposed to the very first thing it reads, like an index or contents page.

For this reason you should make sure the first few sentences of page one in your document’s reading order are written just like a meta description, including keywords and an enticing call to action.

An example of poorly optimised meta descriptions from PDF files:

Although other properties such as Author, Subject and Keywords don’t have as much of a direct effect on search results, you still shouldn’t skip them. It can’t hurt to quickly fill these out, and they may come into play in the future for ranking PDFs.

“The Document Properties dialog box”
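As a rough illustration of the advice above, here is a minimal sketch that checks a set of document properties for gaps before publishing. The function name `audit_properties` and the dictionary layout are hypothetical; only the field names (Title, Author, Subject, Keywords) come from the article, and how you read the properties out of a real PDF is left open.

```python
# Hypothetical helper: the field names follow the properties named in the
# article; the function and its return shape are illustrative assumptions.
REQUIRED = ("Title",)
RECOMMENDED = ("Author", "Subject", "Keywords")


def audit_properties(props):
    """Return (missing_required, missing_recommended) for a properties dict."""

    def is_blank(name):
        value = props.get(name)
        return value is None or not str(value).strip()

    missing_required = [name for name in REQUIRED if is_blank(name)]
    missing_recommended = [name for name in RECOMMENDED if is_blank(name)]
    return missing_required, missing_recommended


# Example: a document with a title but an empty Author and no Keywords.
example = {"Title": "Fundamentals of PDF Optimisation", "Author": "", "Subject": "SEO"}
print(audit_properties(example))
```

An empty first list means the Title is in place; anything in the second list is worth filling in, but less urgent.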

Optimising the Document’s Text

Just as you would read a PDF the way you read a web page, you should optimise a PDF the way you optimise a web page too. There’s no difference here. Include keyword rich copy, latent semantic text and links with optimised anchor text to the most relevant pages of your site. Have a look at this previous BCA post if you’re not so familiar with how you should be copywriting for SEO.

Links within PDF documents are sometimes forgotten about, as people hold a general notion that only web pages hold the capacity to bear links. Not true. PDFs shouldn’t be considered as just a static print document and therefore links should be distributed evenly throughout the document where possible.

Not only is this important from a search perspective, but often PDFs are forwarded on or recommended to others as an informational resource, so having links to your site embedded in the document can quickly lead to new business.

Don’t get silly with this though; stuffing the document with links to everywhere on your site will only leave Google with the idea that you’re spamming. Again, treat the copy of the document the same as you would a web page; provided that you actually aren’t a spammer!

Considering Acrobat Version Releases

Although the technology that drives Google is somewhat stupefying, the search engine may still lag a little when it comes to keeping up with Adobe’s advances. New versions of Acrobat are consistently being developed and released, so it’s always a good idea to take a step back when publishing PDFs to ensure that Google can read it.

Export your documents to a PDF version at least one release behind the current one. For example, Adobe Reader X is currently available but was only released on November 16th 2010, so I’d consider publishing a document for Adobe Reader 9 at PDF version 1.7.
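One way to double-check the version of an exported file is to read its header, since PDF files normally begin with a marker such as `%PDF-1.7`. The sketch below is only a rough check under that assumption: the helper names and the 1.7 ceiling are illustrative, and a version declared elsewhere in the document can override the header, so treat the result as a hint.

```python
import re

# PDF files normally start with a header such as "%PDF-1.7".
HEADER = re.compile(rb"%PDF-(\d)\.(\d)")


def pdf_version(data):
    """Return (major, minor) from the %PDF-x.y header, or None if absent."""
    match = HEADER.search(data[:1024])
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def is_at_most(data, ceiling=(1, 7)):
    """True when the header version is no newer than the chosen ceiling."""
    version = pdf_version(data)
    return version is not None and version <= ceiling


# Example with an in-memory header; for a real file, pass the bytes
# from open(path, "rb").read(1024).
print(pdf_version(b"%PDF-1.4\n% rest of file"))
```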

Avoid Frustrating Google

Another thing to think about when exporting your document is the file size. Hosting a PDF document that’s 600-odd pages long, very text-heavy and includes images is only going to end in search engines abandoning your file to save on crawl bandwidth. A file size of around 200K is usually a good target. Good practice when exporting is to select “Optimise for Fast Web View” to allow the PDF to be loaded one page at a time, instead of the entire document at once.
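Both checks above can be roughed out in a few lines. This sketch assumes “200K” means 200 KiB and relies on the fact that files saved for Fast Web View (linearised PDFs) carry a `/Linearized` entry near the start of the file. The function names are hypothetical, and the byte search is a heuristic rather than a full PDF parse.

```python
# "Around 200K" from the article, read here as 200 KiB (an assumption).
SIZE_BUDGET = 200 * 1024


def looks_linearized(data):
    """Heuristic: linearised files declare /Linearized near the file start."""
    return b"/Linearized" in data[:1024]


def export_warnings(data):
    """Return a list of human-readable warnings for the exported bytes."""
    warnings = []
    if len(data) > SIZE_BUDGET:
        warnings.append("file is larger than the 200K target")
    if not looks_linearized(data):
        warnings.append("file does not look optimised for Fast Web View")
    return warnings


# Example with a tiny in-memory stand-in for a linearised file.
sample = b"%PDF-1.7\n1 0 obj\n<< /Linearized 1 /L 1234 >>\nendobj\n"
print(export_warnings(sample))
```

For a real document, read the file in binary mode and pass the bytes in; an empty list means neither warning applies.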

PDF Positioning

If you want certain web pages to be indexed you allow the crawler quick and easy access to them by linking from high-level pages on your site, right? The same applies for a PDF. If you want the document to be indexed and actually show up in search results, consider its placement in your website’s directory structure and link to it from pages close to the site’s root level.
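To get a feel for how deep a document sits, you can count the directories between the site root and the file. The sketch below is a simple illustration using Python’s standard `urllib.parse`; the function name and the example.com URLs are made up, and the depth of the URL is only a proxy for how many clicks the PDF is from your home page.

```python
from urllib.parse import urlparse


def directory_depth(url):
    """Count the directories between the site root and the file itself."""
    segments = [part for part in urlparse(url).path.split("/") if part]
    # The last segment is the file name, so it does not count as a directory.
    return max(len(segments) - 1, 0)


# Hypothetical URLs for illustration only.
for url in (
    "https://example.com/seo-guide.pdf",
    "https://example.com/resources/seo-guide.pdf",
    "https://example.com/files/2010/archive/docs/seo-guide.pdf",
):
    print(directory_depth(url), url)
```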

Tagging Through Accessibility

The full version of Acrobat also supports tagging for PDFs. This is just like implementing tags in HTML. By selecting Advanced > Accessibility > Add Tags you can specify things like heading tags and even image alt tags within your document. Specify these in the same way that you would with a web page; keep the headings tagged in a hierarchical manner and optimise the images with keywords.
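Keeping headings hierarchical means never jumping more than one level deeper at a time, for example from a level 1 heading straight to level 3. Here is a minimal sketch of that check, assuming you have already listed the heading levels of your document in reading order. The function name is hypothetical and nothing here reads the tags out of a PDF.

```python
def skipped_heading_levels(levels):
    """Return the positions where a heading jumps more than one level deeper."""
    problems = []
    previous = 0  # treat the document start as sitting above level 1
    for index, level in enumerate(levels):
        if level > previous + 1:
            problems.append(index)
        previous = level
    return problems


# Example: H1, H3, H2, H4. The second and fourth headings skip a level.
print(skipped_heading_levels([1, 3, 2, 4]))
```

An empty list means the outline steps down one level at a time, which is what you want for both PDFs and web pages.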

The Significance of PDFs

Although PDFs aren’t nearly as powerful as web pages in search, the fact that you have the ability to optimise them means that you should. In some cases, PDFs can be an excellent source of external links and referred business, so their significance should never be underestimated in your SEO efforts.

See aegan's author page for links to connect on social media.

Filed under: SEO

6 Replies to “Fundamentals of PDF Optimisation for Search”

I have seen that PDF SEO is not given much importance and you won’t find much info on how to optimize PDFs, so I want to thank you for providing this info.

No problems, I’m glad you found it helpful.

Hi Sebastien,

Usually if you follow a link to a PDF file hosted on any given website the file will open directly in your browser.

In this case, the document’s links behave just like a web page’s would, and a search engine’s robot has the ability to crawl and follow the links within the document.

It is only when you download the PDF to a reader that you’ll experience the security warning prompt when clicking a link, as the program itself is identifying a threat to the user potentially accessing external content.

Optimising the anchor text of links within PDF files is always good practice. Not only will a PDF increase your site’s internal links, but if you allow others to host your PDF on their website for free, you’re building multiple inbound links too.

Make sure that the links within your PDF point to a variation of your site’s key landing pages, and not just your home page.

I hope this helps.

Aaron.

I think this is the first article related to PDF Optimization.. thanks for sharing..

Dear Aaron,
Very interesting insights you share here.
I have to say that I never thought about specifically optimizing PDFs (or any other readable documents, for that matter) beyond the common “fill in every available slot consistently”.
After reading your article, I’ll spend a little more time on them.
I have a few questions to make sure I understand correctly.
In your experience, how do the search engines treat the links inside a PDF?
Considering that most PDF readers ask whether you want to follow a link when you click on it, would those links be followed by search engines?
Do you think consistent anchor text on links inside a PDF can contribute positively to your (onsite/offsite) SEO efforts?
Thanks again for sharing your knowledge.
Cheers,
Sebastien

Nice article! This is one of first articles I read about PDF optimization. I certainly will apply the above suggestions on my pdf files. Good job.
