SEO strategy for PDF files
My site's URL is: www.hpisum.com
Description (including timeline of any changes made):
The content from the PDF files on our site is not contributing at all to our search engine traffic. Though the files are indexed, no one is able to find them. So, we are exploring options to make this content more available. We are currently developing a massive restructure of the website.
Well over half of the content on our website is in the form of PDF files. This is because our company has published a printed magazine for about 20 years, and we have put some of those articles on the website. Many of those PDFs can be found here: http://www.hpisum.com/page.aspx?page_id=12
We know that it is possible to create PDFs in an SEO-friendly way (e.g., using headings, titles, file names, etc.). However, looking back, none of those PDFs are SEO-friendly, and repairing them is just too daunting a task.
So, at this point, we are thinking about providing SEO-friendly HTML versions, summaries, or excerpts of the PDF files. (This would also be a service to our customers, so that they wouldn't have to go through the hassle of downloading a file.) It would also make it easier to link to and from the magazine articles, and give more fluidity to our site, because the content could be viewed without people leaving it.
However, the biggest problem here is probably duplicate content. Will we get docked for providing both HTML and PDF versions of content as a service to surfers?
Should we use robots.txt to block Google from our PDFs and just use the HTML versions? It just sounds ironic that we would block Google from the very content that we are trying to make more available...
What is the best way to deal with getting content in PDF files to be found on Google?
I have done online research on SEO and PDFs, but I just haven't found anything definitive.
(No idea how much "SEO" you can do on a PDF though?)
And if you aren't going to do the SEO work on the PDFs, you might as well block them (or use the X-Robots-Tag HTTP response header to give Google a noindex directive) to avoid duplicate content issues.
As the others mentioned, using HTML versions (or even the HTML summaries you mentioned) would be a great idea in my opinion, if most of your content is in the PDF files at the moment. While PDFs are certainly useful if you need to keep everything looking exactly like you have it in the magazine, web-users may prefer to be able to read the content before opting to download and view a PDF file.
If you opt to focus on your PDF files, I would recommend making sure that the content is available in a textual form within the PDF file and that it is in the proper character-sets (a simple test is to select some of the text and to try to copy it into a text-editor) -- from looking at your PDFs, it appears that your files are fine in that regard at the moment. By keeping the textual content accessible like that, you're making it easier for search engines to extract the relevant information from your PDF files.
You generally do not need to worry about duplicate content in a situation like this, even if you decide to mirror the content of your PDFs on HTML pages. If we recognize the URLs as containing duplicate content, we'll just show one of them to users when they search; your site generally wouldn't have any disadvantage by doing this.
Also, one thing which I noticed when looking at the URL that you posted is that the PDFs have somewhat strange file names. While this generally won't stop us from finding them, it may make it difficult for users to link to your content directly, which in turn, may make it hard for other people to discover your content. To help in that respect, I would suggest using a simple system for file names, perhaps based on the issue number or the date of publication, that also excludes "special" characters like spaces and punctuation marks.
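To illustrate the file naming suggestion above, here is a minimal Python sketch that turns an existing file name into a simple, link-friendly one. The function name `clean_pdf_name` and the `YYYY-MM` date prefix are assumptions made for illustration; an issue number would work just as well.

```python
import re
import unicodedata


def clean_pdf_name(original, issue_date):
    """Return a link-friendly PDF file name such as '2008-03-some-title.pdf'.

    `issue_date` is a hypothetical 'YYYY-MM' publication date supplied by the caller.
    """
    # Drop a trailing ".pdf" extension (in any letter case) if present.
    if original.lower().endswith(".pdf"):
        stem = original.rsplit(".", 1)[0]
    else:
        stem = original
    # Fold accented letters to plain ASCII where possible.
    ascii_stem = (
        unicodedata.normalize("NFKD", stem)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    # Replace every run of spaces, punctuation, etc. with a single hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", ascii_stem.lower()).strip("-")
    return f"{issue_date}-{slug}.pdf" if slug else f"{issue_date}.pdf"
```

If you rename files that are already linked from elsewhere, redirecting the old URLs to the new ones would keep those links working.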
Hope it helps!
We'll definitely create a unique web page for each PDF article, so we can provide the content in HTML. We will also optimize the pages with sensible file names, title/description tags, etc. We would also provide an optional link for readers to download the article in PDF format, so they can see it formatted as it was in the printed magazine. Another advantage is that this would make it easier for surfers to link to and bookmark these pages.
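For what it's worth, the plan described above could be sketched roughly like this in Python. Everything here is an assumption made for illustration: the function name `render_article_page`, its parameters, and the page layout are hypothetical, and a real site would produce these pages from its own CMS templates.

```python
from html import escape


def render_article_page(title, description, body_html, pdf_url):
    """Build a simple HTML page for one magazine article.

    `body_html` is assumed to be trusted, already-formatted article markup;
    the other values are escaped before being placed in the page.
    """
    return (
        "<!DOCTYPE html>\n"
        "<html>\n<head>\n"
        f"<title>{escape(title)}</title>\n"
        f'<meta name="description" content="{escape(description)}">\n'
        "</head>\n<body>\n"
        f"<h1>{escape(title)}</h1>\n"
        f"{body_html}\n"
        # Optional download link to the original magazine layout.
        f'<p><a href="{escape(pdf_url)}">Download this article as a PDF</a></p>\n'
        "</body>\n</html>\n"
    )
```

Each article gets its own title and description, and the PDF stays one click away for readers who want the printed layout.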
There are more than a few topics covering your question - Searching could possibly save you posting?
Failing that, if you have a question/issue, you should create your own topic and post all relevant info in that, rather than posting in someone else's topic.
Those points aside - a very big "Thank you" for at least looking for a related topic - very much appreciated :D
Would you recommend blocking the PDFs in this case, or will you try to have visitors land on the HTML pages rather than the PDFs anyway if I leave them unblocked?
If you have the same content in PDF as in HTML pages, in most cases we'll probably show the HTML versions above (or in place of) the PDF versions. If this is a problem for your specific situation, I'd consider using the robots.txt file or the X-Robots-Tag HTTP header to prevent the PDF files from getting indexed. I imagine for most sites this is not really a problem, so I wouldn't suggest blocking indexing of PDF files without confirming that it's really necessary.
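If you do decide it's necessary, the two options mentioned above could look roughly like the following Python sketch. The function names and the `/pdf/` directory are hypothetical, and how headers are actually attached depends on your web server or CMS. Note that the two are alternatives rather than a pair: a crawler that is blocked by robots.txt never fetches the file, so it would not see a noindex header on it.

```python
from urllib.robotparser import RobotFileParser


def extra_headers(path):
    """Option 1: return extra response headers that mark PDF responses as noindex."""
    clean_path = path.split("?", 1)[0].lower()  # ignore any query string
    if clean_path.endswith(".pdf"):
        return {"X-Robots-Tag": "noindex"}
    return {}


# Option 2: a robots.txt that disallows crawling of a (hypothetical) /pdf/ directory.
ROBOTS_TXT = """\
User-agent: *
Disallow: /pdf/
"""


def is_crawlable(path, user_agent="Googlebot"):
    """Check a path against ROBOTS_TXT using the standard library parser."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, path)
```

The `is_crawlable` helper is only there to sanity-check the rules before deploying them; it does not talk to any search engine.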
The only situation where I would consider doing something in advance is when the CMS automatically creates PDF-copies of normal HTML pages. Generally speaking, this shouldn't cause any problems, but those PDF versions are likely not compelling enough to merit getting indexed separately (and crawling them will possibly put a load on your server that you could avoid). Ultimately, it's up to you to determine which content you wish to have crawled and indexed :-) -- if you feel that PDF-copies of your content are compelling enough for users who search for your content, feel free to make them available.