Organic Track: Bot Herding
Back from lunch and Rand Fishkin is moderating a star-studded panel. We have Adam Audette (AudetteMedia), Hamlet Batista (Nemedia S.A.), Nathan Buggia (Live Search Webmaster Center), Priyank Garg (Yahoo Search), Michael Gray (Atlas Web Service), Evan Roseman (Google) and Stephan Spencer (Netconcepts).
[Ooo, we’re rocking the Tears for Fears. It’s pretty sweet. SMX always pulls out the good stuff. I’m really loving the Organic room this time around. The first row is literally two feet from the stage. I could totally throw my mini bottle of water and hit Adam Audette square in the face. Not that I would. I really like Adam. Maybe I’ll take out Rand instead?
Holy Jesus. I almost just kicked over the big projector that the speakers watch. Blogger Fail.]
Rand gets this whole show going. He says he’s moderating. He’s wearing a tie. I don’t think I’ve ever seen Rand wear a tie. Up first is Michael Gray.
Michael says that when you first buy a house, you're poor. Over time you make more money and can afford to air-condition your house. But no matter how much money you have, you'll never air-condition your mailbox, because you have a better use for your money.
Similarly, you have a Web site. You don't have a lot of links. You don't have a lot of PageRank. You're not going to send that PR to your Contact page because it's not a good use of your resources. You want to send your PR to the pages that make the most sense for you. You want to send it to the places that will give you the most sales and the most leads.
PageRank and link equity: how much do you have? Many Web sites, especially smaller or newer ones, don't have a lot of PR. They have to maximize what little they have and direct it to the places where it makes the most difference.
Deciding What to Sculpt Out
Who wants to rank for a privacy policy, terms of use or contact us page?
- Locations: Unless you are a multi-location business, put your address in the footer and sculpt out the Locations page.
- Company bios: Unless you are involved in a reputation management scandal, sculpt them out.
- Site-wide footer links, advertising stats, rates and legal pages.
How to Sculpt:
- Nofollow: Quick and easy, but may be a signal to search engines that an SEO or advanced webmaster is involved.
- JavaScript: Old school, relies on client-side technology; currently bots don't crawl it, but this may change in the future.
- Form pages, jump pages, redirect pages: More complex to implement and maintain. Search engines currently don't follow them, but that may change.
Be Consistent. If you're going to nofollow something, do it with all of your links and then do it in your robots.txt as well. Don't block pages one way and then allow them in another; this accounts for outside links and any spider or search engine quirks. He says he's seen the most benefit on mid-level sites where they sculpted out blocks of 20-50 non-converting pages.
Do it now or wait for a rainy day? He says do it now. If you have critical or serious issues, this can take a backseat. Otherwise, unless you have a large or very complex site, PageRank sculpting is a 1-2 day project at most for any CMS or template-based site. It's easier to get it right now than to go back and fix it after you launch.
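To make the consistency point concrete, here's a minimal sketch (the page paths are made up for illustration): the same non-converting pages are nofollow'd at the link level and disallowed in robots.txt, so they aren't blocked one way and let in another.

    <!-- Footer links to non-converting pages, nofollow'd at the link level -->
    <a href="/privacy-policy/" rel="nofollow">Privacy Policy</a>
    <a href="/terms-of-use/" rel="nofollow">Terms of Use</a>
    <a href="/contact-us/" rel="nofollow">Contact Us</a>

    # robots.txt: disallow the same pages so outside links and spider quirks
    # don't let them in through another door
    User-agent: *
    Disallow: /privacy-policy/
    Disallow: /terms-of-use/
    Disallow: /contact-us/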
Adam Audette is up to give us eight arguments against sculpting PageRank with nofollow. He used to do it, but he's scaling back; his team now uses it far, far less.
More Control: Having a mechanism at the link level to control spider behavior is good. However, we don't know enough. We don't know how much PR we have on a domain. We don't know how much we have on a page or how much a link takes off a page. We're attempting to control the flow of internal PR but we don't know how much we have. We don't know how much it fluctuates. It's imprecise. It's like using a precise surgical tool while blindfolded.
It’s a Distraction: There are a lot of things we can do to make our sites better. Matt Cutts has said that sculpting with nofollow is a second order effect. It can also mask other issues – focus of a page, keyword dilution, user experience, etc.
Management Headaches: When you have a large site you may have many departments working on a page. What rules are in place? It’s confusing. Why are 5 links nofollow’d on this page? How do you preserve it?
It’s a Band-Aid: People are using it to try and address a symptom they’re seeing on a site. They’re not taking care of the problem.
Where’s the User?: Think of a site with tons of PageRank feeding it into mediocre pages, thereby raising those pages in the SERPs. Are we giving more power to high-authority domains?
Open to Abuse: Every tool is open to abuse, but you can think of all kinds of creative ways to use nofollow. When will nofollow start being abused and how will the search engines react? Matt Cutts says it’s okay, but there are good and bad ways to use a technique. He may look at your site and think you’re using it in a bad way.
Too Focused on Search Engines: Advanced search engine optimization has always been about what’s right for your users and what’s right for search engines. Too much use of the nofollow puts too much focus on search (specifically on Google). Does this help your users? Would you do this if the search engines didn’t exist?
There’s No Standard: There are multiple definitions for nofollow and each engine may treat it differently. Nofollow started for blog comment spam. Then it went to paid links. Now it’s to control your internal PR. What’s it going to be next? It moves too much.
[Rand shares a detail but says you can’t ask him how he knows it: 5 percent of pages on the Web currently have a nofollow’d link on them, and in 85+ percent of cases the nofollow is on an internal link.]
Stephan Spencer is up.
Duplicate content is rampant on blogs. Herd bots to the permalink URLs and show only lead-ins (excerpts) everywhere else: date archives, category pages, tag pages, the home page, etc. Using optional excerpts mitigates the duplication, since only the permalink page carries the full post; it requires you to revise your Main Index Template theme file.
Stephan says to include a signature link at the bottom of your post/article that links back to the original post's permalink.
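One way to herd bots toward the permalinks (my illustration, not necessarily how Stephan implements it) is to mark the duplicate-prone archive views as noindex,follow so their links still get crawled while the pages themselves stay out of the index; the permalink page keeps the full post plus the signature link. The URL below is hypothetical.

    <!-- In the <head> of date-archive, category and tag templates -->
    <meta name="robots" content="noindex,follow">

    <!-- At the bottom of the post itself: the signature link back to the permalink -->
    <p>Originally published at
      <a href="http://www.example.com/2008/06/bot-herding/">example.com</a>.</p>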
On ecommerce sites, duplicate content is rampant because of manufacturer-provided product descriptions, inconsistent ordering of query string parameters, guided navigation, pagination within categories, tracking parameters, etc. Selectively append tracking codes only for humans, either with white hat cloaking or by using JavaScript to append the codes.
Pagination not only creates many pages that share the same keyword theme, it also creates very large categories where product listings don't get crawled, which lowers product page indexation. Do you herd bots through keyword-rich subcategory links, View All links, or both? How should you display the numbered links? You have to test, because your mileage will vary.
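Here's a rough sketch of the JavaScript approach (the class name and tracking parameter are invented): the HTML keeps clean URLs, which is what spiders crawl, and the tracking code is appended client-side for human visitors whose browsers run the script.

    <script>
    // Run after the DOM has loaded (e.g. just before </body>).
    // Hypothetical sketch: append a tracking code to flagged links client-side;
    // spiders that don't execute JavaScript keep seeing the clean URLs.
    var links = document.getElementsByTagName('a');
    for (var i = 0; i < links.length; i++) {
      var link = links[i];
      if (link.className === 'track-me') {                   // only links we've flagged
        var sep = link.href.indexOf('?') === -1 ? '?' : '&';
        link.href = link.href + sep + 'trk=homepage-footer'; // made-up parameter
      }
    }
    </script>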
PageRank Leakage: If you're using a Robots.txt Disallow, you're probably leaking PageRank. A disallowed page still accumulates PageRank from the links pointing at it, but since it's never crawled, it can't pass that PageRank back out. A Meta Robots Noindex page, on the other hand, both accumulates and passes PageRank (as long as you don't also nofollow it).
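As a quick side-by-side (the path is hypothetical): the Disallow keeps the page from ever being crawled, so PageRank flowing into it is stranded, while the meta tag lets the page be crawled and its links followed while keeping the page itself out of the index.

    # robots.txt approach: never crawled, so its outbound links can't pass PR
    User-agent: *
    Disallow: /printer-friendly/

    <!-- Meta robots approach: crawled, links followed, page not indexed -->
    <meta name="robots" content="noindex,follow">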
Stephan talks about the magic of regular expressions/pattern matching and I’m not even going to pretend that I followed any of it.
Some mod_rewrite specifics I did manage to catch:
- Proxy a page using the P flag
- The QSA flag is for when you don't want query string parameters dropped
- The L flag saves you on server processing
- Got a huge pile of rewrites? Use RewriteMap (see the sketch after this list)
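For what it's worth, here is a rough .htaccess-style sketch of the flags he mentioned; the URLs and file names are invented, and note that RewriteMap only works in the server or virtual host config, not in .htaccess.

    RewriteEngine On

    # P flag: serve the target through mod_proxy instead of redirecting the visitor
    RewriteRule ^docs/(.*)$ http://legacy.example.com/docs/$1 [P,L]

    # QSA flag: keep the existing query string when the rule adds its own parameters
    RewriteRule ^widgets/([0-9]+)$ /product.php?id=$1 [QSA,L]

    # L flag: stop processing further rewrite rules once this one matches
    RewriteRule ^old-page\.html$ /new-page/ [R=301,L]

    # RewriteMap (server config only): handle a huge pile of rewrites from one lookup file
    # RewriteMap legacymap txt:/etc/apache2/legacy-urls.txt
    # RewriteRule ^legacy/(.*)$ ${legacymap:$1|/} [R=301,L]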
He talks about conditional redirects. It’s way black hat. I’m covering my eyes.
Error Pages: Instead of returning a 404, return a 200 status code so the spiders will crawl the page and follow its links, and include a Meta Robots noindex so the error page itself doesn't get indexed. Or do a 301 redirect to something valuable and dynamically include a small error notice.
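A minimal sketch of the first option, assuming you control the error handler: the page is served with an HTTP 200 status so spiders keep following its links, and the meta tag keeps the error page itself out of the index.

    <!-- Served with a 200 status by your (hypothetical) error handler -->
    <head>
      <title>Sorry, we couldn't find that page</title>
      <meta name="robots" content="noindex,follow">
    </head>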
Hamlet Batista is up to talk about white hat cloaking.
Good vs. bad cloaking is all about your intent. Always weigh the risks versus the rewards of cloaking. Ask permission, or just don't call it cloaking. Don't call it cloaking; call it IP delivery.
When is it practical to cloak?
The main idea of cloaking is making more of your content accessible to the search engines. That might be because you're using a search-unfriendly CMS, because you have content behind forms, or because you're a rich media site. It might be that you're a membership site (free vs. paid). He's also going to talk about using it for site structure improvements, geolocation/IP delivery, and multivariate testing.
Practical Scenario 1: Proprietary Web site management systems that are not search-engine friendly.
Regular users see URLs with many dynamic parameters, but the search engines see friendly URLs. Your users will see URLs with session IDs, but with simple cloaking the search engines see URLs without session IDs. Your users will see URLs with canonicalization issues; the search engines see URLs with a consistent naming convention. Your users see missing Titles and Meta Descriptions; the search engines see automatically generated Titles and Meta Descriptions.
Practical Scenario 2: Sites built in Flash, Silverlight or any other rich media technology.
With cloaking, you can give users a completely Flash site and the search engines will see a text representation of the graphical, motion and audio elements.
Practical Scenario 3: Membership sites.
Search users see a snippet of premium content on the SERPs, and when they land on the site they are faced with a registration form. Members see the same content the search engine spiders see.
Practical Scenario 4: Sites requiring massive site structure changes to improve index penetration.
Regular users follow the structure designed for ease of navigation. Search engine robots follow a link structure designed for ease of crawling and deeper index penetration of the most important content.
Practical Scenario 5: Geotargeting
Practical Scenario 6: Split testing organic search landing pages.
How do we cloak? In order to cloak you have to identify the robot and then deliver the appropriate content. You can do that via a few methods (a rough sketch of the DNS check follows the list):
- Robot detection by HTTP cookie test.
- Robot detection by IP address
- Robot detection by double DNS check
- Robot detection by visitor behavior
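Hamlet didn't get into implementation details, but here's my own hedged sketch of the double DNS check, the sturdiest of the four, written in Node.js: reverse-resolve the visitor's IP to a hostname, check that the hostname belongs to the engine, then resolve that hostname forward and confirm it maps back to the same IP. The Googlebot domains are the published ones; everything else is illustrative.

    // Hedged sketch (not from Hamlet's talk): double DNS check in Node.js.
    const dns = require('dns').promises;

    async function isVerifiedGooglebot(ip) {
      try {
        // 1) Reverse lookup, e.g. 66.249.66.1 -> crawl-66-249-66-1.googlebot.com
        const hostnames = await dns.reverse(ip);
        const host = hostnames.find(function (h) {
          return /\.googlebot\.com$/.test(h) || /\.google\.com$/.test(h);
        });
        if (!host) return false;        // hostname doesn't belong to the engine
        // 2) Forward lookup must round-trip to the same IP
        const addresses = await dns.resolve4(host);
        return addresses.indexOf(ip) !== -1;
      } catch (err) {
        return false;                   // any DNS failure: treat as not a verified bot
      }
    }

    // Usage: pick which version of the content to deliver.
    // isVerifiedGooglebot('66.249.66.1').then(function (isBot) { /* ... */ });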
Hamlet runs out of time and Rand nearly yanks him off the stage. Poor Hamlet. He didn't get to finish his presentation, but Rand was just doing his job.
Priyank Garg is up.
Robot Exclusion Protocol: Allows publishers to tell robots the access permissions for their content.
Robots.txt: Introduced in the '90s. De facto standard followed by all major search engines. Allows site-level directives for access to content.
META Tags: Page-level tags. Allow finer control.
Is there a standard? Does every engine work the same way? Priyank says the engines are working together to establish standards across all of them. The engines all support page-level tags like the HTML meta robots directives: noindex, nofollow, nosnippet, noarchive, noodp.
They want to have all the engines come out with this at the same time so there is no confusion.
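In practice those page-level directives look like this; combine whichever ones you need in a single tag.

    <!-- Keep the page out of the index and don't follow its links -->
    <meta name="robots" content="noindex,nofollow">

    <!-- Stay indexed, but show no snippet, no cached copy, no DMOZ/ODP description -->
    <meta name="robots" content="nosnippet,noarchive,noodp">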
In the Q&A, Evan Roseman says they don't view uses of nofollow as some type of "flag" for SEOs. Their handling of nofollow has not changed over the years; people have simply begun using it in new ways.
Nathan Buggia says a nofollow'd link is treated like any other link; MSN Live does not support nofollow. [Update: Nathan retracts his statement later on, much to the disappointment of bloggers everywhere.]