The Sandbox, Confidence, and SEO

The key to SEO (any type of SEO, not just sandbox avoidance) isn't links, or hilltops, or content or even trust. Trust is the closest - I just don't like the word "trust" used in conjunction with a search for a really bad site, for example.

No, the holy grail, in my opinion, is confidence.

The more a search engine can be confident that the result it supplies to you is what you are looking for, the more likely you are going to be supplied with that result - i.e. the higher the site will rank.

Things like links, and content, and authority and all that stuff are just methods of attempting to ascertain how confident a search engine can be in presenting the site.

This may seem obvious or trite to someone not used to thinking things through very deeply, but put down the eggnog and indulge me for a moment...

Stop thinking about links and content. What else would inspire confidence in a website? What about the lack of duplicate data? What about a URL structure that lets the search engine know that it's definitely not indexing the same 15 pages over and over again? What about a server NOT going down all the time? What about links outwards to sites that are known to be useful to searchers for the content they've just searched on? What if the site approaches the search term from a different angle than most of the other sites (i.e. it's a museum or directory rather than a commercial site, etc.)?

What about how long people link to it? A site that people link to for 2 months and then stop is probably not a good site (and is probably buying or trading for those links, or running some sort of serial linking campaign). A site that has static 4-year-old links from trusted authority sources is probably a good site.

All of these things can affect the confidence levels a site has as a result for a particular query, or for a position on a results page for a particular query.

Of course, these are usually not yes/no answers - if you only rate a 46% confidence level for a keyword, that kind of sucks (I'm making these numbers up for illustration ONLY), but if the other choices are all 22% or lower, then you will be firmly placed in a top position, even though frankly it's not that great of a site. Just because a site is number one doesn't mean it's a good site, it's just considered the best of a bad lot.
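The relative nature of those made-up numbers can be sketched as a toy ranking function. To be clear, the scores, sites, and threshold-free sorting here are illustrative assumptions of mine, not anything a search engine has published:

```python
# Toy illustration: ranking picks the highest-confidence candidate,
# even when every candidate's confidence is mediocre in absolute terms.
# All site names and scores are invented for illustration only.

def rank_results(candidates):
    """Sort (site, confidence) pairs by confidence score, highest first."""
    return sorted(candidates, key=lambda site: site[1], reverse=True)

candidates = [
    ("example-museum.org", 0.46),  # "kind of sucks" in absolute terms...
    ("example-store.com", 0.22),
    ("example-blog.net", 0.19),
]

ranked = rank_results(candidates)
print(ranked[0])  # the 46% site wins: the best of a bad lot
```

The point of the sketch: nothing about the 46% site changed, yet it holds the top spot purely because its competitors inspire even less confidence.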

I want you to look at something - find a site that is in a sandbox, and look at a keyword that it ranks for. Now look for the closest Supplementary Result. See a connection? Now think about what Supplementary Results are, and what that connection means. Look really, really close.



Why is my site labeled "Supplemental"?

Supplemental sites are part of Google's auxiliary index. We're able to place fewer restraints on sites that we crawl for this supplemental index than we do on sites that are crawled for our main index. For example, the number of parameters in a URL might exclude a site from being crawled for inclusion in our main index; however, it could still be crawled and added to our supplemental index.

The index in which a site is included is completely automated; there's no way for you to select or change the index in which your site appears. Please be assured that the index in which a site is included does not affect its PageRank.
Source: http://www.google.com/intl/en/webmasters/faq.html#label

Sandboxed sites usually appear immediately above supplementary results. If there are no displayed supplementary results for a search (because there are so many other ones that the search engine can show instead), your site probably won't show up.

The Supplementary Results are a separate database of "last gasp, only show if nothing else works" results. They have a confidence score (else they would not show up at all), but it's extremely low. These include pages that either go down a lot, or that have been recently not found but used to be good, etc. In short, they are on topic, but there is almost no confidence in them.

I've noticed that "sandboxed" sites typically are sites whose confidence score is very low, but better than the ones in the supplementary results database (I suspect that they are the lowest or bottom results in the normal database). That's a fairly accurate method to tell if something's been sandboxed: find its relation to the Supplementary Results for that search term. It's not the only method, but it's quick and easy.

The sandbox has nothing to do with trust or age, or ccTLD - it's all about confidence, IMO. If you want to declare all sites that have very low confidence ratings as "sandboxed", then fine. For me, they are just sites that the search engine isn't confident about (yet).

It's perfectly possible (even common) for a site to be highly relevant, but not be assigned a high confidence level due to other factors.

IMO, the sandbox effect is related mostly to the length of time a domain has had particular links pointing to it. Which is actually very different from site age itself. An old site with no links to it will be "sandboxed" based on the first day new links are discovered. Likewise, an established site that resets its historical data through a redirect, merge, or change in ownership/direction will often suffer the same effect.

Since link age is only one criterion, a site that can show itself to be trustworthy because of other factors (i.e. really, really good links, etc.) would override the negative aspect of the young links.

It appears you need links for about 6 months before Google begins to be confident that they are permanent links and gives you full credit for them. In short, you need at least 6 months of historical data. Since it usually takes 1-3 months for a new site to be fully spidered, you will note that the most common "sandbox" times are 6 + (1-3), or 7-9 months. It could be as soon as 6 months and one day, or as late as 12 months, but I most often see 7-9 as the common range for a standard (non-aggressive but competent) site.
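That 6-plus-(1-3) arithmetic can be written out as a back-of-the-envelope calculation. The 6-month trust threshold and the 1-3 month crawl lag are the estimates from this post, not documented figures from any search engine:

```python
# Back-of-envelope sandbox-duration estimate using the post's numbers:
# ~6 months before links are trusted as "permanent", plus 1-3 months
# for a new site to be fully spidered. Both figures are estimates from
# the post, not published search-engine parameters.

LINK_TRUST_MONTHS = 6      # months before links count as permanent
SPIDER_LAG_RANGE = (1, 3)  # months for a new site to be fully crawled

def sandbox_window(link_trust=LINK_TRUST_MONTHS, spider_lag=SPIDER_LAG_RANGE):
    """Return the (min, max) months a typical new site stays 'sandboxed'."""
    return (link_trust + spider_lag[0], link_trust + spider_lag[1])

print(sandbox_window())  # (7, 9) -- the common 7-9 month range
```

Widening the crawl lag to 0-6 months reproduces the outer bounds mentioned above: `sandbox_window(6, (0, 6))` gives (6, 12), i.e. as soon as just over 6 months or as late as 12.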

A brand new site launched by a very trustworthy company, or a site that has garnered lots of natural links, may easily be deemed a site a search engine can present as a result with confidence, regardless of the youth of its links. Young links are only one aspect of the whole thing, and that's why (IMO) there are so many exceptions to the so-called "sandbox".

You can also avoid the effect if the site is assigned some of the historical data of another via a merge of some sort.

My suggestion for SEO in 2006 - make your site one that a search engine could show with complete confidence to a searcher for your term. Make sure its technology is sound, its links trustworthy and its content useful. If that sounds like what the search engines have been preaching all along, it's because it is - they are just finding different ways of measuring it.

Of course, I'm sure some people's response to all this will be along the lines of the old joke: "The secret to success is sincerity - once you can fake that, you've got it made!" I beg to differ, of course - tricking people (and search engines) is a bad habit, and almost always backfires.

My opinion,

Ian

1 comment:

Stoney said...

Lots of good thoughts here, Ian. I like the term confidence. We usually speak in terms of "authority" and whatnot, but I think confidence more accurately describes what search engines really want. Authority is merely a result of a path of establishing confidence.