Which Keywords on a Page?

The question of how many keywords you should optimize for on a page is nearly as old as the idea of keywords being on a page in the first place, and there still really isn't a perfect answer.

However, hopefully I can give you some guidance. Before I do, I need to explain a couple of concepts.

First, a keyword (or keyphrase) is just that: a word or group of words that you type into a search engine with the expectation of obtaining a result. The generic term for both keyword and keyphrase is "search term". Some people also call this a query (and they are semantically correct), but I prefer to reserve the term query for search terms that have explicit operators applied to them, such as "new +york", which looks for "new" but only if "york" is also on the page.

              / Keyword = "dog"
Search Term =
              \ Keyphrase = "dog breeder"

Query (or search query) = "dog +breeder site:cnn.com"

There is no final authority on how to use these terms but this is how I use them, at least formally. I've been known to get sloppy and use "keyword" as a catchall for everything you type into a search box, like most people. In this article I'll try to use them properly.

The second term that is important to know is Term Vector. I won't go into extreme detail here, but for simplicity's sake I can describe term vectors as words that support search terms.

For example, let's say your search term is "java". A search engine has no idea if you are talking about Java the island, Java the programming language, or java as slang for coffee, so it will generally guess based on link popularity, which will result in mostly computer-related references.

However, if you give it a hint, like "java travel", you will get sites related to travelling to Java. Great. But how are the pages chosen in the first place?

Here are some term vectors for a few pages on java:

  • Java Page 1 - programming, Sun, technology, developer
  • Java Page 2 - island, travel, beach, Indonesia, photos
  • Java Page 3 - coffee, mug, bitter, drink, aroma

Can you tell which Java page is about which topic just from looking at the term vectors? Of course. So can search engines. This is one reason why search engines use term vectors as part of their algorithms. They can then use the combination of search terms and term vectors to assign an initial sorted result, before a final sorting using authority indicators like links, age, and so forth.
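To make the idea concrete, here is a minimal sketch of that initial matching, using nothing but the three example pages above. The function name rank_pages and the simple "count the overlapping words" scoring are my own assumptions for illustration; real search engines are far more sophisticated and do not publish their methods.

```python
# Term vectors copied from the three example pages above (lowercased).
PAGE_TERM_VECTORS = {
    "Java Page 1": {"programming", "sun", "technology", "developer"},
    "Java Page 2": {"island", "travel", "beach", "indonesia", "photos"},
    "Java Page 3": {"coffee", "mug", "bitter", "drink", "aroma"},
}


def rank_pages(query):
    """Order pages by how many query words appear in their term vectors.

    Hypothetical scoring: one point per query word found in the page's
    term vector. Ties keep their original order.
    """
    words = set(query.lower().split())
    scores = {
        page: len(words & terms)
        for page, terms in PAGE_TERM_VECTORS.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


print(rank_pages("java travel")[0])  # Java Page 2
```

The word "java" alone scores nothing for any page here, which is the point: it is the supporting word "travel" that tells the three pages apart.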

If you show up well in the initial result, then it's easier to do well during the final sorting, all other things being equal. This is why keywords and their placement are still necessary in this day and age of link authority. The link sorting can only act on results that have already been collected and sorted due to their content relevancy. Yes, content (keywords and term vectors) still matters.
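The two-step sorting described above can be sketched roughly as follows. Everything here is an assumption for illustration: the page names, the link counts, the shortlist size, and the function two_stage_rank are all made up, and real engines blend many more signals.

```python
# Hypothetical pages: "terms" are words found on the page,
# "links" is a made-up inbound link count standing in for authority.
PAGES = [
    {"name": "Island Blog", "terms": {"java", "travel", "island"}, "links": 10},
    {"name": "Beach Guide", "terms": {"java", "travel", "beach", "indonesia"}, "links": 50},
    {"name": "Coffee Giant", "terms": {"java", "coffee", "mug", "aroma"}, "links": 1000},
]


def two_stage_rank(pages, query_terms, shortlist_size=2):
    """Stage 1 shortlists by content relevance, stage 2 sorts by authority."""

    def relevance(page):
        # Number of query terms that also appear on the page.
        return len(query_terms & page["terms"])

    # Stage 1: keep only pages with some overlap, best matches first.
    relevant = [p for p in pages if relevance(p) > 0]
    shortlist = sorted(relevant, key=relevance, reverse=True)[:shortlist_size]

    # Stage 2: authority only reorders pages that survived stage 1.
    by_authority = sorted(shortlist, key=lambda p: p["links"], reverse=True)
    return [p["name"] for p in by_authority]


print(two_stage_rank(PAGES, {"java", "travel"}))  # ['Beach Guide', 'Island Blog']
```

Note that "Coffee Giant" has by far the most links in this toy data but never reaches the authority sort, because its content was a weaker match in the first pass.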

Got that? Good. Now we can get to the meat of this overly long post.

For very competitive terms, you will usually want to optimize only one term per page, in order to maintain focus. However, this is not always practical or desirable. If you have keywords that are highly related or variations of each other, it's hard to make a page for each without looking like a spammer. "Uh, let's see... I'll make one page each for 'Buy Viagra', 'Purchase Viagra', 'Viagra Buy', 'Buy Viagra Online'..." Yeah, right. Like that doesn't stick out like a sore thumb and ruin any possible credibility. It looks so bad that the usual method of "fixing" this is to cloak these pages. But since two wrongs rarely make a right, this is generally a short-term fix, if it works at all. I don't recommend it.

So, you have decided to combine a few terms on a page. Generally, the accepted number of keywords you can optimize for on one page is between 2 and 5. Any more than that and you simply lose focus. The actual number is bound by what you can use naturally on a page, rather than by a specific formula. The actual "formula" is the rules of grammar and communication, not mathematics.

The search engine looks at tons of pages related to your keyword, and if they all use words, terms and term vectors one way and you are stuffing them in another, good things will not happen. In short, the search engine is comparing your page to other pages written by humans, not to some internal formula invented by a machine. Therefore, write like a human, not a machine.

So, what does this all imply? Here is the BIG IDEA from this.

What would happen if you tried to optimize a page for coffee, island hopping and programming all at once? Do you think you'd be successful for anything other than the longest of long tail terms? How would a search engine know that you were trying to do this? After all, you'd just be using "java" a lot. Isn't that enough? Isn't the page now optimized for "java"?

No. The reason is that you may be optimizing for "java", but you are hopelessly messing up the term vectors that the search engine uses to decide, with any confidence, what the topic of your page is. You've lost it.

Let's take this concept a bit further. We've established that term vectors or supporting words are important to the context of a search term and therefore the relevance of a page. So what?

So that means that you should only optimize for more than one keyword on a page if all the keywords have identical or highly similar term vectors, or they are actually term vectors for each other.

Never try to optimize for "lawyer" and "doctor" on the same page as two different keywords. If you are trying to optimize for "the doctor of a lawyer" as a long tail term, that's fine. But if you hope to stand a chance in hell of showing up for either, they need to share term vectors. In old-fashioned terms, they need to be related.

But it's more than just being related. Lawyers and doctors are technically related terms because they are both professionals. You could probably find some sort of relationship between almost any two terms. That's not enough for a search engine to work with. A search engine will decide that terms are related using semantic co-occurrence, which means the terms keep showing up together on the same pages or in the same paragraphs. Semantic co-occurrence is the basic building block of term vector analysis.
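As a rough sketch of what counting co-occurrence could look like, the snippet below tallies how often two words appear in the same document. The four tiny "documents" and the function cooccurrence_counts are invented for illustration; a real engine would work over a huge corpus with far better text processing.

```python
from collections import Counter
from itertools import combinations

# Made-up miniature corpus; each string stands in for a page or paragraph.
DOCUMENTS = [
    "divorce lawyer child custody",
    "divorce lawyer parent child",
    "doctor clinic patient",
    "lawyer contract company",
]


def cooccurrence_counts(documents):
    """Count, for every pair of words, how many documents contain both."""
    counts = Counter()
    for text in documents:
        # De-duplicate and sort so each pair has one canonical (a, b) order.
        words = sorted(set(text.lower().split()))
        counts.update(combinations(words, 2))
    return counts


counts = cooccurrence_counts(DOCUMENTS)
print(counts[("divorce", "lawyer")])  # 2
print(counts[("doctor", "lawyer")])   # 0
```

In this toy corpus "divorce" and "lawyer" keep turning up together, while "doctor" and "lawyer" never do, even though a human could argue they are "related" as professions.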

So you should not optimize for "related" search terms on a page, which is too vague. You should optimize for search terms that either very frequently show up together normally (i.e. each is a term vector for the other) or that have a nearly identical term vector space between them.
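One common way to put a number on "nearly identical term vector space" is cosine similarity between weighted term vectors. The sketch below assumes that measure; I am not claiming any search engine uses exactly this, and the words and weights are invented for illustration.

```python
import math

# Hypothetical weighted term vectors (word -> made-up importance weight).
TERM_VECTORS = {
    "lawyer": {"court": 5, "legal": 4, "client": 3},
    "attorney": {"court": 4, "legal": 5, "firm": 2},
    "doctor": {"patient": 5, "clinic": 3, "medical": 4},
}


def cosine_similarity(a, b):
    """Return the cosine similarity of two sparse term vectors (0.0 to 1.0)."""
    shared = set(a) & set(b)
    dot = sum(a[term] * b[term] for term in shared)
    norm_a = math.sqrt(sum(weight * weight for weight in a.values()))
    norm_b = math.sqrt(sum(weight * weight for weight in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # An empty vector is similar to nothing.
    return dot / (norm_a * norm_b)


same_page = cosine_similarity(TERM_VECTORS["lawyer"], TERM_VECTORS["attorney"])
split_up = cosine_similarity(TERM_VECTORS["lawyer"], TERM_VECTORS["doctor"])
print(round(same_page, 2))  # 0.84
print(split_up)             # 0.0
```

With these made-up numbers, "lawyer" and "attorney" score high enough to share a page, while "lawyer" and "doctor" share no supporting words at all and belong on separate pages.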



stoney said...

Ian, we've had a lot of success optimizing a page for a single core term with a plethora of qualifiers. So if we were going after "lawyer" then some qualifiers might be "divorce lawyer", "family lawyer", "kid lawyer", etc. Of course, lawyer is a bit general so we might do "divorce lawyer" and a handful of qualifiers that come up in our keyword research.

Ian McAnerin said...

Hi stoney!

I've done a fair amount of work in the legal area and it's true that since many of the pages are selling a legal service, you can put terms like "kid", "divorce", etc.

If you test it, though, you will almost certainly find that pages that are related to an area of law that use similar term vectors will do better.

For example, group qualifiers like kid, family, and divorce on one page, and terms like real estate, contract and company law on another.

Term vectors for family law include things like "parent", "divorced", "child" and so forth, and these are shared by "kid lawyer" and "divorce lawyer".

Likewise, term vectors for "corporate law" tend to be things like "company", "state", and "contract". Use of these terms on a page about divorce law messes up your message, to both the reader and the search engine.

If you look, you will probably find that you have naturally combined these similar concepts already on the site, simply because it seemed logical to do so at the time. This is why search engines use term vector analysis (TVA) - it mirrors what humans do naturally.

If you are mixing up terms with very different term vectors, I recommend you try separating them into groups that use similar term vectors and take a look at the results. As long as you don't mess up PR or something in the process, I think you will be pleasantly surprised at the result.
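The grouping suggested in this reply could be sketched as below. This is only an illustration under loud assumptions: the term sets are guesses loosely based on the examples in this thread, the 0.3 threshold is arbitrary, and group_keywords is a hypothetical helper, not a real tool.

```python
# Guessed term vectors for a few legal keywords (illustrative only).
KEYWORD_TERMS = {
    "divorce lawyer": {"parent", "divorced", "child"},
    "kid lawyer": {"parent", "child", "custody"},
    "contract law": {"company", "contract", "state"},
    "company law": {"company", "contract", "state", "shareholder"},
}


def jaccard(a, b):
    """Share of terms two sets have in common (0.0 to 1.0)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def group_keywords(term_vectors, threshold=0.3):
    """Greedily put each keyword on the first page whose terms overlap enough."""
    groups = []
    for keyword, terms in term_vectors.items():
        for group in groups:
            if jaccard(terms, group["terms"]) >= threshold:
                group["keywords"].append(keyword)
                group["terms"] |= terms  # The page's vocabulary grows.
                break
        else:
            # No existing page is a good fit, so start a new one.
            groups.append({"keywords": [keyword], "terms": set(terms)})
    return [group["keywords"] for group in groups]


print(group_keywords(KEYWORD_TERMS))
# [['divorce lawyer', 'kid lawyer'], ['contract law', 'company law']]
```

The family law keywords land on one page and the corporate ones on another, which matches the split you would probably make by hand anyway.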



Anonymous said...

Great example especially on the lawyer term vectors. Are term vectors and semantic mapping the same idea?

Ian McAnerin said...

Dave, it's very similar to semantic mapping (SM), in that you'll see that most of the words are related. SM would be a good start.

However, TVA takes it a step further. Do you know what one of the term vectors for "online pharmacy" is? "800".

Why 800? Because reputable pharmacies usually have a 1-800 number on their pages and it's common enough to become a term vector. This is not something that SM would probably let you guess at.

Unfortunately, there are no publicly available tools that can give you true term vectors, so semantic mapping is probably as close as you can get without a custom utility - but double check your terms against the real content of sites that show up well for the term.