TopRank Blog

Cool - I made the "BigList" for the TopRank Blog with the following description:

Long time search marketer and ex-attorney Ian McAnerin writes a mix of posts on
SEO, China, search marketing conferences, search engines and a bit of
philosophy.

I also get a badge :) Lee says the link back is totally optional, which is a good way to get me to give him one ;)

I usually don't pay much attention to blog lists, but this one came at a good time (I was feeling kind of down) and it lifted my spirits a bit.

Funny how sometimes little things happen at just the right time...

Big List - Search Marketing Blogs

Ian

Which Keywords on a Page?

The question of how many keywords you should optimize for on a page is nearly as old as the idea of keywords being on a page in the first place, and there still really isn't a perfect answer.

However, hopefully I can give you some guidance. Before I do, I need to explain some concepts.

First, a keyword (or key phrase) is just that - a word or group of words that you type into a search engine with the expectation of obtaining a result. The generic word for both keyword and keyphrase is "search term". Some people also call this a query (and they are correct semantically) but I prefer to reserve the term query for search terms that explicit operators are applied to, such as "new +york" which looks for "new" but only if "york" is also on the page.


              / Keyword = "dog"
Search Term =
              \ Keyphrase = "dog breeder"


Query (or search query) = "dog +breeder site:cnn.com"
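To make the distinction concrete, here is a minimal Python sketch that labels whatever is typed into a search box using the terms above. The function name classify and the rule for spotting an operator (a leading + or -, or a field prefix such as site:) are assumptions made for illustration, not how any search engine actually parses input.

```python
def classify(text):
    """Label a search box entry as keyword, keyphrase, or query."""
    tokens = text.split()
    if not tokens:
        raise ValueError("empty search")
    # Assumption: a leading +/- or a field prefix like "site:" is an
    # explicit operator, which makes the whole entry a query.
    if any(t[0] in "+-" or ":" in t for t in tokens):
        return "query"
    return "keyword" if len(tokens) == 1 else "keyphrase"


print(classify("dog"))                         # keyword
print(classify("dog breeder"))                 # keyphrase
print(classify("dog +breeder site:cnn.com"))   # query
```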


There is no final authority on how to use these terms but this is how I use them, at least formally. I've been known to get sloppy and use "keyword" as a catchall for everything you type into a search box, like most people. In this article I'll try to use them properly.

The second term that is important to know is Term Vector. I won't go into extreme detail here, but for simplicity's sake I can describe term vectors as words that support search terms.

For example, let's say your search term is "java". A search engine has no idea if you are talking about Java the island, Java the programming language, or java as slang for coffee, so it will generally guess based on link popularity, which will result in mostly computer-related references.

However, if you give it a hint, like "java travel", you will get sites related to travelling to Java. Great. But how are the pages chosen in the first place?

Here are some term vectors for a few pages on java:

  • Java Page 1 - programming, Sun, technology, developer
  • Java Page 2 - island, travel, beach, Indonesia, photos
  • Java Page 3 - coffee, mug, bitter, drink, aroma

Can you tell which Java page is about what topic just from looking at the term vectors? Of course. So can search engines. This is one reason why search engines use term vectors as part of their algorithms. They can then use the combination of search terms and term vectors to assign an initial sorted result, before a final sorting using authority indicators like links, age, and so forth.
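As a rough illustration, the three Java pages can be matched against a search by counting how many of the extra words in the search also appear in each page's term vector. This is only a toy sketch: the names PAGES and best_pages are made up for the example, and real engines use far richer signals than simple word overlap.

```python
# Term vectors copied from the three example pages.
PAGES = {
    "Java Page 1": {"programming", "sun", "technology", "developer"},
    "Java Page 2": {"island", "travel", "beach", "indonesia", "photos"},
    "Java Page 3": {"coffee", "mug", "bitter", "drink", "aroma"},
}


def best_pages(search, pages=PAGES):
    """Rank pages by how many search words appear in their term vectors."""
    words = set(search.lower().split())
    scores = {name: len(words & vector) for name, vector in pages.items()}
    # Keep only pages with at least one supporting word, best first.
    matched = [name for name, score in scores.items() if score > 0]
    return sorted(matched, key=lambda name: -scores[name])


print(best_pages("java travel"))      # ['Java Page 2']
print(best_pages("java coffee mug"))  # ['Java Page 3']
print(best_pages("java"))             # [] (no hint, so nothing to go on)
```

Note that "java" on its own matches nothing here, which mirrors the point above: without a hint, the engine has to fall back on something else, such as link popularity.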

If you show up well in the initial result, then it's easier to do well during the final sorting, all other things being equal. This is why keywords and their placement are still necessary in this day and age of link authority. The link sorting can only act on results that have already been collected and sorted due to their content relevancy. Yes, content (keywords and term vectors) still matters.
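The two-step idea can be sketched in a few lines of Python. The scores below are invented for illustration, and the function name rank and the min_relevance cutoff are assumptions; the point is only that authority sorting never sees a page that content relevancy did not collect first.

```python
def rank(pages, min_relevance=1):
    """pages: list of (name, relevance, authority) tuples with made-up scores."""
    # Step 1: content relevancy decides which pages are collected at all.
    candidates = [p for p in pages if p[1] >= min_relevance]
    # Step 2: authority (links, age, and so on) reorders the candidates,
    # with relevance breaking ties.
    ordered = sorted(candidates, key=lambda p: (p[2], p[1]), reverse=True)
    return [name for name, _relevance, _authority in ordered]


pages = [
    ("strong links, off topic", 0, 95),
    ("on topic, some links", 3, 40),
    ("on topic, few links", 2, 10),
]
print(rank(pages))  # ['on topic, some links', 'on topic, few links']
```

The page with the strongest links never appears, because it was not relevant enough to be collected in the first step.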

Got that? Good. Now we can get to the meat of this overly long post.

For very competitive terms, you will usually want to optimize only one term per page, in order to maintain focus. However, this is not always practical or desirable. If you have keywords that are highly related or variations of each other, it's hard to make a page for each without looking like a spammer. "Uh, let's see... I'll make one page each for "Buy Viagra", "Purchase Viagra", "Viagra Buy", "Buy Viagra Online"..." Yeah, right. Like that doesn't stick out like a sore thumb and ruin any possible credibility. It looks so bad that the usual method of "fixing" this is to cloak these pages. But since two wrongs rarely make a right, this is generally a short-term fix, if it works at all. I don't recommend it.

So, you have decided to combine a few terms on a page. Generally, the accepted number of keywords you can optimize for on one page is between 2 and 5. Any more than that and you simply lose focus. The actual number is bound by what you can use naturally on a page, rather than by a specific formula. The actual "formula" is the rules of grammar and communication, not mathematics.

The search engine looks at tons of pages related to your keyword, and if they all use words, terms and term vectors one way and you are stuffing them in another, good things will not happen. In short, the search engine is comparing your page to other pages written by humans, not to some internal formula invented by a machine. Therefore, write like a human, not a machine.

So, what does this all imply? Here is the BIG IDEA from this.

What would happen if you tried to optimize a page for coffee, island hopping and programming all at once? Do you think you'd be successful for anything other than the longest of long tail terms? How would a search engine know that you were trying to do this? After all, you'd just be using "java" a lot. Isn't that enough? Isn't the page now optimized for "java"?

No. The reason is that you may be optimizing for "java" but you are hopelessly messing up the term vectors that the search engine uses to decide, with any confidence, what the topic of your page is. You've lost it.

Let's take this concept a bit further. We've established that term vectors or supporting words are important to the context of a search term and therefore the relevance of a page. So what?

So that means that you should only optimize for more than one keyword on a page if all the keywords have identical or highly similar term vectors, or they are actually term vectors for each other.

Never try to optimize for "lawyer" and "doctor" on the same page as two different keywords. If you are trying to optimize for "the doctor of a lawyer" long tail term, that's fine. But if you hope to stand a chance in hell of showing up for either, they need to share term vectors. In old-fashioned terms, they need to be related.

But it's more than just being related. Lawyers and doctors are technically related terms because they are both professionals. You could probably find some sort of relationship between almost any two terms. That's not enough for a search engine to work with. A search engine will decide that terms are related using semantic co-occurrence, which means they keep showing up on the same pages/paragraphs together. Semantic co-occurrence is the basic building block of term vector analysis.
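Co-occurrence counting itself is simple to sketch. The Python below counts how many documents each pair of words shares; the four tiny "documents" are invented for the example, and counting at the whole-document level (rather than per paragraph or within a word window) is an assumption made to keep it short.

```python
from collections import Counter
from itertools import combinations


def cooccurrence(documents):
    """Count how many documents each unordered pair of words shares."""
    counts = Counter()
    for text in documents:
        # Sort so each pair is always stored in alphabetical order.
        words = sorted(set(text.lower().split()))
        counts.update(combinations(words, 2))
    return counts


docs = [
    "lawyer court contract lawsuit",
    "lawyer contract litigation court",
    "doctor clinic patient prescription",
    "doctor patient clinic lawyer",
]
pairs = cooccurrence(docs)
print(pairs[("court", "lawyer")])   # 2
print(pairs[("clinic", "doctor")])  # 2
print(pairs[("doctor", "lawyer")])  # 1
```

Words that keep landing in the same documents build up high counts, while "doctor" and "lawyer" barely register, even though a human could argue they are related.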

So you should not optimize for "related" search terms on a page, which is too vague. You should optimize for search terms that either very frequently show up together normally (i.e. each is a term vector for the other) or that have a nearly identical term vector space between them.
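One way to put a number on "nearly identical term vector space" is a set overlap measure such as Jaccard similarity. The sketch below is an illustration only: the term vectors in VECTORS are hypothetical, the 0.5 threshold is an arbitrary assumption, and nothing here claims to be how a search engine actually measures it.

```python
def jaccard(a, b):
    """Share of words two term vectors have in common (0.0 to 1.0)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical term vectors, invented for this example.
VECTORS = {
    "dog breeder": {"puppies", "kennel", "litter", "pedigree", "breed"},
    "puppies for sale": {"puppies", "kennel", "litter", "breed", "price"},
    "lawyer": {"court", "contract", "lawsuit", "attorney"},
}


def can_share_page(term_a, term_b, threshold=0.5):
    """True if the two search terms have mostly the same supporting words."""
    return jaccard(VECTORS[term_a], VECTORS[term_b]) >= threshold


print(can_share_page("dog breeder", "puppies for sale"))  # True
print(can_share_page("dog breeder", "lawyer"))            # False
```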

Ian

Search Friendly PDF

I've been asked several times about PDF files (like those made with Adobe Acrobat) and SEO.

My advice? Use MS Word (or WordPerfect) not Acrobat. For reasons that I can only describe as being stupid to the extreme, only the latest (and really expensive) versions of Acrobat save links within the PDF document. And not if you use the printer driver function (aka PDFMaker), which is what most people use. Only the latest Acrobat Distiller lets you do this, and it's slow. I've also had formatting and crashing issues with Distiller.

If you have no links in your document, it gets considered to be a dead end or honeypot by the search engines. Not good.

Also, if you take an image or scan and make it a PDF, then it's a PDF of an image or scan, so it's not spiderable except as a file name (just like an image).

You can test whether a PDF was made from an image or from text by loading it into Acrobat Reader and trying to highlight and copy the text. If you can't, it's probably a picture of text, and is not spiderable.

There is another twist. If you make a document with links in it, then turn that document into a PDF using the PRINTER function (which is usually how Acrobat and other related PDF makers do things) then all the links are lost. They are lost as soon as the file is prepared to be sent to the printer device.

Oh, they will show up in the Acrobat Reader as a clickable link if the whole link exists (but not if it has anchor text), but this is the reader turning it into the link, not an actual link in the document that a spider could follow.

The only way to create a PDF that is indexable as text and has real links with anchor text (in short, the only SEO-friendly method) is to use a method where the links are processed within the document, not on the way to the printer. This is usually the case where, for example, you create the PDF using the "Save as" feature rather than the "Print to" feature.
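If you want a quick sanity check on a finished file, real links in a PDF are stored as link annotations with a /URI action, and these can often be spotted in the raw bytes. The Python sketch below is a crude heuristic under stated assumptions: the function name find_link_targets is made up, it only sees links stored uncompressed, and a PDF that packs its objects into compressed streams can contain real links that this check misses. An empty result therefore means "none found", not "none exist".

```python
import re

# Matches a URI action entry such as: /URI (http://www.example.com/)
URI_ACTION = re.compile(rb"/URI\s*\(([^)]*)\)")


def find_link_targets(pdf_bytes):
    """Return link targets visible in the uncompressed parts of a PDF."""
    return [m.decode("latin-1") for m in URI_ACTION.findall(pdf_bytes)]


# Two hand-written fragments standing in for real files.
with_link = b"<< /Subtype /Link /A << /S /URI /URI (http://www.example.com/) >> >>"
without_link = b"<< /Type /Page /Contents 4 0 R >>"

print(find_link_targets(with_link))     # ['http://www.example.com/']
print(find_link_targets(without_link))  # []
```

For a real file, read it in binary mode first, for example with open("report.pdf", "rb"), where the file name is just a placeholder.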

This is built into MS Word/Office 2007 and above (if you download the free PDF/XPS plugin) and in the WordPerfect Suite. There are a couple of other options, as well. But most are not SEO-friendly.

The following software WILL convert/keep MS Word links during PDF conversion:

  • Office 2007 (with plugin)
  • WordPerfect Office (version 9 and above)
  • Click2Convert
  • createpdf.adobe.com (but ONLY if you check "Create Tagged PDF" in the non-free version)

But NOT

  • Adobe Acrobat (any version up to 7)
  • PrimoPDF
  • PDFConverter

Final Takeaway

  1. PDF from image = bad
  2. PDF from printer driver = bad
  3. PDF processed within the text editor and saved = good

Ian