Robots.txt Syntax Checker

I was checking the robots.txt file of a client today, and it was far more complicated than normal, so I went looking for an automated syntax checker.

I found this one from Motoricerca (an Italian SEO company) and can recommend it: Robots.txt Syntax Checker (the tool itself is in English).
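To give a feel for what a syntax checker looks for, here is a minimal Python sketch. It is not how the Motoricerca tool works (I have no idea what it does internally), and the function name `lint_robots` and the list of accepted fields are my own assumptions. It only catches the most common slip-ups: a missing colon, a misspelled field name, and a rule that appears before any User-agent line.

```python
# Fields this sketch accepts; a real checker would likely know more.
KNOWN_FIELDS = {"user-agent", "disallow", "allow", "sitemap", "crawl-delay"}


def lint_robots(text):
    """Return a list of (line_number, message) for lines that look malformed."""
    problems = []
    seen_user_agent = False
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        if ":" not in line:
            problems.append((number, "missing ':' separator"))
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field not in KNOWN_FIELDS:
            problems.append((number, f"unknown field '{field}'"))
        elif field == "user-agent":
            seen_user_agent = True
            if not value:
                problems.append((number, "User-agent has no value"))
        elif field in {"allow", "disallow"} and not seen_user_agent:
            problems.append((number, f"'{field}' appears before any User-agent line"))
    return problems


if __name__ == "__main__":
    sample = "User-agent: *\nDisalow: /private/\n"
    for number, message in lint_robots(sample):
        print(f"line {number}: {message}")
```

Passing the file contents as a string keeps the sketch independent of how you fetch the file.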

I like it so much I'm adding a link to it from my robots.txt generator.


Interesting Antispam Tactic: Bot Bomb

My son Tas plays an online game called Mabinogi, and today he mentioned a new feature in the game, which had been overrun by something called bots.

Bots are basically programs that control player characters within the game, taking over for the person who is supposed to be playing. There are many types of bots, including ones that do repetitive tasks so the player doesn't have to, and worst of all, spam bots.

Spam bots basically stand in the middle of places where people congregate and shout out advertising (typically for game money that was itself gained by bots and is sold for real-world money). This is against the game rules and makes it no fun for players who are actually trying to play the game.

To combat this (which was ruining this and many other online games), the creators of the game did all they could from a game security perspective. It was simply too difficult to keep up with and detect the spammers, who went to great lengths to simulate human behavior well enough to fool a computer.

But not a human. Experienced human players can detect bots (and spam) almost instinctively.

The game designers finally decided to try something different: using humans to detect bots. Within the game, they created something called a "bot bomb" which can be thrown at a suspected bot. The "bot bomb" then asks a very simple question that any human player would be able to answer with no difficulty. It's essentially a reverse Turing test, like a CAPTCHA.

If the bot fails, it's logged out and the account is flagged. Time limits are in place so that human players can't be hit with so many "bot bombs" that they can't respond properly.
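I don't know how Mabinogi implements this internally, but the mechanics described above can be sketched in a few lines of Python. Everything here is hypothetical: the class name `BotBombTracker`, the five-minute cooldown between bombs, and the one-minute answer window are my assumptions, chosen only to show the challenge, the flagging, and the time limit working together.

```python
class BotBombTracker:
    """Hypothetical sketch of a "bot bomb": challenge, flag on failure, cooldown."""

    def __init__(self, cooldown_seconds=300, answer_window_seconds=60):
        self.cooldown = cooldown_seconds       # assumed gap between bombs on one player
        self.window = answer_window_seconds    # assumed time allowed to answer
        self.last_bombed = {}                  # player -> time of the last bomb
        self.pending = {}                      # player -> (expected answer, deadline)
        self.flagged = set()                   # accounts that failed a challenge

    def throw(self, target, expected_answer, now):
        """Issue a challenge unless the target was bombed too recently."""
        last = self.last_bombed.get(target)
        if last is not None and now - last < self.cooldown:
            return False  # protects humans from being bombed repeatedly
        self.last_bombed[target] = now
        self.pending[target] = (expected_answer, now + self.window)
        return True

    def answer(self, target, reply, now):
        """Check a reply; a wrong or late answer flags the account."""
        challenge = self.pending.pop(target, None)
        if challenge is None:
            return True  # nothing was asked, so nothing to fail
        expected, deadline = challenge
        passed = now <= deadline and reply.strip().lower() == expected.lower()
        if not passed:
            self.flagged.add(target)
        return passed
```

Times are passed in as plain numbers (seconds) rather than read from a clock, which keeps the sketch easy to try out by hand.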

Why is this on my blog? Because game theory is a very important aspect of how the modern web functions. Dealing with search and SEO has more in common with an online "game" than with offline human-to-human behavior, to the ongoing annoyance of search engines.

It occurs to me that a method that accurately and easily harnesses humans as spam detectors, while not overloading the system with unmanageable amounts of spam reports (and fake spam reports), is something that website owners and search engines alike can learn from.

Use humans to detect the spam, but use the system to verify it independently, in order to minimize false or malicious reports.
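As a rough illustration of that principle, here is a tiny Python sketch. The function name `confirmed_spam` and the shape of the report data are my own assumptions. Human reports only nominate targets, repeated reports of the same target collapse into a single check, and nothing counts as spam until the system's own check (passed in as `verify`) agrees.

```python
def confirmed_spam(reports, verify):
    """Return the reported targets that the system's own check confirms.

    reports: iterable of (reporter, target) pairs submitted by humans.
    verify:  callable taking a target and returning True if it is spam.
    """
    # Deduplicate first, so a flood of reports costs one check per target.
    targets = {target for _reporter, target in reports}
    # A report alone is never enough; the independent check has the final say.
    return {target for target in targets if verify(target)}
```

A false or malicious report costs one extra verification and nothing more, because the reporter never gets to decide the outcome.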



New Use for the Canonical Tag - Geolocation

I've been messing around with it, and I believe that I've found a new use for the canonical tag - geolocation of gTLDs.

Simply park (not redirect) a ccTLD on your site, upload an HTML sitemap that links to all your pages using the ccTLD, then place the canonical tag pointing to the gTLD version on each of your pages.

Voilà - the search engine "tags" your pages as geolocated via the ccTLD, but only displays the gTLD.

You keep your .com, but have now geolocated your site without having to host it locally.
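For anyone who wants to see the markup involved, here is a small Python sketch that builds the two pieces: the sitemap links on the ccTLD and the canonical tag pointing at the gTLD. The hostnames `www.example.co.uk` and `www.example.com` are placeholders, and the function names are made up for illustration. It only shows the HTML; how any given search engine treats it is my own observation, not something the code can prove.

```python
from html import escape
from urllib.parse import urlsplit, urlunsplit


def swap_host(url, host):
    """Return the same URL with its hostname replaced by `host`."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, host, parts.path, parts.query, parts.fragment))


def canonical_tag(url, gtld_host):
    """Build the canonical link tag pointing at the gTLD version of `url`."""
    href = swap_host(url, gtld_host)
    return f'<link rel="canonical" href="{escape(href, quote=True)}">'


def sitemap_links(paths, cctld_host, scheme="http"):
    """Build HTML sitemap links that point at every page via the ccTLD."""
    links = []
    for path in paths:
        url = urlunsplit((scheme, cctld_host, path, "", ""))
        links.append(f'<a href="{escape(url, quote=True)}">{escape(path)}</a>')
    return links


if __name__ == "__main__":
    # Placeholder hostnames; substitute your own parked ccTLD and your gTLD.
    for link in sitemap_links(["/", "/about"], "www.example.co.uk"):
        print(link)
    print(canonical_tag("http://www.example.co.uk/about", "www.example.com"))
```

The sitemap links go in the page you upload on the ccTLD, and the canonical tag goes in the head of each page.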

This is a variation on a technique I've been using for a while, but much cleaner due to the addition of the canonical tag.