Support for x-robots-tag and robots HTML meta tag

While researching our post on how we block search engines, we looked into which search engines honor which indexing controls, specifically the x-robots-tag HTTP header and the robots HTML meta tag. We could not find this information collected anywhere else on the Internet, so our findings are below, starting with the big guys and moving toward more obscure or regional search engines.

Google, Bing

Google (whose crawler is known as Googlebot) and Bing (whose crawler is known as Bingbot) both support the x-robots-tag header and the robots HTML meta tag. Here’s Google’s page on the topic. And here’s Bing’s. Bing’s older crawler, msnbot, is retired.
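To make the two mechanisms concrete, here is a minimal sketch of a page that sends both signals at once: the x-robots-tag response header and the robots meta tag in the HTML. The WSGI app, the page content, and the `serve` helper are our own illustrative assumptions, not something taken from Google’s or Bing’s documentation.

```python
from wsgiref.simple_server import make_server

# Hypothetical page body. The meta tag is the in-document signal.
PAGE = b"""<!DOCTYPE html>
<html>
<head>
<meta name="robots" content="noindex, nofollow">
<title>Private page</title>
</head>
<body>Not for search engines.</body>
</html>
"""


def app(environ, start_response):
    """Tiny WSGI app that sends the header and the meta tag together."""
    headers = [
        ("Content-Type", "text/html; charset=utf-8"),
        ("Content-Length", str(len(PAGE))),
        # The HTTP-level signal, which also works for non-HTML files.
        ("X-Robots-Tag", "noindex, nofollow"),
    ]
    start_response("200 OK", headers)
    return [PAGE]


def serve(port=8000):
    """Run the app locally (assumed port; call this by hand to try it)."""
    with make_server("127.0.0.1", port, app) as server:
        server.serve_forever()
```

Sending both is redundant for crawlers that understand the header, but it costs little and covers crawlers that only read the HTML.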

Yahoo, AOL

Yahoo!’s search results are provided by Bing, and AOL’s are provided by Google, so both inherit the support described above. These are the easy ones.

Ask, Yandex, Nutch

Ask (whose crawler is known as teoma) and Yandex (Russia’s search engine, whose crawler is known as yandex) support the robots meta tag, but do not appear to support the x-robots-tag header. Ask’s page on the topic is here, and Yandex’s is here. The popular open source crawler, Nutch, also supports the robots HTML meta tag, but not the x-robots-tag header. Update: Newer versions of Nutch now support x-robots-tag!
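Because some crawlers read only the meta tag, it can help to check which of the two signals a given page actually carries. The following is a rough sketch using Python’s standard `html.parser`; the names `RobotsMetaParser` and `robots_signals` are hypothetical, and the parsing is simplified (it does not handle crawler-specific header values such as `googlebot: noindex`).

```python
from html.parser import HTMLParser


def _split(value):
    """Split a comma-separated directive list into a lowercase set."""
    return {part.strip().lower() for part in value.split(",") if part.strip()}


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").strip().lower() == "robots":
            self.directives |= _split(attr.get("content", ""))


def robots_signals(headers, html):
    """Return the directives found in the header and in the meta tag."""
    header_directives = set()
    for name, value in headers.items():
        if name.lower() == "x-robots-tag":
            header_directives |= _split(value)
    parser = RobotsMetaParser()
    parser.feed(html)
    return {"header": header_directives, "meta": parser.directives}
```

If the `meta` set comes back empty while the `header` set contains `noindex`, then per the findings above a meta-only crawler would have no way to see the instruction.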

The Internet Archive, Alexa

The Internet Archive uses Alexa’s crawler, which is known as ia_archiver. This crawler does not seem to support either the robots HTML meta tag or the x-robots-tag HTTP header. Their page on the subject is here. I have requested more information from them, and will update this page if I hear back.

Blekko, Baidu

Blekko does not support either the robots meta tag or the x-robots-tag header, according to emails I exchanged with them. I also requested information from Baidu, but their reply, which was in Chinese, did not address my question. They do have some information here, but it does not seem to cover the noindex value for the robots tag. In any case, the only way to block these crawlers seems to be via a robots.txt file.
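As a sketch of that robots.txt fallback, the snippet below builds a file that disallows specific crawlers and then checks the result with Python’s standard `urllib.robotparser`. The user-agent tokens are assumptions on our part: `ia_archiver` comes from the section above, while `Baiduspider` and `ScoutJet` are the names we believe Baidu and Blekko use, so verify them against each engine’s own documentation. The helper names are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Assumed user-agent tokens; confirm each one before relying on it.
BLOCKED_AGENTS = ["Baiduspider", "ScoutJet", "ia_archiver"]


def build_robots_txt(agents, path="/"):
    """Build one 'User-agent / Disallow' block per crawler."""
    blocks = [f"User-agent: {agent}\nDisallow: {path}" for agent in agents]
    return "\n\n".join(blocks) + "\n"


def is_allowed(robots_txt, agent, url):
    """Check whether the given crawler may fetch the URL under these rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    rules = build_robots_txt(BLOCKED_AGENTS)
    print(rules)
    print(is_allowed(rules, "Baiduspider", "https://example.com/page"))
    print(is_allowed(rules, "Googlebot", "https://example.com/page"))
```

Crawlers not named in the file remain unrestricted here, which keeps the header and meta tag in charge for the engines that do honor them.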

DuckDuckGo

I previously stated that DuckDuckGo (DDG) did not support the x-robots-tag header. That was true, but it was not the entire story: DDG relies on other search engines’ crawlers for its content aggregation and uses its own crawler only for maintenance-type work. You can read more about this in my answer on StackOverflow.

