As part of our research for our post on how we block search engines, we looked into which search engines support which privacy standards. This information doesn’t seem to exist anywhere else on the Internet, so below are our findings, starting with the big guys, and moving towards more obscure or foreign search engines.
Yahoo!’s search engine is provided by Bing. AOL’s is provided by Google. These are easy ones.
Ask, Yandex, Nutch
Ask (known as teoma) and Yandex (Russia’s search engine, known as yandex)
support the robots meta tag, but do not appear to support the
x-robots-tag header. Ask’s page on the topic is here, and Yandex’s is here.
The popular open source crawler, Nutch, also supports the robots HTML tag, but
not the x-robots-tag header. Update: Newer versions of Nutch now
support the x-robots-tag header.
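To make the difference between the two mechanisms concrete, here is a minimal Python sketch of how a crawler that honors both might look for a noindex directive. The names RobotsMetaParser and noindex_signals are hypothetical and not taken from any real crawler. The sketch also assumes a simple comma-separated header value and ignores the optional per-bot prefix that some engines accept in x-robots-tag.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() == "robots":
            for directive in attr.get("content", "").split(","):
                self.directives.add(directive.strip().lower())


def noindex_signals(headers, html):
    """Report which mechanism (HTTP header, meta tag) asks for noindex.

    Hypothetical helper: `headers` is a plain dict of response headers
    and `html` is the response body as a string.
    """
    header_value = ""
    for name, value in headers.items():
        # HTTP header names are case-insensitive.
        if name.lower() == "x-robots-tag":
            header_value = value
    header_directives = {d.strip().lower() for d in header_value.split(",")}

    parser = RobotsMetaParser()
    parser.feed(html)

    return {
        "header": "noindex" in header_directives,
        "meta": "noindex" in parser.directives,
    }
```

A crawler that only supports the meta tag would effectively look at the "meta" result alone, which is why a page that relies solely on the header can still end up indexed by it.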
The Internet Archive, Alexa
The Internet Archive uses Alexa’s crawler, which is known as ia_archiver. This
crawler does not seem to support either the HTML robots meta tag or the
x-robots-tag HTTP header. Their page on the subject is here. I have
requested more information from them, and will update this page if I hear back.
Blekko does not support either the robots meta tag or the
x-robots-tag header, according to emails I exchanged with them. I also
requested information from Baidu, but their response totally ignored my
question and was in Chinese. They do have some information here, but it does
not seem to cover the noindex value for the robots tag. In any case,
the only way to block these crawlers seems to be via a robots.txt file.
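As a rough illustration of the robots.txt approach, the sketch below uses Python's standard urllib.robotparser to check a sample file. The user-agent token ia_archiver comes from the section above. Baiduspider is my assumption for Baidu's crawler token, and example.com and the helper name is_blocked are placeholders. Confirm each crawler's actual token in its own documentation before relying on this.

```python
from urllib.robotparser import RobotFileParser

# Sample robots.txt: block two specific crawlers, allow everyone else.
# "Baiduspider" is an assumed token; "ia_archiver" is named above.
ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow: /

User-agent: Baiduspider
Disallow: /

User-agent: *
Disallow:
"""


def is_blocked(user_agent, url, robots_txt=ROBOTS_TXT):
    """Return True if robots_txt forbids user_agent from fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("ia_archiver", "Baiduspider", "SomeOtherBot"):
        print(agent, is_blocked(agent, "https://example.com/private.html"))
```

Keep in mind that robots.txt only asks a crawler not to fetch pages. It is a coarser tool than noindex, but it is the one these crawlers appear to respect.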
I previously stated that DuckDuckGo (DDG) did not support the
x-robots-tag header. That was true, but it was not the entire story:
DDG relies on other search engines' crawlers for its content aggregation and
uses its own crawler only for maintenance-type work. You can read more about
this in my
answer on StackOverflow.