Are Google and the other search engines leaving your website out of their index? If you have optimised your website but aren’t getting any visitors from the search engines, one place to look for the cause is the robots.txt file in the root of your website.

I was checking one of my client’s websites that had no pages indexed by Google, but had about 2000 pages indexed on Yahoo. This website is not a new one and has been running for about 5 years. The client asked me to investigate why they are not listed in Google’s index.

The website had some issues as a lot has changed in terms of SEO over the last few years, but when I looked at the robots.txt file I found this:

User-agent: *
Disallow: /

When you specify “User-agent: *”, the rules that follow apply to all search engine spiders and robots. The “Disallow: /” line tells those spiders that they should not visit or crawl any page on the website, because every URL path on the site starts with “/”. Disallowing the whole website and all its directories is a big no-no if you want your pages to show up in search results.
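To see the effect of these two lines for yourself, here is a minimal sketch using Python's standard urllib.robotparser module. The helper name is_crawlable and the www.example.com URLs are my own placeholders for illustration, not anything from the client's website, and real search engine spiders may interpret some edge cases differently from this parser.

```python
from urllib.robotparser import RobotFileParser


def is_crawlable(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url.

    is_crawlable is a hypothetical helper written for this example.
    """
    parser = RobotFileParser()
    # parse() takes the file as a list of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# The two lines found in the client's robots.txt file.
BLOCK_ALL = """User-agent: *
Disallow: /
"""

# www.example.com is a placeholder domain, assumed for illustration only.
for page in ("http://www.example.com/", "http://www.example.com/about.html"):
    print(page, is_crawlable(BLOCK_ALL, "Googlebot", page))
```

Both checks come back False, because every path begins with “/” and so matches the Disallow rule.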

It seems that the designer who built the website unknowingly excluded all the pages and never checked the file when the website was finished. We had to rewrite the robots.txt file so that it only disallows certain directories and pages of the website, not the whole site.

What is a robots.txt file?

This file contains simple rules for crawlers visiting your website. It is a handy little file that well-behaved spiders read to find out which pages to crawl and which ones to leave alone. In this file you can also specify the location of your XML sitemap so that spiders can find it quickly. For example:

User-agent: *
Disallow: /wp-admin/
Sitemap: http://www.antonkoekemoer.co.za/sitemap.xml

This robots.txt file specifies that the only directory that shouldn’t be crawled is the /wp-admin/ directory, which isn’t public. It also contains the location of the XML sitemap. As for my client’s website, it’s funny that Yahoo still indexed those pages even though the disallow rule told spiders not to crawl any page on the website.
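The example file with the /wp-admin/ rule and the sitemap line can be checked in the same way. This is only a rough sketch with Python's standard urllib.robotparser module: the load_rules helper and the /blog/ path are made up for illustration, and site_maps() needs Python 3.8 or newer.

```python
from urllib.robotparser import RobotFileParser

# The example robots.txt file from this article.
ROBOTS_TXT = """User-agent: *
Disallow: /wp-admin/
Sitemap: http://www.antonkoekemoer.co.za/sitemap.xml
"""


def load_rules(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text (load_rules is a hypothetical helper)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


rules = load_rules(ROBOTS_TXT)
site = "http://www.antonkoekemoer.co.za"

# The admin directory is the only path that is disallowed.
admin_allowed = rules.can_fetch("Googlebot", site + "/wp-admin/")
# /blog/ is an assumed path, used only to show a page that is not blocked.
blog_allowed = rules.can_fetch("Googlebot", site + "/blog/")
# site_maps() lists the Sitemap entries (available from Python 3.8).
sitemaps = rules.site_maps()

print("wp-admin allowed:", admin_allowed)  # False
print("blog allowed:", blog_allowed)       # True
print("sitemaps:", sitemaps)
```

If the first check ever comes back True, or the second one False, the file is not doing what you intended.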

If you have a website and it’s not indexed by Google, have a look at your robots.txt file.