
Robots.txt: What Is It? Why Do You Need It?

What is robots.txt?

Robots.txt is a small file that sits on your website.

Well-behaved robots, such as the crawlers run by Google, Yahoo and Bing, read the file when they visit your website so that they only crawl the parts of your site that you want them to see.

Badly behaved robots will ignore it completely, but that’s a different story.

Since robots.txt is designed to be used by computers, it has a standard layout that needs to be followed.

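You can make your own robots.txt file with Notepad or any other plain-text editor. Save it as robots.txt (all lower case) and upload it to the top-level (root) folder of your site, so that it can be found at an address like https://example.com/robots.txt.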

At its simplest, you can just allow all robots to crawl anything and everything on your website. To do this, you’d create a robots.txt file with the following two lines:

User-agent: *
Allow: /

The first line tells the robots (also known as user-agents) that the rules which follow apply to all of them. That’s indicated by the * symbol, which is computer-geek for “everyone”.

The second line lets them into /, which is shorthand for the whole of your site.
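To see how a crawler reads those two lines, here is a minimal sketch using Python’s standard-library robots.txt parser, urllib.robotparser. The crawler names and the example.com URL are illustrative assumptions; the point is that * matches every user-agent and / is a prefix of every path on the site.

```python
from urllib.robotparser import RobotFileParser

# The two-line "allow everything" file from above.
ALLOW_ALL = """\
User-agent: *
Allow: /
"""

parser = RobotFileParser()
# parse() expects a list of lines, not one long string.
parser.parse(ALLOW_ALL.splitlines())

# Illustrative crawler names and URL (assumptions, not from the article).
for agent in ("Googlebot", "Bingbot", "SomeOtherBot"):
    allowed = parser.can_fetch(agent, "https://example.com/any/page.html")
    print(f"{agent}: {allowed}")  # prints True for every agent
```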

If you didn’t want any robots to crawl any of your site, maybe because it is a private company site or because you haven’t finished it yet, you’d change the second line to:

Disallow: /
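The same parser can confirm the opposite effect. This sketch, again assuming Python’s urllib.robotparser and made-up example.com URLs, shows that Disallow: / turns every page away, because every path on a site starts with /.

```python
from urllib.robotparser import RobotFileParser

# "Keep out" version: the same User-agent line, but Disallow: /
BLOCK_ALL = """\
User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(BLOCK_ALL.splitlines())

# Hypothetical URLs on an example site: every path starts with "/",
# so every one of them is blocked.
for url in ("https://example.com/", "https://example.com/blog/post.html"):
    print(url, parser.can_fetch("Googlebot", url))  # prints False
```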

So far, so good. Even a simple robots.txt file like the one above is better than nothing. But chances are there are areas of your site that you don’t want crawled. One of the most common is a private area of your site, perhaps one that stores common files you need access to.

If this folder were called “private”, you’d add the following line to your robots.txt file, beneath the User-agent line:

Disallow: /private/

This would keep well-behaved search engine spiders out. But because anyone can read your robots.txt file, listing the folder would almost certainly act like a honeypot for hackers, so you’d need other precautions, such as password protection, to keep them out.
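Here is a quick sketch of both points, assuming Python’s urllib.robotparser and a hypothetical example.com site: the parser blocks the private folder while leaving the rest open, and a few lines of string handling show how easily anyone reading the file can list what you’ve asked robots to avoid. One caveat: Python’s parser applies the first matching rule in file order, whereas Google documents that it uses the most specific (longest) matching rule, so this sketch keeps just the one Disallow line to avoid depending on either behaviour.

```python
from urllib.robotparser import RobotFileParser

# Assumed file: block only the /private/ folder for every robot.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Hypothetical URLs: anything under /private/ is blocked, the rest is open.
print(parser.can_fetch("Googlebot", "https://example.com/private/notes.txt"))  # False
print(parser.can_fetch("Googlebot", "https://example.com/about.html"))  # True

# The honeypot problem: robots.txt is public, so anyone can pull out
# the paths you have asked robots to stay away from.
disallowed = [
    line.split(":", 1)[1].strip()
    for line in ROBOTS_TXT.splitlines()
    if line.lower().startswith("disallow:")
]
print(disallowed)  # ['/private/']
```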

If you’ve generated a sitemap for your site to help with indexing, you can also use robots.txt to tell the search engines where to find that file, giving its full URL. That’s why the robots.txt file for this site includes the line:

Sitemap: https://seomax.co.uk/sitemap.xml.gz
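Python’s parser also picks up Sitemap lines (its site_maps() method needs Python 3.8 or later). In this sketch the User-agent and Allow lines are assumed for illustration; only the Sitemap line is taken from this site’s file.

```python
from urllib.robotparser import RobotFileParser

# Assumed User-agent/Allow lines plus the Sitemap line quoted above.
ROBOTS_TXT = """\
User-agent: *
Allow: /

Sitemap: https://seomax.co.uk/sitemap.xml.gz
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# site_maps() returns a list of sitemap URLs, or None if there are none.
print(parser.site_maps())  # ['https://seomax.co.uk/sitemap.xml.gz']
```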

Whether you already have a robots.txt file or decide to create one, it’s worth checking that it behaves as you expect. The various search engine spiders take the instructions in the file quite literally, so it’s well worth checking that your robots.txt is telling them the right things. I use this free robots.txt checker to make sure that any of these files I create work as expected.
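Alongside an online checker, you can run a quick offline test of your own. The sketch below uses a hypothetical helper, check_robots(), built on Python’s urllib.robotparser: you list the URLs you expect to be crawlable or blocked, and it reports any that don’t match. The file contents and URLs are assumptions, and Python’s parser is only an approximation of how any particular search engine interprets the file.

```python
from urllib.robotparser import RobotFileParser


def check_robots(robots_txt, expectations, agent="Googlebot"):
    """Hypothetical helper: return URLs whose crawl permission is not what we expect.

    robots_txt   -- the file contents as a string
    expectations -- dict mapping URL -> True (should be crawlable) or False
    agent        -- the user-agent name to test as
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    problems = []
    for url, should_allow in expectations.items():
        actual = parser.can_fetch(agent, url)
        if actual != should_allow:
            problems.append((url, should_allow, actual))
    return problems


# Assumed file and example URLs, for illustration only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

expected = {
    "https://example.com/": True,
    "https://example.com/blog/hello.html": True,
    "https://example.com/private/accounts.xls": False,
}

mismatches = check_robots(ROBOTS_TXT, expected)
if mismatches:
    for url, wanted, got in mismatches:
        print(f"MISMATCH {url}: expected {wanted}, got {got}")
else:
    print("robots.txt behaves as expected")
```

To test your live file rather than a string, RobotFileParser can fetch it for you: call set_url() with the address of your robots.txt, then read(), before calling can_fetch().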