What Is Robots.txt?
The robots.txt file is a plain text document that gives directives to search engine bots and spiders, telling them which parts of a website they may crawl.
Why Is Robots.txt Important?
Robots.txt is an important tool for any website, and it serves a few different functions.
It is an effective way of controlling crawl budget: by blocking certain sections of a website from being crawled, you can direct Google's crawlers toward the more important sections of your site.
It is also useful for keeping crawlers away from internal search results pages, as well as other pages you don't want surfacing in SERPs, such as login pages.
Robots.txt is also handy for hiding pages that are under construction, so your audience won't stumble across them before they're ready.
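As a simple illustration, a robots.txt file like the following covers both of those cases. The paths here are placeholders; swap in your own site's directories:

```
# Applies to all crawlers
User-agent: *

# Keep internal search results pages out of crawlers' paths
Disallow: /search/

# Hide a section that is still under construction
Disallow: /coming-soon/
```

Rules under `User-agent: *` apply to every compliant bot; you can add separate `User-agent` groups to give individual crawlers different rules.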
What Should You Watch Out For?
While robots.txt is a useful tool, it does have its disadvantages.
- While it lets webmasters stop certain pages from being crawled, that doesn't necessarily prevent those URLs from appearing in SERPs. For that, you should use a noindex tag.
- Blocking a page with robots.txt also prevents link equity from flowing through the links on that page.
- Additionally, if a website's security is not up to scratch, attackers can read robots.txt to discover where private content lives.
What Are Some Robots.txt Best Practices?
There are a few other things you need to take into consideration when using robots.txt.
- Pay close attention when making changes to robots.txt; one small mistake can have a big impact and make entire sections of your site uncrawlable
- Don't rely on robots.txt to keep sensitive data out of SERPs, because blocked pages can still be indexed; use a noindex tag instead
- Ensure that you're applying robots.txt rules to the right sections of your website; you don't want to block essential pages from being crawled
- Make sure to add your sitemap’s location to robots.txt
- To ensure that your robots.txt is discoverable, put it in your website’s root directory
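Putting those last two points together: a robots.txt file served from the root of your domain (for example, https://www.example.com/robots.txt, a placeholder URL) can declare the sitemap's location with an absolute URL, like so:

```
User-agent: *
Disallow: /admin/

# Absolute URL to the XML sitemap (placeholder domain)
Sitemap: https://www.example.com/sitemap.xml
```

Crawlers only look for robots.txt at the root, so a file placed in a subdirectory will simply be ignored.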
When Should You Avoid Using Robots.txt?
Robots.txt is not the go-to tool for every situation. There are some circumstances in which you should avoid using it.
- You may have heard that duplicate pages can be fixed with robots.txt; however, this advice is outdated. You should instead use a canonical tag, which lets you keep the duplicate page while consolidating link equity on the preferred version.
- If a web page is no longer in use, robots.txt isn't the right remedy. Instead, use a 301 redirect to send your users (and link equity) to the right web page.
- When you don't want a page to appear in SERPs but do want to preserve its link equity, opt for a noindex tag instead of a robots.txt block.
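In HTML terms, the noindex and canonical alternatives mentioned above are tags placed in the page's `<head>`. The URL below is a placeholder:

```html
<!-- Keep this page out of SERPs while still letting crawlers follow its links -->
<meta name="robots" content="noindex">

<!-- On a duplicate page: point search engines at the preferred version -->
<link rel="canonical" href="https://www.example.com/preferred-page/">
```

Note that for a noindex tag to work, the page must remain crawlable: if robots.txt blocks the page, crawlers never see the tag.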
How Can We Help?
Still confused? Get in touch and we’ll help you clarify with a free strategy plan.
In the meantime, you could also check out our training courses to give you some essential pointers and practices for becoming a pro at SEO.