A robots.txt file sits in the root directory of your site and recommends access restrictions to site crawlers. The file follows the Robots Exclusion Protocol (also known as the Robots Exclusion Standard). The protocol is easy to understand, and robots.txt itself is a plain text file, so you don’t need any special software or training to create one.
Usefulness of robots.txt
The robots.txt file can be used for the following purposes:
1. Saves Resources and Boosts Ranking
The robots.txt file tells site crawlers to avoid certain files or folders. It doesn’t hide those resources; it only requests that crawlers not read them. To keep resources out of search results entirely, you need password protection or a noindex directive. When crawlers follow the directives, the robots.txt file works like a traffic controller: if certain pages are skipped, both the crawler and your server save time and computing resources. The robots.txt file can also steer crawlers away from low-value material you don’t want them spending time on. This helps with SEO ranking by focusing the crawl on the pages that matter for relevant keywords.
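For example, a site might keep crawlers out of internal search pages and temporary files. A minimal sketch (the paths below are hypothetical; substitute your own):

```text
User-agent: *
Disallow: /search/
Disallow: /tmp/
```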
2. Keeps Images Out of Search Results
If you disallow certain images in robots.txt, those images will not show up in Google image search. However, other users will still be able to link directly to the image files.
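To keep images out of image search specifically, you can target Google’s dedicated image crawler. A sketch (the /images/ path is hypothetical):

```text
User-agent: Googlebot-Image
Disallow: /images/
```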
3. Controlling Crawlers
The robots.txt file can be used to let only selected crawlers analyze the site. This kind of control is mainly used to save bandwidth.
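Besides blocking crawlers outright, some sites throttle them. The non-standard Crawl-delay directive, which asks a crawler to wait a number of seconds between requests, is honored by some crawlers such as Bingbot but ignored by Googlebot:

```text
User-agent: Bingbot
Crawl-delay: 10
```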
Limitations and Downsides of robots.txt
There are some limitations and downsides to robots.txt. Here are a few:
1. Enforcement is Voluntary
The robots.txt is only a recommendation. It’s up to each crawler whether the rules are honored. Generally, reputable bots like Googlebot and Bingbot are good at following the recommendations, but less reputable or newer crawlers might ignore the directives and crawl the pages anyway. You should password protect any pages you want kept out of the public sphere.
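On the crawler side, compliance is typically implemented with a robots.txt parser. Here is a minimal sketch of a polite crawler’s check using Python’s standard-library urllib.robotparser (the rules and URLs are hypothetical; note that this particular parser applies rules in file order, first match wins, so the Allow line is listed before the broader Disallow):

```python
from urllib import robotparser

# Hypothetical WordPress-style rules. Allow comes first because
# urllib.robotparser applies rules in file order (first match wins).
rules = """\
User-agent: *
Allow: /wp-admin/admin-ajax.php
Disallow: /wp-admin/
"""

rp = robotparser.RobotFileParser()
rp.parse(rules.splitlines())

# A polite crawler checks each URL before fetching it.
print(rp.can_fetch("*", "https://example.com/wp-admin/admin-ajax.php"))  # True
print(rp.can_fetch("*", "https://example.com/wp-admin/settings.php"))    # False
print(rp.can_fetch("*", "https://example.com/blog/post"))                # True
```

Nothing in this check physically blocks the crawler; it is purely voluntary, which is exactly why sensitive pages need real access control.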
2. Crawler Interpretation Can Vary
Different crawlers might interpret the instructions differently, which can lead to inconsistencies. For example, when Allow and Disallow rules conflict, Googlebot applies the rule with the longest matching path, while older parsers simply apply the first rule that matches in file order.
3. Complicated Instructions Can End Up Harming Your Indexing
Setting up complicated rule sets can confuse crawlers and keep them from analyzing your site properly. Keep the rules simple so crawlers don’t miss important parts of your site.
Example Contents of robots.txt
The robots.txt uses simple path-prefix rules (with optional wildcards such as * and $) to regulate how the files on the site should be treated. Here is an example of robots.txt from a WordPress website:
User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php
The above code instructs all web crawlers (User-agent: *) not to visit the /wp-admin/ folder, except for the admin-ajax.php file inside it.
You can use the robots.txt file to invite only the crawlers you want:
User-agent: *
Disallow: /

User-agent: Googlebot
Disallow:

User-agent: Bingbot
Disallow:
In the above code, robots.txt tells all web crawlers except Googlebot and Bingbot not to crawl any of the website’s content. Only these two bots are allowed to browse the entire site, since their Disallow directives have no value, meaning everything can be crawled.
SEO Best Practices for robots.txt
- Ensure that you aren’t blocking relevant content.
- Avoid using robots.txt for access control. It will not prevent access to sensitive data. Use password protection for that, and a noindex meta directive if you only need to keep pages out of search results.
- If you are trying to exclude unnecessary crawlers, make sure that you cover the variations. For example, Google uses Googlebot for general search and Googlebot-Image for image search.
- You can explicitly point to your XML sitemap in your robots.txt. The XML sitemap helps search engines better understand the relevance of different pages on your website, and linking it from robots.txt makes it easy for them to find. This increases the chances of your website being ranked higher.
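The Sitemap directive takes a full URL and can appear anywhere in the file. A short sketch combining it with a crawl rule (the sitemap URL is hypothetical):

```text
User-agent: *
Disallow: /wp-admin/

Sitemap: https://example.com/sitemap.xml
```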
A robots.txt file is a simple tool that can improve the quality of your traffic and the performance of your site, and mastering it can also help with your SEO.