How to Create a robots.txt File
What is this?
The robots.txt file is a plain text file at the root of your website that tells search engine crawlers which pages they should and should not access. While it is not a security mechanism (crawlers can ignore it), well-behaved bots respect its directives.
Why it matters
- SEO: Helps search engines crawl your site efficiently by avoiding irrelevant pages
- Crawl budget: Prevents search engines from wasting time on admin pages, search results, and duplicates
- Professionalism: Every production website should have a robots.txt
How to fix it
Create a file named robots.txt in your site's root directory (accessible at https://example.com/robots.txt):
Basic robots.txt
User-agent: *
Allow: /
Sitemap: https://example.com/sitemap.xml
Block specific paths
User-agent: *
Allow: /
Disallow: /admin/
Disallow: /api/
Disallow: /search?
Sitemap: https://example.com/sitemap.xml
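One way to sanity-check rules like these before deploying is Python's built-in urllib.robotparser, parsing the directives locally instead of fetching them over HTTP. A minimal sketch (note: Python's parser matches rules in file order, so a blanket Allow: / placed first would shadow later Disallow lines; anything not disallowed is allowed by default, so the Allow line is omitted here):

```python
from urllib.robotparser import RobotFileParser

# The Disallow rules from the example above, as a list of lines.
rules = """\
User-agent: *
Disallow: /admin/
Disallow: /api/
Disallow: /search?
""".splitlines()

rp = RobotFileParser()
rp.parse(rules)

# Paths not matched by any Disallow rule are allowed by default.
print(rp.can_fetch("*", "https://example.com/blog/post"))    # True
print(rp.can_fetch("*", "https://example.com/admin/users"))  # False
```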
WordPress
WordPress generates a virtual robots.txt automatically. To customize it, add a filter to your theme's functions.php:
add_filter('robots_txt', function ($output, $public) {
    $output .= "Disallow: /wp-admin/\n";
    $output .= "Disallow: /wp-includes/\n";
    $output .= "Sitemap: https://example.com/sitemap.xml\n";
    return $output;
}, 10, 2);
Nginx: serve a static file
Place robots.txt in your web root, or configure a location block:
location = /robots.txt {
    default_type text/plain;
    return 200 "User-agent: *\nAllow: /\nSitemap: https://example.com/sitemap.xml\n";
}
Common mistakes
- Accidentally blocking your entire site with Disallow: /. This prevents all search engine indexing.
- Blocking CSS and JavaScript files that Googlebot needs to render your pages.
- Using robots.txt to hide sensitive content. It is publicly accessible and not a security tool.
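The first mistake is easy to demonstrate with Python's built-in urllib.robotparser: a lone Disallow: / blocks every path for every compliant crawler.

```python
from urllib.robotparser import RobotFileParser

# A site-wide block: every path is disallowed for every user agent.
bad_rules = ["User-agent: *", "Disallow: /"]

rp = RobotFileParser()
rp.parse(bad_rules)

print(rp.can_fetch("*", "https://example.com/"))          # False
print(rp.can_fetch("*", "https://example.com/any/page"))  # False
```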
Test your fix
After creating your robots.txt, audit your site on BeaverCheck to verify it is detected in the Infrastructure tab.