What is the Robots.txt Generator?
The Robots.txt Generator creates a valid robots.txt file for your site. It covers all major bots: Google, Bing, GPTBot, ClaudeBot, and more. You can also add any custom bot name you need. SEO teams build custom rules for each client. Online stores block filter URLs that waste crawl time. AI policy teams pick which bots may train on their content. The tool checks for bad wildcards and drops duplicate lines. It adds a sitemap link too. Output appears in your browser, ready to copy or download. Your site layout and crawl rules stay private until you publish.
When should I use this tool?
- Block crawlers from a site's staging or admin directories
- Allow Googlebot but disallow aggressive SEO-scraping user agents
- Declare sitemap locations for faster search engine discovery
- Set crawl-delay rules to protect a low-resource shared host
How do I generate a robots.txt file?
1. Add user-agent rules for Googlebot, Bingbot, or a global wildcard.
2. Enter allow and disallow paths for each user-agent group.
3. Add an optional crawl-delay per group and sitemap URLs at the bottom.
4. Preview the generated robots.txt in the live output pane.
5. Download robots.txt and upload it to your site root.
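The steps above can be sketched in a few lines of Python. This is a minimal illustration of the idea, not the generator's actual implementation: the `Group` class, the `build_robots_txt` function, and the example.com URLs are hypothetical names chosen for this sketch.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Group:
    """One User-agent group (hypothetical structure for this sketch)."""
    user_agents: List[str]
    allow: List[str] = field(default_factory=list)
    disallow: List[str] = field(default_factory=list)
    crawl_delay: Optional[int] = None


def build_robots_txt(groups: List[Group], sitemaps: List[str]) -> str:
    lines: List[str] = []
    for group in groups:
        for agent in group.user_agents:
            lines.append(f"User-agent: {agent}")
        # dict.fromkeys drops duplicate paths while keeping their order.
        for path in dict.fromkeys(group.allow):
            lines.append(f"Allow: {path}")
        for path in dict.fromkeys(group.disallow):
            lines.append(f"Disallow: {path}")
        if group.crawl_delay is not None:
            lines.append(f"Crawl-delay: {group.crawl_delay}")
        lines.append("")  # a blank line separates groups
    for url in sitemaps:
        lines.append(f"Sitemap: {url}")  # sitemap URLs must be absolute
    return "\n".join(lines) + "\n"


robots = build_robots_txt(
    groups=[
        Group(user_agents=["Googlebot", "Bingbot"],
              disallow=["/admin/", "/admin/"]),  # duplicate on purpose
        Group(user_agents=["*"], disallow=["/staging/"], crawl_delay=10),
    ],
    sitemaps=["https://example.com/sitemap.xml"],
)
print(robots)
```

Crawl-delay is written per group because crawlers read it as part of the group that names them. Support varies by crawler, and Googlebot does not act on it.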
Frequently asked questions
What is a robots.txt file and where does it belong?
Robots.txt is a plain-text file that follows the Robots Exclusion Protocol, first proposed in 1994 and standardized as RFC 9309 in 2022. It must sit at the root of the host it governs, reachable at yoursite.com/robots.txt, not in a subdirectory and not behind authentication. Serve it directly where possible, since crawlers follow only a limited number of redirects. Compliant crawlers fetch this URL before crawling other pages on that host.
The file contains one or more User-agent groups that identify specific crawlers by name, followed by Allow and Disallow directives that tell those crawlers which URL paths they may or may not fetch. A wildcard User-agent: * group applies to any crawler not matched by a more specific group. A Sitemap directive, usually placed at the bottom of the file, gives the absolute URL of your XML sitemap, which helps crawlers discover indexable URLs without exhaustive link-following.
Robots.txt is not a security mechanism. It is a voluntary protocol that compliant crawlers honor. Malicious scrapers, vulnerability scanners, and spam bots routinely ignore it. Do not rely on robots.txt to hide sensitive content; use server-side authentication or firewall rules for genuine access control.
The major search engine crawlers (Googlebot, Bingbot, DuckDuckGo's DuckDuckBot, Yandex, and Baidu) and the major AI crawlers state that they respect robots.txt. Google's robots.txt parser enforces a file size limit of 500 KiB; content beyond that limit is ignored. The WikiPlus Robots.txt Generator writes syntactically valid output verified against Google's published parsing specification. Download the file and upload it to your site's web root via FTP, your CMS media manager, or your deployment pipeline.
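Before uploading, a generated file can be sanity-checked locally. The sketch below uses Python's standard `urllib.robotparser` module; the file contents and the example.com URLs are made up for illustration, and the standard-library parser is only an approximation of how any particular search engine reads the file.

```python
from urllib.robotparser import RobotFileParser

# Illustrative file contents, not output copied from the generator.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Disallow: /staging/

Sitemap: https://example.com/sitemap.xml
"""

MAX_BYTES = 500 * 1024  # Google's documented size limit (500 KiB)

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# The file must stay under the size limit or trailing rules are ignored.
size_ok = len(ROBOTS_TXT.encode("utf-8")) <= MAX_BYTES

# Googlebot has no group of its own here, so the * group applies.
admin_allowed = parser.can_fetch("Googlebot", "https://example.com/admin/login")
blog_allowed = parser.can_fetch("Googlebot", "https://example.com/blog/post")

# site_maps() returns the Sitemap URLs found in the file (Python 3.8+).
sitemaps = parser.site_maps()

print(size_ok, admin_allowed, blog_allowed, sitemaps)
```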
Should I block AI crawlers like GPTBot and ClaudeBot?
This is a genuinely contested decision in 2025, and the right answer depends on your site's business model and content strategy.
The case for blocking: GPTBot (OpenAI), ClaudeBot (Anthropic), PerplexityBot, CCBot (Common Crawl, which underlies many AI training sets), and Amazonbot are the main crawlers through which your content enters AI training datasets and live AI assistant responses. If you operate a subscription paywall, a licensed news archive, a premium recipe site, or any business where the content's scarcity is the value proposition, allowing these crawlers to harvest and reproduce your content in AI responses undercuts your distribution model and may raise copyright concerns.
The case for allowing: AI-powered search surfaces such as Google AI Overviews, Bing Copilot, ChatGPT Browse, Perplexity, and Claude are now where a growing segment of users begin their information journey. Being cited or referenced in these contexts drives qualified referral traffic and brand awareness. For product sites, marketing pages, documentation, and informational content where broad discovery is the goal, blocking AI crawlers trades citation visibility for training-data protection, and for those sites the net effect is often negative.
The WikiPlus generator includes pre-configured presets for both stances as well as fine-grained per-bot toggles. You can allow Googlebot fully, allow GPTBot, and block CCBot to limit training-set participation; these are independent decisions expressed as separate User-agent groups in the same file.
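As a rough illustration of per-bot decisions living side by side in one file, the sketch below parses a hypothetical mixed policy with Python's standard `urllib.robotparser`. The policy itself, the paths, and the `ExampleBot` name are assumptions made for this example. Note that the standard-library parser applies the first matching rule in a group rather than the longest one, so the rules here are kept non-overlapping.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical mixed policy: one group per crawler, plus a fallback.
ROBOTS_TXT = """\
User-agent: Googlebot
Allow: /

User-agent: GPTBot
Allow: /

User-agent: CCBot
Disallow: /

User-agent: *
Disallow: /admin/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

checks = [
    ("Googlebot", "/pricing"),
    ("GPTBot", "/pricing"),
    ("CCBot", "/pricing"),
    ("GPTBot", "/admin/"),      # GPTBot's own group applies, not the * group
    ("ExampleBot", "/admin/"),  # unnamed crawler falls back to the * group
    ("ExampleBot", "/pricing"),
]
results = {(agent, path): parser.can_fetch(agent, path) for agent, path in checks}

for (agent, path), allowed in results.items():
    print(f"{agent:<10} {path:<10} {'allowed' if allowed else 'blocked'}")
```

A crawler that has its own group follows only that group. In this sketch GPTBot may fetch /admin/ because the fallback Disallow is not inherited, so any shared restrictions need to be repeated in each named group.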
What is the difference between Disallow and noindex?
Disallow in robots.txt and noindex in a meta robots tag pursue superficially similar goals but operate at different points in the crawl pipeline, and those differences determine which one fits a given situation.
A Disallow directive instructs compliant crawlers not to fetch the specified URL at all. The crawler stops at the robots.txt file and never makes an HTTP request to the disallowed path. Because the crawler never sees the page content, it cannot read a noindex tag there, cannot follow links on that page, and cannot pass PageRank through its internal links. However, a disallowed URL can still appear in Google search results as a bare link without a snippet if other sites link to it: the URL is known to exist but its content is invisible.
A noindex meta tag works differently. It requires the crawler to fetch the page and read the HTML head. The crawler visits the page normally, follows its links, allows PageRank to flow through those links, and then excludes the page from its search index. This is the right approach for thank-you confirmation pages, pagination variants, session-specific filtered views, and internal search result pages, which you want excluded from SERPs but whose link equity should still flow to linked pages.
Disallow is right for admin panels, staging environments, private user dashboards, and any URL you want neither crawled nor cited. Using both on the same URL is counterproductive: a disallowed page is never fetched, so its noindex tag is never read and the URL can stay in the index as a bare link. The WikiPlus generator exposes both mechanisms with a per-path toggle.
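The conflict between the two mechanisms can be checked mechanically. The sketch below is one possible approach, not a feature of the generator: it takes a hypothetical page inventory (`PAGES`, with made-up paths) and reports pages whose noindex tag can never be read because robots.txt blocks the fetch.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Disallow: /thank-you/
"""

# Hypothetical inventory: URL path -> whether the page's HTML head carries
# <meta name="robots" content="noindex">.
PAGES = {
    "/admin/settings": False,
    "/thank-you/order": True,
    "/search?q=shoes": True,
    "/blog/launch": False,
}

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def unreadable_noindex(pages, agent="Googlebot"):
    """Return paths that carry noindex but are blocked from being fetched."""
    return sorted(
        path
        for path, has_noindex in pages.items()
        if has_noindex and not parser.can_fetch(agent, path)
    )


conflicts = unreadable_noindex(PAGES)
print(conflicts)
```

For each reported path, one of the two directives should go: drop the Disallow if the page should be crawled and then removed from the index, or drop the noindex if blocking the fetch is the real goal.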
Can I allow one folder inside a disallowed parent?
Yes. The robots.txt specification supports Allow directives that take precedence over a broader Disallow when the Allow path is more specific. The rule resolution algorithm used by Google and Bing compares the length of the matching path: the longer (more specific) path wins regardless of the order in which Allow and Disallow appear within a User-agent group. For example, to block the entire /members/ directory except for the public profiles, write Disallow: /members/ followed by Allow: /members/profiles/. Crawlers will skip all URLs under /members/ except those under /members/profiles/, which are fetched normally.
Path matching uses prefix logic: Disallow: /private/ blocks /private/page.html, /private/docs/, and any other URL beginning with /private/. Wildcards extend this with the * character (matches any sequence of characters) and the $ character (anchors the pattern to the end of the URL). For example, Disallow: /*.pdf$ blocks all URLs ending in .pdf anywhere on the site without blocking the directories that contain them.
The WikiPlus generator's rule builder validates these patterns in real time and shows you the effective coverage of each rule. It flags common mistakes like Disallow: / (blocks everything) when you intended Disallow: /admin/, and warns when an Allow rule and a Disallow rule conflict at the same specificity level. After generating the file, verify it with a robots.txt testing tool, such as the robots.txt report in Google Search Console, before deploying. Syntax errors fail silently: the file still loads in a browser, but crawlers skip lines they cannot parse, so the affected rules simply do not apply.
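The longest-match rule and the two wildcard characters can be sketched as a small matcher. This is an illustrative approximation written for this page, not Google's or Bing's parser: the `pattern_to_regex` and `is_allowed` names are hypothetical, specificity is measured as pattern length in characters, and ties are resolved in favor of Allow, which is the behavior RFC 9309 recommends.

```python
import re
from typing import List, Tuple


def pattern_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a robots.txt path pattern (* and trailing $) into a regex."""
    anchored = pattern.endswith("$")
    body = pattern[:-1] if anchored else pattern
    # Each * matches any sequence of characters; everything else is literal.
    regex = ".*".join(re.escape(part) for part in body.split("*"))
    return re.compile(regex + ("$" if anchored else ""))


def is_allowed(path: str, rules: List[Tuple[str, str]]) -> bool:
    """rules is a list of ("allow" | "disallow", pattern) pairs for one group."""
    best_len = -1
    allowed = True  # a path matched by no rule may be fetched
    for kind, pattern in rules:
        if not pattern:
            continue  # an empty pattern matches nothing
        # re.match anchors at the start, giving prefix semantics.
        if pattern_to_regex(pattern).match(path):
            is_allow = kind == "allow"
            longer = len(pattern) > best_len
            tie_for_allow = len(pattern) == best_len and is_allow
            if longer or tie_for_allow:
                best_len = len(pattern)
                allowed = is_allow
    return allowed


RULES = [
    ("disallow", "/members/"),
    ("allow", "/members/profiles/"),
    ("disallow", "/*.pdf$"),
]

for path in ["/members/settings", "/members/profiles/alice",
             "/docs/guide.pdf", "/docs/guide.pdf?download=1"]:
    print(path, "allowed" if is_allowed(path, RULES) else "blocked")
```

Because the winner is chosen by pattern length rather than position, reordering `RULES` does not change any result.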
Content on this page is available under CC BY 4.0.