WikiPlus

Robots.txt Generator

Generate robots.txt files to control how search engines crawl your site. 100% free, runs in your browser.

Local processing
1.4 s on average
4.8 out of 5, based on 1,247 uses

By Sergio Robles, Founder

User-agent: *
Allow: /
Your files are processed locally in your browser. We never upload or store your data.

What is the Robots.txt Generator?

The Robots.txt Generator creates a valid robots.txt file for your site. It covers all the major bots: Google, Bing, GPTBot, ClaudeBot, and more. You can also add any custom bot name. SEO teams build specific rules for each client. Online stores block filter URLs that waste crawl time. AI policy teams choose which bots may train on their content. The tool checks for invalid wildcards and removes duplicate lines. It also adds the sitemap link. The output appears in your browser, ready to copy or download. Your site structure and crawl rules stay private until you publish.

When should I use this tool?

  • Block crawlers from a site's staging or admin directories
  • Allow Googlebot but block aggressive SEO scraping user agents
  • Declare sitemap locations so search engines discover them faster
  • Set crawl-delay rules to protect a shared host with limited resources (for crawlers that honor the directive)

How do I generate a robots.txt file?

  1. Add user-agent rules for Googlebot, Bingbot, or a global wildcard.
  2. Enter allow and disallow paths for each user-agent group.
  3. Optionally add crawl-delay and sitemap URLs at the end.
  4. Preview the generated robots.txt in the live output panel.
  5. Download robots.txt and upload it to the root of your site.
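
The steps above can be sketched in a few lines of Python. This is only an illustration of how groups, paths, crawl-delay, and sitemap URLs combine into a file, not the tool's actual implementation; the function name `build_robots_txt` and the dictionary keys are hypothetical.

```python
def build_robots_txt(groups, sitemaps=()):
    """Assemble robots.txt text from user-agent groups (hypothetical structure).

    Each group is a dict with "user_agents" (list of names) and optional
    "allow", "disallow" (lists of paths) and "crawl_delay" (seconds).
    """
    lines = []
    for group in groups:
        for agent in group["user_agents"]:
            lines.append(f"User-agent: {agent}")
        for path in group.get("allow", []):
            lines.append(f"Allow: {path}")
        for path in group.get("disallow", []):
            lines.append(f"Disallow: {path}")
        if "crawl_delay" in group:
            lines.append(f"Crawl-delay: {group['crawl_delay']}")
        lines.append("")  # blank line separates groups
    for url in sitemaps:
        lines.append(f"Sitemap: {url}")
    # Normalize to exactly one trailing newline.
    return "\n".join(lines).rstrip("\n") + "\n"


example = build_robots_txt(
    [{"user_agents": ["*"], "disallow": ["/admin/", "/staging/"], "crawl_delay": 10}],
    sitemaps=["https://example.com/sitemap.xml"],
)
print(example)
```

The resulting text would then be saved as robots.txt and uploaded to the site root, as in step 5.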

Frequently asked questions

What is a robots.txt file and where should it live?

Robots.txt is a plain-text file that follows the Robots Exclusion Protocol, first proposed in 1994 and standardized as RFC 9309 in 2022. It must sit at the root of your host, reachable at yoursite.com/robots.txt, not in a subdirectory and not behind authentication, and it is best served directly rather than through a chain of redirects. Search engine crawlers fetch this URL before crawling other pages on the host. The file contains one or more User-agent groups that identify crawlers by name, each followed by Allow and Disallow directives that tell those crawlers which URL paths they may or may not fetch. A wildcard User-agent: * group applies to any crawler not matched by a more specific group. A Sitemap directive, conventionally placed at the bottom of the file, gives the absolute URL of your XML sitemap, which helps crawlers discover indexable URLs without exhaustive link-following.

Robots.txt is not a security mechanism. It is a voluntary protocol that compliant crawlers honor; malicious scrapers, vulnerability scanners, and spam bots routinely ignore it. Do not rely on robots.txt to hide sensitive content. Use server-side authentication or firewall rules for real access control. The major search engine crawlers (Googlebot, Bingbot, DuckDuckGo's DuckDuckBot, Yandex, Baidu) and the major AI crawlers state that they respect robots.txt. Google's robots.txt parser also enforces a file size limit of 500 KiB; content beyond that limit is ignored.

The WikiPlus Robots.txt Generator writes syntactically valid output verified against Google's published parsing specification. Download the file and upload it to your site's web root via FTP, your CMS media manager, or your deployment pipeline.
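
As a rough illustration of the location and size rules described above, the following Python sketch derives the robots.txt URL for any page on a host and checks a file against Google's size limit (Google documents it as 500 KiB). The helper names `robots_url_for` and `within_size_limit` are hypothetical and exist only for this example.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumption: 500 KiB, the limit Google documents for its own parser.
GOOGLE_SIZE_LIMIT_BYTES = 500 * 1024


def robots_url_for(page_url: str) -> str:
    """Return the robots.txt URL that governs page_url (same scheme, host, port)."""
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


def within_size_limit(content: str) -> bool:
    """Check whether the UTF-8 encoded file fits within the assumed limit."""
    return len(content.encode("utf-8")) <= GOOGLE_SIZE_LIMIT_BYTES


print(robots_url_for("https://example.com/blog/post?id=3#top"))
print(within_size_limit("User-agent: *\nAllow: /\n"))
```

Note that a robots.txt file applies per scheme, host, and port, which is why the sketch keeps all three and replaces only the path.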

Should I block AI crawlers such as GPTBot and ClaudeBot?

This is a genuinely contested decision in 2025, and the right answer depends on your site's business model and content strategy.

The case for blocking: GPTBot (OpenAI), ClaudeBot (Anthropic), PerplexityBot, CCBot (Common Crawl, which underlies many AI training sets), and Amazonbot are the primary routes through which your content enters AI training datasets and live AI assistant responses. If you operate a subscription paywall, a licensed news archive, a premium recipe site, or any business where the content's scarcity is the value proposition, allowing these crawlers to harvest and reproduce your content in AI responses undercuts your distribution model and may raise copyright concerns.

The case for allowing: AI-powered search surfaces such as Google AI Overviews, Bing Copilot, ChatGPT Browse, Perplexity, and Claude are now where a growing segment of users begin looking for information. Being cited or referenced there drives qualified referral traffic and brand awareness. For product sites, marketing pages, documentation, and informational content where broad discovery is the goal, blocking AI crawlers gives up citation visibility in exchange for training-data protection, and the net effect is often negative.

The WikiPlus generator includes pre-configured presets for both stances as well as fine-grained per-bot toggles. You can allow Googlebot fully, allow GPTBot for the citation benefit, and block CCBot to minimize training-set participation. These are independent decisions expressed as separate User-agent groups in the same file.
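
A mixed policy like the one just described might look like the file embedded in the sketch below, which uses Python's standard `urllib.robotparser` as a quick sanity check. The `POLICY` text, the `allowed` helper, and the example.com URL are illustrative assumptions, not output copied from the generator.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical policy: allow Googlebot and GPTBot, block CCBot, allow everyone else.
POLICY = """\
User-agent: Googlebot
Allow: /

User-agent: GPTBot
Allow: /

User-agent: CCBot
Disallow: /

User-agent: *
Allow: /
"""


def allowed(robots_txt: str, agent: str, url: str) -> bool:
    """Parse robots_txt and report whether agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


for bot in ("Googlebot", "GPTBot", "CCBot", "SomeOtherBot"):
    print(bot, allowed(POLICY, bot, "https://example.com/recipes/"))
```

The standard-library parser is adequate for whole-site rules like these; it does not reproduce every detail of how individual search engines resolve overlapping paths, so treat it as a smoke test only.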

What is the difference between Disallow and noindex?

Disallow in robots.txt and noindex in a meta robots tag look similar but act at different points in the crawl pipeline, and that difference determines which one fits a given situation.

A Disallow directive instructs compliant crawlers not to fetch the specified URL at all. The crawler reads robots.txt and never makes an HTTP request to the disallowed path. Because the crawler never sees the page content, it cannot read a noindex tag there, cannot follow links on that page, and cannot pass PageRank through its internal links. However, a disallowed URL can still appear in Google search results as a bare link without a snippet if other sites link to it: the URL is known to exist, but its content is invisible to the crawler.

A noindex meta tag works differently. It requires the crawler to fetch the page and read the HTML head. The crawler visits the page normally, follows its links, lets PageRank flow through those links, and then excludes the page from its search index. This is the right approach for thank-you confirmation pages, pagination variants, session-specific filtered views, and internal search result pages, that is, pages you want kept out of SERPs but whose link equity should still flow to linked pages. Disallow is right for admin panels, staging environments, private user dashboards, and any URL you want neither crawled nor cited.

Using both directives on the same URL is self-defeating: a disallowed page is never fetched, so its noindex tag is never read and the URL can still be indexed as a bare link. The WikiPlus generator exposes both mechanisms with a per-path toggle.
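
To make the conflict between the two mechanisms concrete, here is a minimal Python sketch that flags pages carrying noindex whose paths are also disallowed, meaning a compliant crawler would never fetch them and never see the tag. The function name and the sample paths are hypothetical, and the check uses plain prefix matching without wildcards.

```python
def noindex_hidden_by_disallow(disallowed_prefixes, noindex_paths):
    """Return the noindex paths that a Disallow prefix prevents from being fetched."""
    return [
        path
        for path in noindex_paths
        if any(path.startswith(prefix) for prefix in disallowed_prefixes)
    ]


conflicts = noindex_hidden_by_disallow(
    disallowed_prefixes=["/admin/", "/search"],
    noindex_paths=["/thank-you", "/search?q=shoes", "/admin/login"],
)
print(conflicts)
```

For each path reported, one of the two directives would need to go: drop the Disallow if the goal is de-indexing, or drop the noindex if the goal is simply to stop crawling.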

Can I allow a folder inside a blocked directory?

Yes. The robots.txt specification supports Allow directives that take precedence over a broader Disallow when the Allow path is more specific. The rule resolution algorithm used by Google and Bing compares the length of the matching path: the longer (more specific) path wins regardless of the order in which Allow and Disallow appear within a User-agent group. For example, to block the entire /members/ directory except for the public profile index, write Disallow: /members/ followed by Allow: /members/profiles/. Crawlers will skip all URLs under /members/ except those under /members/profiles/, which are fetched normally.

Path matching uses prefix logic: Disallow: /private/ blocks /private/page.html, /private/docs/, and any other URL beginning with /private/. Wildcards extend this with the * character (matches any sequence of characters) and the $ character (anchors the pattern to the end of the URL). For example, Disallow: /*.pdf$ blocks all URLs ending in .pdf anywhere on the site without blocking the directories that contain them.

The WikiPlus generator's rule builder validates these patterns in real time and shows you the effective coverage of each rule. It flags common mistakes like Disallow: / (blocks everything) when you intended Disallow: /admin/, and warns when an Allow rule is shadowed by a conflicting Disallow at the same specificity level. After generating the file, verify it with Google Search Console's robots.txt report (which replaced the older robots.txt Tester) before deploying. Syntax errors in robots.txt produce no visible error in the browser, but crawlers ignore lines they cannot parse, so the rules you intended may silently not apply.
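
The longest-match rule and the two wildcards can be sketched in Python as follows. This is a simplified model written for illustration, not the generator's validator or any search engine's parser: `is_allowed` and `_pattern_to_regex` are hypothetical names, the tie-break in favor of Allow follows Google's documented "least restrictive" behavior, and percent-encoding normalization is left out.

```python
import re


def _pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into an anchored-at-start regex."""
    anchored = pattern.endswith("$")
    body = pattern[:-1] if anchored else pattern
    # '*' matches any sequence of characters; everything else is literal.
    regex = ".*".join(re.escape(part) for part in body.split("*"))
    return re.compile(regex + ("$" if anchored else ""))


def is_allowed(path: str, rules: list[tuple[str, str]]) -> bool:
    """Apply longest-match precedence; on equal length, Allow wins (assumption)."""
    best_len = -1
    best_allow = True  # no matching rule means the path may be fetched
    for directive, pattern in rules:
        if not pattern:
            continue  # an empty pattern matches nothing
        if _pattern_to_regex(pattern).match(path):
            allow = directive.lower() == "allow"
            if len(pattern) > best_len or (len(pattern) == best_len and allow):
                best_len = len(pattern)
                best_allow = allow
    return best_allow


rules = [
    ("Disallow", "/members/"),
    ("Allow", "/members/profiles/"),
    ("Disallow", "/*.pdf$"),
]
for path in ("/members/settings", "/members/profiles/anna", "/docs/guide.pdf", "/docs/"):
    print(path, is_allowed(path, rules))
```

Because the comparison is by pattern length rather than position, reordering the entries in `rules` leaves the results unchanged, which mirrors the order-independence described above.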

The content of this page is available under CC BY 4.0.