What Is Robots.txt?

Robots.txt files have two key roles:

  1. Tell search engines where your XML sitemaps are located.
  2. Tell search engines which areas of your site should not be crawled.

According to Google, a robots.txt file tells search engine crawlers which URLs they can access on your site. It is used mainly to manage crawler traffic and avoid overloading your server with requests; it is not a reliable way to keep a page out of Google's index.

What Is a Robots.txt File Used For?

The robots.txt file is primarily used to manage crawler traffic and control which parts of a website are accessible to search engines.
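Both jobs are visible programmatically. Python's standard-library urllib.robotparser can download a live robots.txt file, answer "may this crawler fetch this URL?", and list the sitemaps the file declares. A minimal sketch, using the article's example site and "Googlebot" as a stand-in user agent:

from urllib.robotparser import RobotFileParser

# Download and parse the site's live robots.txt file.
parser = RobotFileParser("https://www.lighttangent.com/robots.txt")
parser.read()

# Which areas may a given crawler access?
print(parser.can_fetch("Googlebot", "https://www.lighttangent.com/"))

# Where are the XML sitemaps? (None if the file lists no Sitemap: lines)
print(parser.site_maps())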

How to View a Robots.txt File

A robots.txt file is always located at the root of your website.

For example, for the site www.lighttangent.com, the robots.txt file can be found at:

https://www.lighttangent.com/robots.txt

A robots.txt file consists of one or more rules. Each rule either allows or disallows access for specific crawlers to defined paths. Unless stated otherwise, all URLs are considered crawlable.

Examples of Robots.txt Files

Here are example robots.txt URLs:

  • http://www.healthaluxury.com/robots.txt
  • https://www.lighttangent.com/robots.txt

What Does a Robots.txt File Look Like?

Below is an example of a basic robots.txt file:

User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php
Sitemap: https://healthaluxury.com/wp-sitemap.xml
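Read line by line: User-agent: * applies the rules to every crawler, Disallow: /wp-admin/ blocks the WordPress admin area, Allow: /wp-admin/admin-ajax.php re-opens one file that front-end features depend on (Google resolves such Allow/Disallow conflicts in favor of the more specific rule), and Sitemap: points crawlers at the XML sitemap. You can sanity-check rules like these with Python's standard-library urllib.robotparser; a minimal sketch (its conflict handling is simpler than Google's, so only the unambiguous paths are tested):

from urllib.robotparser import RobotFileParser

# Feed rules into the parser directly, without fetching anything.
rules = [
    "User-agent: *",
    "Disallow: /wp-admin/",
]

parser = RobotFileParser()
parser.parse(rules)

# URLs are crawlable by default; only /wp-admin/ is blocked.
print(parser.can_fetch("Googlebot", "https://healthaluxury.com/"))          # True
print(parser.can_fetch("Googlebot", "https://healthaluxury.com/wp-admin/")) # False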

How to Create a Robots.txt File (Step-by-Step)

You can use almost any plain-text editor to create a robots.txt file, such as Notepad, TextEdit, or VS Code.

Do not use word processors like Microsoft Word or Google Docs, as they may introduce hidden characters that cause issues for crawlers.

Save the file using UTF-8 encoding.
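If you prefer to script this step, a few lines of Python create the file with the correct encoding; the rules here are just the sample from earlier and should be adapted to your own site:

rules = (
    "User-agent: *\n"
    "Disallow: /wp-admin/\n"
    "Allow: /wp-admin/admin-ajax.php\n"
    "Sitemap: https://healthaluxury.com/wp-sitemap.xml\n"
)

# Writing with an explicit encoding guarantees the UTF-8 requirement is met.
with open("robots.txt", "w", encoding="utf-8") as f:
    f.write(rules)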

Format and Location Rules

  • The file must be named robots.txt.
  • Only one robots.txt file is allowed per site.
  • It must be placed at the root of the domain.
  • Rules apply only to the protocol, domain, and port where the file exists (see the sketch after this list).
  • Each subdomain requires its own robots.txt file.
  • The file must be UTF-8 encoded.

A robots.txt file can also exist on:

  • Subdomains (e.g., https://site.example.com/robots.txt)
  • Non-standard ports (e.g., https://example.com:8181/robots.txt)

Source: Google Search Central documentation
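Because rules are scoped to a single protocol, host, and port, the right robots.txt URL for any page can be derived from that page's own origin. A minimal sketch of that derivation (the page URLs are just the examples above):

from urllib.parse import urlparse

def robots_url(page_url: str) -> str:
    # robots.txt always lives at the root of the page's scheme://host:port origin.
    parts = urlparse(page_url)
    return f"{parts.scheme}://{parts.netloc}/robots.txt"

print(robots_url("https://site.example.com/blog/post"))  # https://site.example.com/robots.txt
print(robots_url("https://example.com:8181/shop/"))      # https://example.com:8181/robots.txt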

How to Check if Your Robots.txt File Is Working

You can test your robots.txt file with the robots.txt report in Google Search Console, which shows whether Google can fetch the file, when it was last crawled, and any parsing warnings or errors.
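Outside Search Console, a quick scripted check is simply to fetch the file and confirm the server returns it with HTTP 200. A minimal sketch with Python's standard library (the URL is the article's example):

from urllib.request import urlopen

# Fetch the live robots.txt file and confirm it is actually being served.
with urlopen("https://www.lighttangent.com/robots.txt") as response:
    print(response.status)  # expect 200
    print(response.read().decode("utf-8").splitlines()[:5])  # first few rules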

Is robots.txt Important for SEO?

Google looks for a robots.txt file to understand crawl directives, but it is not mandatory. If you do not need to block any URLs or manage crawl behavior, a robots.txt file is optional.

Is robots.txt Important for LLMs?

A newer proposal called llms.txt has emerged, intended to give large language models a curated, machine-friendly guide to a site's most important content. At present it is not an official web standard (unlike robots.txt), and Google does not use it for rankings. However, it is discussed experimentally within Generative Engine Optimization (GEO) communities.
