Search engines crawl websites looking for content to add to their indexes. This article explains how to limit which parts of your site search engines can index.

Limiting what search engines can index using a /robots.txt file

Search engines such as Google run automated programs, called "spiders" or "robots," that continually crawl the web and index content for inclusion in their search databases. Most site owners welcome being listed, and high rankings can translate into real revenue for commercial sites, but not everyone wants every page and file in their account to be publicly searchable.

This is where /robots.txt comes in. Most search engine robots honor a site owner's wishes about excluding content by following the Robots Exclusion Standard, which is implemented with a small plain-text (ASCII) file named robots.txt placed in the root web-accessible directory of a domain.
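Because the file must sit in the root directory, a robot can work out where to look for it from any page URL on the site by keeping only the scheme and host. The minimal Python sketch below, using only the standard library, illustrates that rule; the function name `robots_url_for` and the example URLs are hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url_for(page_url: str) -> str:
    """Return the /robots.txt URL that governs page_url (illustrative helper).

    The file always lives at the root of the host, so the path,
    query string and fragment of the page URL are discarded.
    """
    parts = urlsplit(page_url)
    # Keep the scheme and host (including any port); replace everything else.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


if __name__ == "__main__":
    print(robots_url_for("http://domain.com/blog/2020/post.html?id=7"))
```

Because the host and any port are preserved, a page on a subdomain such as www.domain.com is checked against that subdomain's own /robots.txt.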

When a compliant robot visits a site, the first thing it does is check the top-level directory for a file named "robots.txt". If the file is found, the robot reads the directives in it, which specify what content, if any, it may or may not visit and index, and in most cases honors them.
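The check a compliant robot performs can be sketched with Python's standard `urllib.robotparser` module. This is a minimal illustration, not how any particular search engine works: the sample rules, the `allowed` helper and the `ExampleBot` user agent are hypothetical, and the rules are parsed from a string rather than downloaded so the example runs offline.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents, used for illustration only.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Disallow: /tmp/
"""


def allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if user_agent may fetch url under the given rules."""
    parser = RobotFileParser()
    # parse() accepts an iterable of lines. A live crawler would instead
    # call set_url("http://domain.com/robots.txt") followed by read().
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in ("http://domain.com/index.html",
                "http://domain.com/private/report.html"):
        verdict = "allowed" if allowed(SAMPLE_ROBOTS_TXT, "ExampleBot", url) else "blocked"
        print(url, verdict)
```

Keep in mind that this only models a well-behaved robot: the file is a request, and a non-compliant crawler can simply ignore it.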


Creating /robots.txt files


To create a /robots.txt file, open a plain-text editor such as Windows Notepad, type or paste your directives, and save the file with the name "robots.txt" (all lowercase). Upload the file to the /public_html directory so that its URL is http://domain.com/robots.txt
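If you would rather generate the file than type it by hand, the Python sketch below builds plain-text directives and saves them under the lowercase name robots.txt. The `build_robots_txt` and `write_robots_txt` helpers and the sample rules are hypothetical; point `web_root` at a local copy of /public_html and upload the result as described above.

```python
from pathlib import Path


def build_robots_txt(rules: dict[str, list[str]]) -> str:
    """Build robots.txt text from {user_agent: [disallowed path prefixes]}.

    An empty list produces "Disallow:" with no value, which permits
    that robot to crawl everything.
    """
    groups = []
    for agent, paths in rules.items():
        lines = [f"User-agent: {agent}"]
        if paths:
            lines += [f"Disallow: {p}" for p in paths]
        else:
            lines.append("Disallow:")
        groups.append("\n".join(lines))
    # A blank line separates groups; the file ends with a newline.
    return "\n\n".join(groups) + "\n"


def write_robots_txt(web_root: str, text: str) -> Path:
    """Save text as robots.txt in web_root and return the file path."""
    path = Path(web_root) / "robots.txt"
    # ASCII encoding keeps the file plain text, as the format expects.
    path.write_text(text, encoding="ascii")
    return path


if __name__ == "__main__":
    # Hypothetical rules: ask all robots to skip /private/ and /tmp/,
    # but let Googlebot crawl everything.
    print(build_robots_txt({"*": ["/private/", "/tmp/"], "Googlebot": []}), end="")
```

Each `User-agent` line starts a group of rules for the named robot. Because a robot follows the group that names it most specifically, Googlebot may index the whole site in this example while other compliant robots are asked to skip /private/ and /tmp/.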
