How To Limit What Search Engines Can Index


Search engines crawl your website looking for content to include in their search results. This article explains how to limit what search engines can index.

Limiting what search engines can index using a /robots.txt file. 

Various search engines such as Google have what are called "spiders" or "robots" continually crawling the web indexing content for inclusion in their search engine databases. While most users view inclusion in search engine listings in a positive light, and high search engine rankings can translate to big bucks for commercial sites, not everyone wants every single page and file stored on their account publicly available through web searches.

This is where /robots.txt comes in. Most search engine robots will honor a webmaster's or site owner's wishes about excluding content by following the Robots Exclusion Standard, which is implemented through a small ASCII text file named robots.txt placed in the root web-accessible directory of a given domain.

When a compliant robot visits a given site, the first thing it does is check the top-level directory for a file named "robots.txt". If the file is found, the robot reads it, and the directives within tell the robot what content, if any, it may or may not visit. In most cases these directives are honored.
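As a rough illustration of how a compliant robot applies these directives, the sketch below uses Python's standard urllib.robotparser module. The directives, the robot name "ExampleBot", and the page URLs are made up for illustration; a real robot would fetch the file from the site rather than parse an in-memory string.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical directives, as they might appear in http://domain.com/robots.txt
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Disallow: /tmp/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the directives permit user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's contents as a list of lines
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in (
        "http://domain.com/index.html",
        "http://domain.com/private/notes.html",
    ):
        allowed = is_allowed(SAMPLE_ROBOTS_TXT, "ExampleBot", url)
        verdict = "may visit" if allowed else "must skip"
        print(f"ExampleBot {verdict} {url}")
```

Keep in mind that this check is voluntary on the robot's side: the file states the site owner's wishes, and it is up to each robot to read and respect them.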


Creating /robots.txt files


To create a /robots.txt file, open a plain text editor such as Windows Notepad, type or paste your directives, and save the file as "robots.txt". Make sure the editor does not append a second extension (for example, robots.txt.txt). Then upload the file to the /public_html directory so that its URL is http://domain.com/robots.txt
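If you would rather generate the file with a script than type it by hand, the following is a minimal sketch in Python. The function name write_robots_txt and the example paths /private/ and /tmp/ are placeholders, not part of any hosting tool; the resulting file still needs to be uploaded to /public_html as described above.

```python
from pathlib import Path


def write_robots_txt(directory: str, disallowed_paths: list[str]) -> Path:
    """Write a robots.txt asking all robots to skip the given paths."""
    lines = ["User-agent: *"]
    lines += [f"Disallow: {path}" for path in disallowed_paths]
    target = Path(directory) / "robots.txt"
    # Plain ASCII text, one directive per line
    target.write_text("\n".join(lines) + "\n", encoding="ascii")
    return target


if __name__ == "__main__":
    # Placeholder paths; replace with the directories you want excluded
    created = write_robots_txt(".", ["/private/", "/tmp/"])
    print(created.read_text(encoding="ascii"))
```

Each Disallow line names a path prefix that robots matching the preceding User-agent line are asked not to visit; "User-agent: *" applies the rules to all robots.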


