Do you know the importance of a robots.txt file? Read on to find out.
The success of big companies often lies in keeping their confidential data secret, hidden from all. This lets them carry out their future plans easily and change course as the situation demands. The job of a robots.txt file is similar. It can allow or forbid a search engine to visit some or all of your web pages. A human visitor, of course, remains free to visit these pages. As a result, the website a search engine sees may differ from the one a visitor sees. If you think one or more of your pages are not good enough to be visited by search engines, you can keep the search engines away from them.
Every search engine has a “robot” (a software program) that does the job of visiting websites. Its purpose is to gather a copy of each site and keep it in the search engine’s database. So, if your site is not in that database, it never shows up in the search results.
Web robots are sometimes referred to as web crawlers or spiders. The process of a robot visiting your website is therefore called “spidering” or “crawling”. When somebody says “the search engines have spidered my website”, it means the search engine robots have visited their website. Each robot is known by a name and visits from its own IP address. The IP address is of no importance to us, but knowing the names helps, since the name is what we use when we create a robots.txt file. This is why the file is called “robots.txt”.
Given below is a list of the robots of some of the most popular search engines:
(uses Inktomi’s robot)
|UKSearcher.co.uk||UK Searcher Spider|
Let’s learn to write robots commands. Note that there are two ways to write them. One is to include all the commands in a text file called “robots.txt”, and the other is to write the commands in a meta tag.
We will learn both ways of writing robots commands.
Writing robots commands in a meta tag:
There are 4 things you can tell a search engine robot when it visits your page:
1) Do not index this page – the search engines will not index the page.
2) Do not follow any links on this page – the search engines will not follow the links included in the page, i.e. they will not use this page to reach the pages it links to.
3) Do index this page – the search engines will index the page.
4) Do follow the links – the search engines will follow the links and may index the pages that this page links to.
Note that “index” is different from “spider”. A search engine first spiders a page and then indexes it. Indexing means storing the page in the search engine’s database so that it can be ranked against a searched keyword on the basis of its content, information, meta tags and link popularity. The ranking itself is decided at run time. When you tell search engines not to index a page, they know that the page exists but do not list it. That is, a no-index page will not be shown in their search results. This does not mean a no-index page will get no visitors; it may get visitors indirectly from a page that links to it. It will simply get no direct visitors from the search engines.
Suppose you want the search engines to index a page and also follow its links. Include the following command in the meta tag:
<meta name="robots" content="index, follow">
Suppose you want the search engines to index a page but not follow its links. Include the following command in the meta tag:
<meta name="robots" content="index, nofollow">
Suppose you do not want the search engines to index a page but do want them to follow its links. Include the following command in the meta tag:
<meta name="robots" content="noindex, follow">
Suppose you do not want the search engines to either index a particular page or follow its links. Include the following command in the meta tag:
<meta name="robots" content="noindex, nofollow">
Google keeps a “cached” copy of every file it spiders. It’s a small snapshot of the page. Want to stop Google from doing so? Add the noarchive value to the robots meta tag. The tag below combines it with noindex and nofollow:
<meta name="robots" content="noindex, nofollow, noarchive">
Like any meta tag, the tags written above should be placed in the HEAD section of an HTML page:
<meta name="description" content="your description.">
<meta name="keywords" content="your keywords">
<meta name="robots" content="index, follow">
Creating a robots.txt file:
A robots.txt file is an independent file and should be written in a plain text editor like Notepad. Do not use MS Word or any other word processor to create robots.txt. The bottom line is that the file must be plain text and must be named exactly “robots.txt”, or it will be useless.
Let’s begin. Open Notepad (it comes free with Microsoft Windows) and save the file with the name “robots.txt”. Make sure that the extension is .txt.
By the way, did you notice that we did not use the name of any robot in the meta tag? What does that indicate? Simple: the meta tags shown above address all the search engines at once, telling them to do or not do something on a page. They do not single out any one search engine. To give instructions to a particular robot by name, the solution is robots.txt.
It can happen that you do not want a particular search engine to index a page for certain reasons. In that case a robots.txt file will help, even though I do not recommend such a thing. The search engines get you traffic, so why shut them out? Stop them from doing their job and they will not send you visitors. I repeat: keep your pages smart for the search engines and welcome them. Fine, then why take the trouble to learn robots.txt? Why should you include a robots.txt file at all?
Let’s suppose yours is a dynamic database site containing information about your newsletter subscribers and customers, their addresses, phone numbers etc. All this confidential information is kept in a separate directory called “admin”. (It is recommended to keep such information in a separate directory. Handling the data will be easier for you, and so will keeping the search engines away. We will see how in a moment.) I am sure you would never want any unauthorized person to visit this area, let alone the search engines. Crawling it does not help the search engines either, since they have nothing to do with the data or files there. This is where a robots.txt file comes in.
Write the following in the robots.txt file:
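A minimal sketch of such a record, assuming the “admin” directory sits directly under the site root:

```text
# Applies to every robot
User-agent: *
# Keep robots out of /admin/ and everything beneath it
Disallow: /admin/
```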
This does not allow the spiders to index anything in the admin directory, including its sub-directories, if any.
The asterisk (*) stands for all the search engines. How do you stop a particular search engine from spidering your files or directory?
Suppose you want to stop Excite from spidering this directory:
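The robot’s name is not given above, so the sketch below assumes that Excite’s robot identifies itself as ArchitextSpider. Check the name against the search engine’s own documentation before relying on it:

```text
# ArchitextSpider is an assumed name for Excite's robot
User-agent: ArchitextSpider
Disallow: /admin/
```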
Suppose you want to stop Excite and Google from spidering this directory:
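One way to write this is one record per robot, separated by a blank line. The sketch uses Googlebot for Google and assumes (this article does not confirm it) that Excite’s robot is named ArchitextSpider:

```text
# Record for Excite (robot name is an assumption)
User-agent: ArchitextSpider
Disallow: /admin/

# Record for Google
User-agent: Googlebot
Disallow: /admin/
```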
Files are no different. Suppose you do not want the file datafile.html to be spidered by Excite:
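A possible record for this case. Both the file’s location (the site root) and Excite’s robot name (ArchitextSpider) are assumptions made for illustration:

```text
# Robot name assumed for Excite
User-agent: ArchitextSpider
# Path assumes datafile.html sits in the site root
Disallow: /datafile.html
```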
Similarly, suppose you do not want it to be spidered by Google either:
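This could be sketched by adding a second record for Googlebot. As an illustration only, the file is assumed to sit in the site root and Excite’s robot is assumed to be called ArchitextSpider:

```text
# Excite (assumed robot name)
User-agent: ArchitextSpider
Disallow: /datafile.html

# Google
User-agent: Googlebot
Disallow: /datafile.html
```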
Suppose you do not want the two files datafile1.html and datafile2.html to be spidered by Excite:
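Each file gets its own Disallow line within the record. In this sketch the robot name ArchitextSpider and the root-level paths are hypothetical stand-ins:

```text
# Excite (assumed robot name)
User-agent: ArchitextSpider
# One Disallow line per file; paths assume the site root
Disallow: /datafile1.html
Disallow: /datafile2.html
```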
Can you guess what the following means?
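The example itself is reconstructed here from the answer given just after it, so treat it as a sketch: ArchitextSpider is an assumed name for Excite’s robot, and the files are assumed to sit in the site root.

```text
# Excite (assumed robot name): two files blocked
User-agent: ArchitextSpider
Disallow: /datafile1.html
Disallow: /datafile2.html

# Google: only one file blocked
User-agent: Googlebot
Disallow: /datafile1.html
```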
Excite will not spider datafile1.html and datafile2.html, but Google will skip only datafile1.html. It will spider datafile2.html and the rest of the files in the directory.
Imagine you have a file in a sub-directory that you wouldn’t like to be spidered. What do you do? Let’s suppose the sub-directory is “official” and the file is “confidential.html”.
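The answer is to write the full path from the site root in the Disallow line. A minimal sketch, assuming “official” sits directly under the root and that the rule should apply to all robots:

```text
# Applies to every robot
User-agent: *
# Full path from the site root to the file
Disallow: /official/confidential.html
```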
If the syntax of your robots.txt file is not written correctly, the search engines will ignore that particular command. Before uploading the robots.txt file, double-check it for possible errors. You should upload the robots.txt file to the ROOT directory of your server. The search engines look for the robots.txt file only in the root directory.
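You should be able to see the robots.txt file if you type your site’s address followed by /robots.txt in the address bar of your Internet browser (for example, http://www.example.com/robots.txt, with your own domain in place of www.example.com).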
Google’s own robots.txt file can be seen the same way, at http://www.google.com/robots.txt.
All major search engines follow robots.txt commands, although obeying them is voluntary on the robot’s part.
You can look in your web server log files to see which search engine robots have visited. They all leave signatures that can be detected. These signatures are nothing but the names of their robots. For instance, if Google has spidered your site, your log will contain entries whose user-agent field includes the name Googlebot. This is how you know which search engine has spidered your pages and when!