SEOPageRank


How to implement a robots.txt file for better crawling

There is an out-of-sight, persistent force that permeates the web and its countless pages and files, unnoticed by most of us: search engine crawlers and robots. Every day thousands of them go out and scour the web, whether it is a search engine trying to index the entire web or a spambot harvesting any email address it can find for less than worthy intentions. As a web developer, what little control you have over what robots are permitted to do when they visit your site lives in a remarkably small file called "robots.txt".

Robots.txt is a plain text file that search engines read when they crawl your site. Ordinary visitors never see it during normal browsing, although anyone can open it by requesting its URL. Its purpose is to give instructions to robots: you state what may be crawled and what may not. Sometimes you simply do not want search engines such as MSN, Google and Yahoo to fetch certain pages.

By defining a few rules in this text file, you can instruct robots not to crawl certain files or folders within your site, or not to crawl the site at all. For example, you may not want a search engine to crawl the "images" folder of your website, because indexing it is of no value to you and wastes your site's bandwidth. "Robots.txt" lets you tell search engines just that.

"robots.txt" file creation and implementation

Create a regular text file called "robots.txt", and make sure it is named exactly that. Upload this file to the root directory of your website, not to a subfolder. Only when both rules are followed will search engines interpret the instructions contained in the file. Deviate from either one, and "robots.txt" becomes nothing more than an ordinary text file.
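Because the file must sit in the root directory, its address can be worked out from any page URL on the same site. The following is a minimal sketch of that rule; the function name robots_txt_url and the example.com addresses are hypothetical and used only for illustration.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url):
    """Return the root-level robots.txt address for the site serving page_url."""
    parts = urlsplit(page_url)
    # Keep the scheme and host, replace the path, drop query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_txt_url("https://www.example.com/blog/post.html?page=2"))
# https://www.example.com/robots.txt
```

A file uploaded to /blog/robots.txt would never be found this way, which is why a copy in a subfolder has no effect.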

Now that you know what to name the file and where to upload it, you need to learn what to put in it to give instructions to the search engines that follow this protocol. The structure is simple for most intents and purposes: a User-agent line that identifies the crawler in question, followed by one or more Disallow lines that stop it from crawling certain parts of your website.
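To make that structure concrete, here is a minimal sketch that groups the lines of a robots.txt file into records, each holding its user agents and the paths they are disallowed from. It is a simplified illustration, not a full implementation of the protocol: the function name parse_records and the sample text are assumptions, and fields other than User-agent and Disallow are ignored.

```python
def parse_records(text):
    """Group robots.txt lines into (user_agents, disallowed_paths) records."""
    records = []
    agents, paths = [], []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue  # blank or malformed line
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if paths:  # a User-agent line after rules starts a new record
                records.append((agents, paths))
                agents, paths = [], []
            agents.append(value)
        elif field == "disallow" and agents:
            paths.append(value)
    if agents:
        records.append((agents, paths))
    return records


sample = """\
User-agent: *
Disallow: /images/
Disallow: /uploads/file1.html
"""
print(parse_records(sample))
# [(['*'], ['/images/', '/uploads/file1.html'])]
```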

1) A basic "robots.txt":

User-agent: *
Disallow: /

The symbol "*" means "all robots". In the two lines above, the first line addresses every crawler, and the second line instructs them to crawl nothing on the website.
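One way to check what such a file does is to feed it to Python's standard urllib.robotparser module and ask whether a given crawler may fetch a given URL. The sketch below does that for the two-line file; the helper name can_crawl, the crawler name "ExampleBot" and the example.com URLs are hypothetical.

```python
from urllib.robotparser import RobotFileParser

BLOCK_ALL = """\
User-agent: *
Disallow: /
"""


def can_crawl(robots_txt, user_agent, url):
    """Return True if robots_txt lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


print(can_crawl(BLOCK_ALL, "ExampleBot", "https://www.example.com/"))            # False
print(can_crawl(BLOCK_ALL, "ExampleBot", "https://www.example.com/about.html"))  # False
```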

2) Let us get a little more selective now. While everyone likes Google, you may not want Google's image robot crawling your site's images and making them searchable online, if only to save bandwidth. The following rules achieve that:

User-agent: Googlebot-Image
Disallow: /

3) The following rules stop all search engines and robots from crawling selected directories and pages:

User-agent: *
Disallow: /images/
Disallow: /uploads/file1.html
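Disallow rules like these are matched against the beginning of the requested path. The sketch below shows that prefix matching in plain Python; the function name is_disallowed and the test paths are hypothetical, and it deliberately ignores extensions such as wildcards that some crawlers support.

```python
DISALLOWED = ["/images/", "/uploads/file1.html"]


def is_disallowed(path, rules=DISALLOWED):
    """Return True if path starts with any non-empty Disallow value."""
    return any(bool(rule) and path.startswith(rule) for rule in rules)


for path in ["/images/logo.png", "/uploads/file1.html",
             "/uploads/file2.html", "/index.html"]:
    print(path, is_disallowed(path))
# /images/logo.png True
# /uploads/file1.html True
# /uploads/file2.html False
# /index.html False
```

Note that the trailing slash matters: "/images/" blocks everything inside the folder, while a path such as "/images-old/photo.png" is not affected.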

4) You can also target several robots with different rules in the same "robots.txt" file. Consider the following:

User-agent: *
Disallow: /

User-agent: Googlebot
Disallow: /images/
Disallow: /uploads/

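Here the two groups work together. Every robot is barred from the whole site except Googlebot, which follows its own group instead: it may crawl everything apart from the "images" and "uploads" folders.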
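The effect of combining a general group with a crawler-specific group can be checked with Python's standard urllib.robotparser module. This is a sketch under stated assumptions: the helper name allowed, the crawler name "ExampleBot" and the example.com host are hypothetical, and real crawlers may resolve groups slightly differently from this parser.

```python
from urllib.robotparser import RobotFileParser

RULES = """\
User-agent: *
Disallow: /

User-agent: Googlebot
Disallow: /images/
Disallow: /uploads/
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())


def allowed(agent, path):
    """Return True if agent may fetch path on the (hypothetical) example site."""
    return parser.can_fetch(agent, "https://www.example.com" + path)


print(allowed("Googlebot", "/index.html"))        # True
print(allowed("Googlebot", "/images/logo.png"))   # False
print(allowed("Googlebot", "/uploads/a.zip"))     # False
print(allowed("ExampleBot", "/index.html"))       # False
```

Googlebot matches its own group, so the blanket "Disallow: /" from the "*" group does not apply to it; any crawler without a group of its own falls back to the "*" rules.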

Meta tag instructions for robots:

If you do not have a robots.txt file in the root path, you can control access page by page with meta tags instead. A robots meta tag placed in the head section of a page tells crawlers how to treat that page.

1.      <meta name="robots" content="index">
This allows the crawler to index this page.

2.      <meta name="robots" content="noindex">
This tells the crawler not to index this page.

3.      <meta name="robots" content="follow">
This allows the crawler to follow the links on this page.

4.      <meta name="robots" content="nofollow">
This tells the crawler not to follow any of the links on this page.

5.      <meta name="robots" content="none">
This tells the crawler neither to index this page nor to follow its links; it is equivalent to "noindex, nofollow".
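The five values above can be read out of a page programmatically. The sketch below uses Python's standard html.parser module to collect the robots meta directives and reduce them to two yes/no answers. The class name RobotsMetaParser and the function name robots_policy are hypothetical, and the sketch assumes that a page without a robots meta tag may be both indexed and followed.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives found in <meta name="robots"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            for token in content.split(","):
                if token.strip():
                    self.directives.add(token.strip().lower())


def robots_policy(html):
    """Return whether a crawler may index the page and follow its links."""
    parser = RobotsMetaParser()
    parser.feed(html)
    found = parser.directives
    blocked = "none" in found  # "none" means noindex plus nofollow
    return {
        "index": not (blocked or "noindex" in found),
        "follow": not (blocked or "nofollow" in found),
    }


print(robots_policy('<head><meta name="robots" content="noindex"></head>'))
# {'index': False, 'follow': True}
```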
 

 
