Is the 'robots.txt' file really necessary?
Robots.txt is a small file placed in the site's root directory that is supposed to tell crawler bots which pages of the site they may or may not index.
Examples:
#robots.txt that tells the bots to go away
User-agent: *
Disallow: /
#robots.txt that prevents Google from visiting the /secret/ directory
User-agent: Googlebot
Disallow: /secret/
#robots.txt that allows Google to see everything
User-agent: Googlebot
Disallow:
If there is no robots.txt file, or it is empty, all web bots are welcome to index every page of the site.
So the file is useful when some pages contain duplicated content that you do not want crawled; blocking those pages can improve the overall rank of the site.
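For example, a minimal sketch of such a rule, assuming the duplicated printer-friendly copies live in a hypothetical /print/ directory, could look like this:
#robots.txt that keeps all bots away from duplicated printer-friendly pages
User-agent: *
Disallow: /print/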
It also helps when you have some temporary or 'secret' files you do not want indexed.
But beware: this file becomes a prime point of interest for malicious people trying to find hidden directories on a targeted site.
Using WordPress, you don't want to expose the contents of, say, the '/wp-admin/' directory, but when you disallow it in the robots.txt file you suddenly reveal that this directory exists, something that would otherwise be hard to guess,
as there are no direct links leading to it.
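In other words, the very entry meant to protect the directory is also the one that advertises it; a sketch of such a revealing entry would be:
User-agent: *
Disallow: /wp-admin/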
To keep the robots away from a specific page you can use
a meta tag placed in the HEAD section of that HTML page,
such as:
<meta name="ROBOTS" content="NOINDEX, NOFOLLOW">
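A minimal sketch of a page using this tag (the title and body text are just placeholders) would be:
<html>
<head>
<title>Temporary page</title>
<meta name="ROBOTS" content="NOINDEX, NOFOLLOW">
</head>
<body>
<p>Content that should stay out of the index.</p>
</body>
</html>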
So if you only want to hide individual pages rather than exclude a whole directory from the robots' view, you do not need the robots.txt file at all.
There was also an old positioning technique based on different pages prepared for different bots: lines in the robots.txt file would direct each web bot to its own set of pages and keep it away from the pages prepared for the other crawlers,
but it looks like this technique is no longer very effective.
That leads us to the topic question: how could this file, in its present form, be useful nowadays, and is it necessary to keep it on the site?
The robots.txt file is ABSOLUTELY necessary,
yet for reasons that are not obvious.
I am going to write about it soon.
Indeed, you need the robots.txt file;
read on my blog what happened to me recently.