"Robots.txt" files are simple text files that tell the various search engine bots not to index certain files on your site, or possibly the whole site.
A robots.txt file is quite easy to set up. What follows is the code that goes in it and what it means.
User-agent: *
Disallow: /
"User-agent" here means the search engine's robot. The wildcard symbol (*) addresses ALL search engine robots that may visit your site. "Disallow" tells them not to crawl, and therefore not to index, whatever comes after the colon.
In the above example, the forward slash represents anything and everything in the root folder, usually called public_html—in other words, anything that comes after your domain name. For instance, if your site is prettypigpantaloons.com (give me a break, it's hard to come up with interesting examples) and you have a page on your site called privacypolicy.html, its full address is actually http://www.prettypigpantaloons.com/privacypolicy.html. There's a slash after the domain name and before all pages and files that are on your site. So the slash after "disallow" simply means you're saying: "Search engines, don't index anything on my site."
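If you'd like to see that rule in action, here is a minimal sketch using Python's standard-library urllib.robotparser module, which reads robots.txt rules roughly the way a polite robot would. The robot name "AnyBot" is made up for illustration, and real search engines may handle edge cases differently.

```python
from urllib.robotparser import RobotFileParser

# The two-line "block everything" robots.txt described above.
RULES = """\
User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())

# "AnyBot" is a hypothetical robot name; the wildcard (*) covers it.
page = "http://www.prettypigpantaloons.com/privacypolicy.html"
home = "http://www.prettypigpantaloons.com/"

page_allowed = parser.can_fetch("AnyBot", page)
home_allowed = parser.can_fetch("AnyBot", home)

# Both addresses start with the slash, so both are off limits.
print(page_allowed, home_allowed)
```

Every address on the site begins with that slash, which is why both checks come back as not allowed.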
If you only wanted to block one search engine, you would name its robot instead of using the wildcard. Google's robot, for instance, is called Googlebot:
User-agent: Googlebot
Disallow: /
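Here is a quick sketch of how a single-robot block plays out, again using Python's standard-library urllib.robotparser. It assumes the robot being blocked is Google's, which goes by Googlebot, and "SomeOtherBot" is a made-up name standing in for every other robot.

```python
from urllib.robotparser import RobotFileParser

# Block only Googlebot; no rule is given for anyone else.
RULES = """\
User-agent: Googlebot
Disallow: /
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())

url = "http://www.prettypigpantaloons.com/privacypolicy.html"

googlebot_allowed = parser.can_fetch("Googlebot", url)
# "SomeOtherBot" is hypothetical; it matches no User-agent line.
other_allowed = parser.can_fetch("SomeOtherBot", url)

print(googlebot_allowed, other_allowed)
```

The named robot is turned away, while a robot with no matching rule is free to visit.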
If you'd like your site to be indexed but want to disallow certain folders such as "admin" or "cgi-bin," since these contain behind-the-scenes files that don’t need to be viewed, you would put this in your robots.txt file:
User-agent: *
Disallow: /cgi-bin/
Disallow: /admin/
Disallow: /other-folder-you-want-blocked/
Disallow: /page-you-want-blocked.html
Disallow: /photo-you-want-blocked.jpg
Then the search engines will index everything except those items you specifically listed.
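To double-check a list like this before uploading it, you could run it through Python's standard-library urllib.robotparser. This is only a sketch: the test addresses and the robot name "AnyBot" are invented for the example, and I've trimmed the rule list to three entries.

```python
from urllib.robotparser import RobotFileParser

# A shortened version of the folder and page rules above.
RULES = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /admin/
Disallow: /page-you-want-blocked.html
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())

# Hypothetical addresses to test against the rules.
paths = [
    "/index.html",
    "/admin/settings.html",
    "/page-you-want-blocked.html",
    "/administrator.html",
]

# Map each path to True (allowed) or False (blocked).
results = {path: parser.can_fetch("AnyBot", path) for path in paths}

for path, allowed in results.items():
    print(path, "allowed" if allowed else "blocked")
```

Notice that /administrator.html is still allowed. The rule is /admin/ with a trailing slash, so it only matches things inside that folder, not every address that happens to start with "admin."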
If you want to allow access to everything by anyone, you can either take the slash and everything else off after "Disallow:" or simply not create a robots.txt file in the first place.
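A small sketch of the empty "Disallow:" case, using Python's standard-library urllib.robotparser. The robot name "AnyBot" and the second address are made up for the example.

```python
from urllib.robotparser import RobotFileParser

# Nothing after the colon means nothing is blocked.
RULES = """\
User-agent: *
Disallow:
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())

# "AnyBot" is a hypothetical robot name.
policy_allowed = parser.can_fetch(
    "AnyBot", "http://www.prettypigpantaloons.com/privacypolicy.html"
)
admin_allowed = parser.can_fetch(
    "AnyBot", "http://www.prettypigpantaloons.com/admin/"
)

print(policy_allowed, admin_allowed)
```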
How to manually create a robots.txt file
Open a plain text editor such as Notepad or TextEdit and type the following:
User-agent: *
Disallow: /files-or-folders-you-want-blocked/
Save it as a text file, name it robots.txt, and upload it to the root directory of your site using your FTP program. (Remember, the public_html folder is the root, meaning it’s the parent of all the files and folders that make up your site.)
If your site is named www.prettypigpantaloons.com, the first thing robots will look for is www.prettypigpantaloons.com/robots.txt, so the robots file has to be put directly into the public_html folder, not in a subfolder. If you were to put it in a subfolder, the search engines wouldn’t be able to find it, because it would actually be located at www.prettypigpantaloons.com/subfolder/robots.txt.
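To make the "root only" rule concrete, here is a tiny Python sketch that works out where a robot would look for robots.txt, no matter which page it starts from. The function name robots_url and the page address are my own inventions for the example.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url: str) -> str:
    """Return the root-level robots.txt address for any page on a site."""
    parts = urlsplit(page_url)
    # Keep the scheme and domain, throw away the path, and put
    # robots.txt directly after the domain name.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


# A hypothetical page sitting inside a subfolder.
location = robots_url("http://www.prettypigpantaloons.com/subfolder/page.html")
print(location)
```

The subfolder drops out of the result entirely, which is why a robots.txt file sitting inside a subfolder never gets read.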
Creating a robots.txt file using Google Webmaster Tools
This method isn't actually any faster, in my opinion, but it's good to know your options. These instructions assume you have already added and verified your site at Google Webmaster Tools.
Click on the name of your site. In the left column, go to Site Configuration > Crawler Access. Click on the Generate robots.txt tab. You can Allow all or Block all in Step 1, or allow some and block some at the same time in Step 2.
If you choose to Allow all, you can still block access to individual items. Choose Block and then All robots in the dropdown boxes, and then add a rule stating which files and folders not to access. List them under Directories and Files, each on its own line, like this:
/admin/
/cgi-bin/
Click on Add Rule to the right, and then in Step 3, download your robots.txt file to your desktop or site file, and then upload it via FTP to the root folder on your server.
If you want to block all bots, just choose Block all in Step 1 and go straight to Step 3.
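If you'd rather script it, the same idea fits in a few lines of Python. This is just a sketch of a generator of my own, not what Google's tool does behind the scenes; the function name build_robots_txt is hypothetical.

```python
def build_robots_txt(blocked_paths, user_agent="*"):
    """Build robots.txt text that blocks the given files and folders."""
    lines = [f"User-agent: {user_agent}"]
    if blocked_paths:
        # One Disallow line per blocked file or folder.
        lines.extend(f"Disallow: {path}" for path in blocked_paths)
    else:
        # An empty Disallow allows everything.
        lines.append("Disallow:")
    return "\n".join(lines) + "\n"


# Allow everything except the two folders listed earlier.
text = build_robots_txt(["/admin/", "/cgi-bin/"])
print(text)

# Block every robot from the whole site.
block_all = build_robots_txt(["/"])
print(block_all)
```

Save the printed text as robots.txt and upload it to the root folder, the same as with the manual method.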
And that's how you boss the bots.