Search Engines and Directories

Part Five

The Welcome Mat

Now we need to invite the search engines in to do their job. This is a two-part exercise: first we put out the welcome mat, and then we call them up and invite them over. OK, not exactly, but you get the general idea.

Back to the Head

Let's take another look at the example of a web page's head that I used a couple of pages back:

<!doctype HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/transitional.dtd">
<html>

<head>
<title> Search Engines and Directories - Part Three </title>
<meta name="description" content="Part three of a tutorial on search engines and directories: How to get your pages ready to submit.">
<meta name="keywords" content="search engines,directories,tuning,honing,marketing,ranking,traffic,website promotion,placements,tips,hints,tricks, website,submitting,submissions,how to be seen">
<meta name="robots" content="all">
<meta name="rating" content="general">
<meta name="revisit-after" content="15 days">
</head>

<body ....

Take a look at the line just after the keywords tag in this example. That's right; the one that says: <meta name="robots" content="all">. This line tells search engine robots that it's OK to index this page, and that it's also OK to follow any links from this page to any others. Put this line in any of your pages that you want the search engines to index.

Keep Out!

You say that you have a couple of pages on your site that you don't want indexed? Maybe you're just using them to test out some new ideas (I have a bunch of these). No sweat; just change that little line to: <meta name="robots" content="none">. This tells robots not to index the page and not to follow its links. Most robots are polite; they'll respect your wishes. For site-wide control, see the next section of this page; just keep in mind that it, too, relies on the robots minding their manners.
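If you're curious how a polite robot might read that tag, here is a minimal Python sketch. It is only an illustration, not how any real search engine works. The class and function names are made up for this example, and the noindex and nofollow values it also recognizes are standard robots meta values that go beyond the two ("all" and "none") used on this page.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives from a <meta name="robots"> tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.directives = None  # None means no robots meta tag was seen

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() == "robots":
            content = attributes.get("content") or ""
            # Split "noindex, follow" style values into a set of lowercase words
            self.directives = {
                word.strip().lower() for word in content.split(",") if word.strip()
            }


def may_index_and_follow(html):
    """Return (may_index, may_follow) as a well-behaved robot might decide them."""
    parser = RobotsMetaParser()
    parser.feed(html)
    # Assumption: a missing or empty tag is treated the same as content="all"
    directives = parser.directives or {"all"}
    may_index = not ({"none", "noindex"} & directives)
    may_follow = not ({"none", "nofollow"} & directives)
    return may_index, may_follow


print(may_index_and_follow('<head><meta name="robots" content="all"></head>'))
# (True, True)
print(may_index_and_follow('<head><meta name="robots" content="none"></head>'))
# (False, False)
```

Nothing forces a robot to run logic like this, which is why the tag only works on the polite ones.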

Robots.txt

You can control the robots' access to your entire site with a single file: the robots.txt file (the name is all lowercase). This is really easy to do. Use your text editor to create a new text file, and copy the following into it:

User-Agent: *
Disallow: #nothing

Now save this file as robots.txt in the top-level directory of your site, the same place where you keep your home page's index.html file. Robots only look for it there, so a copy in a subdirectory will be ignored. When you upload your index.html file that you changed a while back, just upload this one with it. This file will now invite all search engine robots to look at everything on your site.
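If you would rather script this step than use a text editor, here is a small Python sketch that writes the same two lines to a file named robots.txt. The function name and the site_root parameter are my own inventions for illustration; the demo writes into a temporary directory so it is safe to run, and you would point it at your own local copy of the site instead.

```python
import tempfile
from pathlib import Path

# The same two lines shown above: every robot, nothing disallowed
ALLOW_ALL = "User-Agent: *\nDisallow: #nothing\n"


def write_robots_txt(site_root, rules=ALLOW_ALL):
    """Write rules to <site_root>/robots.txt and return the file's path."""
    target = Path(site_root) / "robots.txt"  # the file name must be lowercase
    target.write_text(rules, encoding="ascii")
    return target


# Demo: a temporary directory stands in for the folder that holds index.html
with tempfile.TemporaryDirectory() as site_root:
    path = write_robots_txt(site_root)
    print(path.name)
    print(path.read_text(encoding="ascii"), end="")
```

You still have to upload the resulting file yourself; the sketch only creates it locally.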

But, you say, I have one directory that I don't want them snooping into. No problem; we'll just add a line to keep them out of that one, like so:

User-Agent: *
Disallow: /test/   #and everything in it

Of course, you should change the directory name to match whatever you are using. If you need to keep those pesky robots out of more than one directory, just add another Disallow line like the one above for each directory that you want them to stay out of.
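With more than a handful of directories, typing one Disallow line per directory gets tedious. As a rough sketch, a few lines of Python can build the file's contents from a list of directory names. The function name and the "drafts" directory are hypothetical, used here only to show the pattern.

```python
def build_robots_txt(blocked_dirs):
    """Return robots.txt text that keeps all robots out of the given directories."""
    lines = ["User-Agent: *"]
    if not blocked_dirs:
        # An empty Disallow value means nothing is off limits
        lines.append("Disallow:")
    for directory in blocked_dirs:
        # Normalize "test", "/test" and "/test/" to the form "/test/"
        path = "/" + directory.strip("/") + "/"
        lines.append(f"Disallow: {path}")
    return "\n".join(lines) + "\n"


print(build_robots_txt(["test", "/drafts/"]), end="")
# User-Agent: *
# Disallow: /test/
# Disallow: /drafts/
```

Save the printed text as robots.txt, just as you would a hand-written file.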

Now, for those fine folks who want to keep everybody out of their site, here's how:

User-Agent: *
Disallow: /

Simple, isn't it?
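If you want to check how a well-behaved robot would read these files before you upload one, Python's standard urllib.robotparser module can parse the rules for you. The sketch below feeds it the three files from this page; the helper name allowed and the robot name "ExampleBot" are made up for the example.

```python
from urllib.robotparser import RobotFileParser

# The three robots.txt files shown on this page
OPEN_SITE = "User-Agent: *\nDisallow: #nothing\n"
ONE_DIRECTORY = "User-Agent: *\nDisallow: /test/   #and everything in it\n"
CLOSED_SITE = "User-Agent: *\nDisallow: /\n"


def allowed(robots_txt, path, agent="ExampleBot"):
    """Return True if the rules let the named robot fetch the given path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)


print(allowed(OPEN_SITE, "/index.html"))          # True
print(allowed(ONE_DIRECTORY, "/index.html"))      # True
print(allowed(ONE_DIRECTORY, "/test/page.html"))  # False
print(allowed(CLOSED_SITE, "/index.html"))        # False
```

This only tells you what a rule-following robot would do; a robot that ignores robots.txt will not be stopped by any of these files.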

Next:

A (slightly more) advanced topic: PageRank. I promise to keep this simple. Well, as much as humanly possible.



Questions or comments about this page? Email the author.


Last Modified: 17 May 2004
Copyright © 2000-2004 James C. Keebaugh
All Rights Reserved