This excerpt is a useful resource for Google beginners. I’ll be starting a large resource center for Google, covering both beginner and advanced Google optimization techniques. Find more at the author’s page.

What Is Indexing?

Basically, the process for the Google crawler is to first look at the robots.txt file in order to learn where it shouldn’t go, and then it gets down to business visiting the pages it is allowed to visit. As the crawler lands on a page, it finds the relevant information contained on it, then follows each link and repeats the process.
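
To make the crawl loop described above concrete, here is a minimal sketch in Python. It is not how Google's crawler actually works; it only mirrors the two steps mentioned: read robots.txt first, then visit allowed pages and follow their links. The in-memory SITE dictionary, the ROBOTS_TXT contents, and the "ExampleBot" agent name are all made up for illustration.

```python
from collections import deque
from urllib.robotparser import RobotFileParser

# Hypothetical in-memory site: each path maps to the links found on that page.
SITE = {
    "/": ["/about", "/cgi-bin/form.pl", "/blog"],
    "/about": ["/"],
    "/blog": ["/blog/post-1", "/about"],
    "/blog/post-1": [],
    "/cgi-bin/form.pl": ["/secret"],
    "/secret": [],
}

# Hypothetical robots.txt for that site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/
"""


def crawl(start="/", agent="ExampleBot"):
    """Return the paths visited, in breadth-first order."""
    # Step 1: learn where the crawler should not go.
    rules = RobotFileParser()
    rules.parse(ROBOTS_TXT.splitlines())

    visited = []
    seen = {start}
    queue = deque([start])

    # Step 2: visit each allowed page, then follow its links.
    while queue:
        path = queue.popleft()
        if not rules.can_fetch(agent, path):
            continue  # disallowed pages are skipped, so their links are never followed
        visited.append(path)
        for link in SITE.get(path, []):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return visited


print(crawl())
```

In this toy site, /secret is linked only from a disallowed page, so the sketch never reaches it.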

Robots.txt Explored

Dan proceeded to explain how to use your robots.txt file for excluding pages and directories from your site that you might not want indexed, such as the cgi-bin folder. In terms of what the crawler looks at on the page, he said there are over 200 factors, with “relevance” playing a big part in many of them.
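
As a rough illustration of excluding a directory and a single page, the sketch below parses a small robots.txt with Python's standard urllib.robotparser module and checks a few URLs against it. The file contents, the example.com URLs, and the /drafts/unfinished.html page are assumptions made up for this example; only the cgi-bin folder comes from the talk.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block the cgi-bin directory and one single page.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /drafts/unfinished.html
"""


def build_rules(text):
    """Parse robots.txt text into a rule set that can be queried per URL."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


rules = build_rules(ROBOTS_TXT)

for url in (
    "https://example.com/index.html",
    "https://example.com/cgi-bin/search.cgi",
    "https://example.com/drafts/unfinished.html",
    "https://example.com/drafts/finished.html",
):
    verdict = "allowed" if rules.can_fetch("Googlebot", url) else "blocked"
    print(f"{url}: {verdict}")
```

A trailing slash on the Disallow line covers everything under that directory, while a full path blocks just the one page.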

Google Still Loves Its PageRank

Dan also discussed the importance of PageRank (the real one that only Google knows about, not the “for-amusement-purposes-only” toolbar PR that many obsess over). He let us know that having high-quality links is still one of the greatest factors towards being indexed and ranked, and then he proceeded to explain how building your site with unique content for your users is one of the best approaches to take. (Now, where have you heard that before? ;) He explained how creating a community of like-minded individuals that builds up its popularity over time is a perfect way to enhance your site.

Breaking News!

Google is coming out with a new tag called “unavailable_after” which will allow people to tell Google when a particular page will no longer be available for crawling. For instance, if you have a special offer on your site that expires on a particular date, you might want to use the unavailable_after tag to let Google know when to stop indexing it. Or perhaps you write articles that are free for a particular amount of time, but then get moved to a paid-subscription area of your site. Unavailable_after is the tag for you!
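
The talk only named the tag, so the exact syntax below is an assumption: it follows the form Google later documented, where unavailable_after is a directive inside a googlebot robots meta tag, followed by a date. This small Python sketch builds such a tag for a page's head section. The helper name unavailable_after_tag and the choice of an ISO 8601 timestamp are mine, not from the source.

```python
from datetime import datetime, timezone


def unavailable_after_tag(expires: datetime) -> str:
    """Build a robots meta tag asking Google to drop the page after `expires`.

    `expires` must be timezone-aware; it is converted to UTC and written
    in ISO 8601 form (an assumed, widely used date format).
    """
    if expires.tzinfo is None:
        raise ValueError("expires must be a timezone-aware datetime")
    stamp = expires.astimezone(timezone.utc).isoformat()
    return f'<meta name="googlebot" content="unavailable_after: {stamp}">'


# Example: a special offer that ends on 25 August 2007 at 15:00 UTC.
offer_ends = datetime(2007, 8, 25, 15, 0, tzinfo=timezone.utc)
print(unavailable_after_tag(offer_ends))
```

The generated line would go in the head of the page that expires, such as the special-offer page mentioned above.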

Sitemaps Explored

One of the main tools in Webmaster Central is the ability to provide Google with an XML sitemap. Dan told us that a Google sitemap can be used to provide them with URLs that they would otherwise not be able to find because they weren’t linked to from anywhere else. He used the term “walled garden” to describe a set of pages that are linked only to each other but not linked from anywhere else. He said that you could simply submit one of the URLs via your sitemap, and then they’d crawl the rest. He also talked about how sitemaps were good for getting pages indexed that could be reached only via webforms. He did admit later that even though those pages would be likely to be indexed via the sitemap, at this time they would still most likely be considered low quality since they wouldn’t have any PageRank. Google is working on a way to change this in the future, however.
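
For readers who have never seen one, here is a minimal sketch that generates an XML sitemap with Python's standard library. It uses the public sitemaps.org namespace; the build_sitemap helper and the example.com URL are hypothetical. Following the "walled garden" idea above, it lists a single entry-point URL and leaves the crawler to follow the links from there.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return a sitemap XML document (as a string) listing the given URLs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


# Hypothetical entry point into a set of pages not linked from anywhere else.
sitemap_xml = build_sitemap(["https://example.com/garden/page-1.html"])
print(sitemap_xml)
```

The output can be saved as sitemap.xml and submitted through Webmaster Central.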

The Dreaded Supplemental Results

Pages in the supplemental index generally don’t show up in the search results unless there are not enough relevant pages in the main results to show. Dan had some good news to report: Google is starting to crawl the supplemental index more often, and soon the distinction between the main index and the supplemental index will blur.

Well, actually the info provided here was requested by a fellow blogger trying to rank higher in Google. I asked him if he had done the basic Google stuff, so we’d have a starting point, and he replied: “Well, I’m really new to this!”
