Whatever the spider finds is sent to the second part of
the search engine, called the index or sometimes the
catalog. The index is like a large book containing a copy of
every web page that the spider finds. If a web page changes,
the index is updated with the new data. Note that it may take
a while for new pages or updates that a crawler finds to be
added to the index. A web page may therefore have been
'spidered' but not yet indexed, in which case it is not yet
available to those searching the web with a search engine.
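
To make the idea of an index more concrete, here is a minimal
sketch in Python. It is only an illustration, not how any real
search engine stores its data: the class name PageIndex, its
methods and the example.com addresses are all made up for this
example. It keeps a copy of each page and replaces that copy
when the page changes.

```python
from collections import defaultdict


class PageIndex:
    """Toy index: keeps a copy of each page plus a word -> URLs lookup."""

    def __init__(self):
        self.pages = {}                   # url -> latest copy of the page text
        self.postings = defaultdict(set)  # word -> set of urls containing it

    def update(self, url, text):
        """Add a new page, or replace the stored copy if the page changed."""
        # Remove the old copy's words first so stale entries do not linger.
        for word in self.pages.get(url, "").lower().split():
            self.postings[word].discard(url)
        self.pages[url] = text
        for word in text.lower().split():
            self.postings[word].add(url)

    def lookup(self, word):
        """Return the URLs whose stored copy contains the word."""
        return sorted(self.postings.get(word.lower(), set()))


# Hypothetical pages reported by a spider.
index = PageIndex()
index.update("http://example.com/a", "spiders crawl the web")
index.update("http://example.com/b", "the index stores pages")
index.update("http://example.com/a", "crawlers visit pages")  # page changed
```

A page only shows up in lookup() once update() has been called
for it, which mirrors the gap between a page being spidered and
being indexed.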
The third part of the search engine is the software. This is
the program that sorts through the millions of pages that the
spider has added to the index, matches them to a search, and
ranks them in order of what it believes to be most relevant.
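
The matching and ranking step can be sketched the same way. The
scoring rule below (count how often the search words appear on
the page) is an assumption chosen for simplicity; real engines
use proprietary and far more elaborate rules. The function name
rank and the sample pages are hypothetical.

```python
from collections import Counter


def rank(pages, query):
    """Return URLs matching the query, best match first.

    The score is simply how many times the query words appear in the
    page; this is an illustrative stand-in, not a real engine's algorithm.
    """
    terms = query.lower().split()
    scores = {}
    for url, text in pages.items():
        counts = Counter(text.lower().split())
        score = sum(counts[t] for t in terms)
        if score > 0:  # pages with no matching words are left out
            scores[url] = score
    # Highest score first; ties broken alphabetically by URL.
    return sorted(scores, key=lambda u: (-scores[u], u))


# Hypothetical indexed pages: url -> stored page text.
pages = {
    "http://example.com/a": "search engines rank pages",
    "http://example.com/b": "search search and more search",
    "http://example.com/c": "nothing relevant here",
}
results = rank(pages, "search engines")
```

Counting words alone is easy to fool, since a page that just
repeats a word many times scores well. That is one reason each
engine keeps refining its own ranking method.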
So now we know the parts of a crawler-based search engine and
how they work. Some of the well-known crawlers are AllTheWeb,
AltaVista, Google, Inktomi and Teoma. Each of these has its own
proprietary algorithm and features, but most web spiders
function in basically the same way.
Search engines can be your best friends. They can mean the
difference between learning about the way of the cosmos and
what Brangelina did for lunch the other day. No matter what you
want to know or who you want to know it about, you can learn it
all with a little help from a search engine or two. They are
simple to use and they are free. What more could you ask for,
really?