Search engines, especially Google, have evolved technologically (among other aspects) over the years. The computing power of the software and hardware the search giant now deploys is best appreciated through the functions it performs and its wide reach.
How Search Engines Work
Broadly speaking, a search engine’s functions can be divided into three: crawling, indexing and search query processing.
- Crawling
This is the use of special software, commonly known as bots, crawlers or spiders, to access information on websites through three principal means:
- Links from other websites already in the search engine’s index or gathered while crawling
- URLs/links submitted by webmasters
- Sitemaps submitted by webmasters
Ordinarily one would picture the bots as objects crawling rapidly all over the web, following links from one website to another. In reality that is not the case. A crawler runs from a particular physical location and behaves much like your web browser: it sends requests to web servers and downloads information about new web pages, updated web pages and dead links, all of which is used to update the search engine’s index.
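Since a crawler behaves much like a browser, fetching a page is just an HTTP request. The sketch below, using only Python’s standard urllib, sends a GET request under a made-up User-Agent string (an assumption; real crawlers publish their own) and sorts the result into the outcomes described above: a fetched page that is new or updated, an unchanged page, a dead link, or something to retry later. The status-code mapping is a simplification for illustration, not any search engine’s actual policy.

```python
import urllib.error
import urllib.request

# Hypothetical crawler identity, used only for this illustration.
USER_AGENT = "ExampleBot/0.1 (+https://example.com/bot)"

def classify_status(status: int) -> str:
    """Map an HTTP status code to what the crawler records about the page."""
    if 200 <= status < 300:
        return "fetched"      # new or updated page: parse it and update the index
    if status == 304:
        return "unchanged"    # Not Modified: keep the existing index entry
    if status in (404, 410):
        return "dead"         # Not Found / Gone: a dead link
    return "retry-later"      # server errors, rate limiting, etc.

def fetch(url: str, timeout: float = 10.0) -> tuple[str, bytes]:
    """Send a GET request, much as a browser would, and classify the outcome."""
    request = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return classify_status(response.status), response.read()
    except urllib.error.HTTPError as err:
        # urllib raises HTTPError for non-2xx responses it does not follow as redirects, including 304.
        return classify_status(err.code), b""
    except urllib.error.URLError:
        # Network-level failures such as DNS errors or refused connections.
        return "retry-later", b""
```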
As web pages are crawled, new links detected on these web pages are added to the search engine’s list of pages to crawl.
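The list of pages waiting to be crawled is often called a crawl frontier. The minimal sketch below, assuming an in-memory link graph in place of real downloads, seeds the frontier from the three discovery sources listed earlier (known links, submitted URLs and a sitemap in the sitemaps.org XML format) and then crawls breadth-first, visiting each URL once. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import deque

# Namespace defined by the sitemaps.org protocol, version 0.9.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

def urls_from_sitemap(xml_text):
    """Return every <loc> URL listed under <urlset><url> in a sitemap."""
    root = ET.fromstring(xml_text)
    return [loc.text.strip() for loc in root.findall("sm:url/sm:loc", SITEMAP_NS) if loc.text]

def crawl_order(seeds, get_links, max_pages=100):
    """Breadth-first crawl: visit each URL once, queueing newly discovered links."""
    frontier = deque(dict.fromkeys(seeds))  # pages waiting to be crawled, duplicates dropped
    discovered = set(frontier)              # every URL ever queued
    crawled = []
    while frontier and len(crawled) < max_pages:
        url = frontier.popleft()
        crawled.append(url)
        for link in get_links(url):         # stands in for "download the page and extract its links"
            if link not in discovered:
                discovered.add(link)
                frontier.append(link)
    return crawled
```

In a real crawler, `get_links` would download and parse the page, and the frontier would typically also be prioritized and rate-limited per site.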
In the process of crawling, search engines face a trade-off between minimizing the resources spent on crawling and keeping the index up to date. They try to avoid re-indexing unchanged web pages while still capturing every changed page, so that the index always stays current.
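One common way to avoid re-indexing an unchanged page is to store a fingerprint (hash) of its content and compare it on the next visit; a revisit schedule can then back off for stable pages and tighten for frequently changing ones. The sketch below assumes SHA-256 fingerprints and a hypothetical halving/doubling policy with made-up bounds; it does not describe how any particular search engine schedules recrawls.

```python
import hashlib

class ChangeDetector:
    """Remembers a content fingerprint per URL so unchanged pages can be skipped."""

    def __init__(self):
        self._fingerprints = {}  # url -> SHA-256 hex digest of the last indexed content

    def needs_reindex(self, url: str, content: bytes) -> bool:
        digest = hashlib.sha256(content).hexdigest()
        if self._fingerprints.get(url) == digest:
            return False         # identical to the copy already indexed: skip it
        self._fingerprints[url] = digest
        return True              # new or changed page: (re)index it

def next_revisit_hours(current: float, changed: bool,
                       minimum: float = 1.0, maximum: float = 720.0) -> float:
    """Hypothetical policy: halve the interval if the page changed, double it if not."""
    proposed = current / 2 if changed else current * 2
    return max(minimum, min(maximum, proposed))  # clamp to [minimum, maximum] hours
```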
- Indexing
The search engine stores the pages its crawlers retrieve in a massive index database. It sorts this information by search term and arranges it in alphabetical order, which enables rapid retrieval of documents from the index when search queries demand them.
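A term-sorted index lets the engine locate a term with binary search instead of scanning every page. The sketch below is a toy in-memory inverted index, assuming pages are already plain text; a production index is vastly larger and more elaborate. `build_index` and `lookup` are hypothetical names.

```python
import bisect
from collections import defaultdict

def build_index(pages):
    """Build (terms, posting_lists) with terms in alphabetical order."""
    postings = defaultdict(set)
    for page_id, text in pages.items():
        for term in text.lower().split():
            postings[term].add(page_id)
    terms = sorted(postings)                        # alphabetical term dictionary
    lists = [sorted(postings[t]) for t in terms]    # sorted page IDs for each term
    return terms, lists

def lookup(index, term):
    """Find a term's posting list by binary search over the sorted terms."""
    terms, lists = index
    i = bisect.bisect_left(terms, term)             # O(log n) comparisons
    if i < len(terms) and terms[i] == term:
        return lists[i]
    return []
```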
The indexer processes the words on each web page, noting where the keywords appear, e.g. in title tags or alt attributes. Search engines process many, but not all, content types; for example, they cannot process the content of some rich media files or dynamic pages.
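To make "noting the location of the keywords" concrete, here is a minimal sketch using Python’s standard `html.parser`. It tags each word as coming from the title, an image alt attribute, or the body text, and skips `<script>` and `<style>` contents. The three location labels are assumptions for illustration. Text that a page builds with JavaScript at runtime never appears in the raw HTML this parser reads, which is one reason dynamic pages can be hard to index.

```python
from html.parser import HTMLParser

class LocationExtractor(HTMLParser):
    """Collects (word, location) pairs, where location is 'title', 'alt' or 'body'."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.skip_depth = 0   # > 0 while inside <script> or <style>
        self.hits = []

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True
        elif tag in ("script", "style"):
            self.skip_depth += 1
        elif tag == "img":
            for name, value in attrs:
                if name == "alt" and value:
                    self.hits.extend((w.lower(), "alt") for w in value.split())

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False
        elif tag in ("script", "style"):
            self.skip_depth = max(0, self.skip_depth - 1)

    def handle_data(self, data):
        if self.skip_depth:
            return            # code, not page content
        where = "title" if self.in_title else "body"
        self.hits.extend((w.lower(), where) for w in data.split())
```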
To improve search performance, search engines ignore (do not index) common words called stop words (such as the, is, on, or, of, how, why, as well as certain single digits and single letters). These words are so common that they do little to narrow a search and can therefore safely be ignored. The indexer also ignores some punctuation and multiple spaces, and converts all letters to lowercase, to improve the search engine’s performance.
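The normalization steps above can be sketched as a single function. The stop-word list is taken from the examples in the text and is an assumption, not any engine’s real list; this sketch drops every single letter and digit, and it only keeps ASCII letters and digits for simplicity.

```python
import re

# Illustrative stop words from the examples above; real engines use their own lists.
STOP_WORDS = {"the", "is", "on", "or", "of", "how", "why"}

def normalize(text: str) -> list[str]:
    """Lowercase, split on punctuation and whitespace, and drop stop words."""
    # Runs of punctuation and spaces act as separators (ASCII letters/digits only).
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    # Drop stop words plus single letters and single digits.
    return [t for t in tokens if t not in STOP_WORDS and len(t) > 1]
```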
- Search Query Processor
This is the part most users are familiar with, and quite often they mistakenly regard it as the “search engine” itself. It comprises several components, the most visible being the search box or interface through which the user submits a search query for processing.
When a user submits a query through the interface, the most relevant documents for that query are rapidly retrieved from the index. Relevance is determined algorithmically from more than 200 ranking factors.
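The actual ranking factors and their weights are not public. The toy below, with hypothetical signal names and weights, only illustrates the general shape of query processing: normalize the query the way the indexer normalized pages, keep pages that contain every query term, then order them by a weighted score.

```python
# Tiny illustrative stop-word list (an assumption, not any engine's real list).
STOP_WORDS = {"the", "is", "on", "or", "of", "how", "why"}

def search(query, index, signals, weights):
    """Return IDs of pages containing every query term, highest weighted score first.

    index:   term -> list of page IDs (an inverted index)
    signals: page ID -> {signal name: value}   (hypothetical ranking signals)
    weights: signal name -> weight             (hypothetical weights)
    """
    terms = [t for t in query.lower().split() if t not in STOP_WORDS]
    if not terms:
        return []
    matches = set(index.get(terms[0], []))
    for term in terms[1:]:
        matches &= set(index.get(term, []))   # keep pages that contain every term

    def score(page):
        page_signals = signals.get(page, {})
        return sum(w * page_signals.get(name, 0.0) for name, w in weights.items())

    # Highest score first; ties broken by page ID so the order is deterministic.
    return sorted(matches, key=lambda page: (-score(page), page))
```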
A key factor among these is PageRank, a measure of a web page’s importance determined by the number and quality of links pointing to it. It is important to stress, however, that not all links are equal: links from high-ranking web pages are considered more powerful than links from low-ranking ones.
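The classic published formulation of PageRank shows why link quality matters: PR(p) = (1 - d)/N + d × (sum of PR(q)/L(q) over every page q linking to p), where N is the number of pages, L(q) is the number of outbound links on q, and d is a damping factor (0.85 is the commonly cited value). A link passes on a share of the linking page’s own rank, so a link from a high-ranking page is worth more. The sketch below computes this by power iteration on a small hypothetical link graph; Google’s production ranking is not public and is certainly more involved.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict mapping page -> list of outbound links."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}          # start from a uniform distribution
    for _ in range(iterations):
        new = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            outs = list(dict.fromkeys(links.get(page, [])))  # ignore duplicate links
            if outs:
                share = damping * rank[page] / len(outs)     # rank split across outbound links
                for target in outs:
                    new[target] += share
            else:
                # Dangling page (no outbound links): spread its rank evenly over all pages.
                for target in pages:
                    new[target] += damping * rank[page] / n
        rank = new
    return rank
```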