The World Wide Web contains billions of static web pages. To make pages of interest efficiently accessible, search engines collect and index information from them, so that users can retrieve pages via a set of keywords. A conventional search engine runs one or more web crawler processes that continuously discover new pages, which are created at a rate of hundreds of thousands per day. As the volume of data on the web grows exponentially, more and more server instances are added to keep search results fast and relevant. For example, Yahoo has set up several Metros and National Yahoos to support its service, and Google operates a server farm of more than 15,000 PCs (data from 2003) in multiple clusters distributed worldwide. This imposes serious problems on the Internet, its users, and the environment.
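The collect-index-retrieve pipeline described above can be sketched minimally as an inverted index mapping keywords to pages. The URLs, page texts, and whitespace tokenizer below are illustrative assumptions, not any particular engine's implementation:

```python
from collections import defaultdict

# Toy corpus standing in for crawled pages (illustrative data, not real crawl output).
pages = {
    "http://example.com/a": "cheap laptop deals and reviews",
    "http://example.com/b": "laptop cooling and power consumption",
    "http://example.com/c": "local restaurant reviews",
}

def build_index(docs):
    """Map each keyword to the set of URLs containing it (an inverted index)."""
    index = defaultdict(set)
    for url, text in docs.items():
        for word in text.lower().split():
            index[word].add(url)
    return index

def search(index, keywords):
    """Return the URLs that contain ALL of the given keywords."""
    sets = [index.get(w.lower(), set()) for w in keywords]
    return set.intersection(*sets) if sets else set()

index = build_index(pages)
print(search(index, ["laptop", "reviews"]))  # → {'http://example.com/a'}
```

Real engines add ranking, stemming, and distributed storage on top, but the cost structure the text complains about (crawl, transfer, store, index) is already visible in this skeleton.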
1. Problems

1.1 Heavy server load and network traffic

A search engine gathers information at the providers' sites, transfers the documents back to its own servers, and stores them in a free-text document database cluster before generating indexes in the index database. This does not even include the "hidden web": documents generated dynamically through Relational Database Management Systems (RDBMS). The process demands huge storage space and heavy computation, and creates heavy server load and network traffic on both sides, the search engine's and the content providers'.

1.2 Power consumption and cooling issues

Typically, many spiders crawl the websites in a nearby area before indexing; the data is then consolidated and replicated across servers around the world so that users can get query results instantly from a nearby server. This kind of processing is expensive and involves high redundancy and waste of network and storage resources. Power consumption and cooling become significant operational factors.

1.3 Data with a low signal-to-noise ratio

Most importantly, on the client side: while today's search engines take pride in indexing billions of web pages and returning indexed references for a user query within a second, the results typically contain thousands or even millions of irrelevant or unwanted documents that camouflage the few relevant ones, which must then be filtered out manually. The growing volume of data is dampening the signal-to-noise ratio, and we lack efficient methods to tackle this critical issue.

1.4 Hampered user online buying experience

Moreover, to find a good deal, users usually have to look in a variety of places (Amazon, Craigslist, eBay, etc.)
to do the research and comparison before placing an order; the process is far too time-consuming and sometimes frustrating. What users expect is one common place that aggregates up-to-date listings from multiple sources and lets them quickly narrow down to only the most relevant listings, instead of sorting through hundreds of potential matches by keyword search.

1.5 Today's Internet is a passive network

It requires users to browse manually through mountains of information in search of the content they are interested in, to query search engines repeatedly for things they found previously and lost, or to log on to a site daily for updated information. Wouldn't it be great if a software program could mine the Internet on a user's behalf, store the results in a persistent place, monitor updates, and orchestrate information retrieval, so that the user simply logs onto the website, goes to his or her personalized page, and views the requested information? Moreover, users could rate the documents that are useful to them, and this feedback would flow back into the system to help others with similar needs.

1.6 Google's frequently changing SEO algorithms

Recently, many businesses have found that online marketing opens new advertising channels, especially through Google AdWords and other pay-per-click tools. Getting listed on the first page of Google's search results requires tremendous work on the business side to tune the right keyword combinations, which makes the marketing budget hard to manage; in some areas it is simply unaffordable for most local small businesses. The burden grows heavier still when Google keeps changing the SEO algorithms and rules that businesses must follow.
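The rating-feedback loop imagined in section 1.5 could be sketched as follows. The document IDs, the scores, and the simple average-score re-ranking are illustrative assumptions only, a minimal stand-in for a real collaborative-filtering system:

```python
from collections import defaultdict

class FeedbackStore:
    """Collects user ratings and re-ranks documents so that items
    found useful by earlier users surface first for later users."""

    def __init__(self):
        self.ratings = defaultdict(list)  # doc_id -> list of scores

    def rate(self, doc_id, score):
        """Record one user's usefulness score for a document."""
        self.ratings[doc_id].append(score)

    def rerank(self, doc_ids):
        """Order documents by average rating, highest first;
        unrated documents default to a neutral score of 0."""
        def avg(doc):
            scores = self.ratings.get(doc, [])
            return sum(scores) / len(scores) if scores else 0.0
        return sorted(doc_ids, key=avg, reverse=True)

store = FeedbackStore()
store.rate("doc-cheap-laptops", 5)
store.rate("doc-cheap-laptops", 4)
store.rate("doc-spam-page", 1)
print(store.rerank(["doc-spam-page", "doc-cheap-laptops", "doc-unrated"]))
# → ['doc-cheap-laptops', 'doc-spam-page', 'doc-unrated']
```

A production system would also model similarity between users ("others with similar needs"), but even this per-document average shows how explicit feedback can lift the signal-to-noise ratio that section 1.3 complains about.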