What are the available solutions for the search engine problems?
2. Existing solutions
People have made tremendous efforts to add a layer of meaning on top of the existing web. The typical applications below accomplish that goal to a certain extent, but none of these techniques yields a significant improvement in presenting more relevant information in response to user requests.
2.1 Google PageRank
Google created a mathematical method, based on probability analysis of page links, to rank the importance of each web page. The rank is determined by the extrinsic relationships between web pages: the intuition is that a web page should be important (regardless of content) if it is highly linked by other web pages. This makes the method susceptible to spamming techniques that artificially inflate a page's relevance. In addition, most dynamically generated product detail pages sit on leaf nodes with few inbound links, so they tend to rank low even though their comprehensive information would merit a high rank.
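The link-based ranking idea can be illustrated with a minimal sketch. This is the commonly published power-iteration form of PageRank, not Google's production system; the damping factor of 0.85, the iteration count, and the four-page link graph are assumptions chosen only for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Rank pages by link structure alone.

    links maps each page to the list of pages it links to.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Rank held by pages with no outgoing links is spread evenly.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new_rank = {
            p: (1.0 - damping) / n + damping * dangling / n for p in pages
        }
        # Every other page splits its rank equally among its link targets.
        for page, targets in links.items():
            if targets:
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical site: the product page is a leaf with one inbound link.
links = {
    "home": ["category", "about"],
    "category": ["home", "product"],
    "about": ["home"],
    "product": [],
}
ranks = pagerank(links)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

In this toy graph the leaf product page ends up below the home and category pages purely because of where it sits in the link structure, whatever its content.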
Local.com invented a method to rank web page relevancy based on geographic indexing, assigning geocoded and geocodable web pages that provide local business information to user-supplied keywords. The assumption is that organizing data by geographical area improves the relevancy of search results. This is true to a certain extent, but with some exceptions. For instance, much online merchandise information is available across the web with no physical location constraints.
2.3 Classified search
On the other hand, classified sites play an important role in the online market, with some accepting user input. But many classified sites, such as Yellowpages, Yellowbook, Manta, Yelp, and Yext, only provide business listings rather than product and service listings that target the user's needs directly. Among the growing number of sites, some provide specialized classified search in a vertical market, catering to niche products and services such as jobs, pets, rentals, books, blogs, forums, and games. A number of other online services, called aggregators, crawl and aggregate classifieds from sources such as blogs and RSS feeds rather than relying on manually submitted listings. Web sites that provide horizontal search, but with very limited categories, include Craigslist, Oodle, and ClassifiedAds.
2.4 Google Plus Local
Google joined the classified search market by creating product catalogs and encouraging businesses to post products and services to the Google Merchant Center. It also provides business listings through Google Plus Local. However, transferring a product database and its daily updates to the search engine site is a huge barrier for businesses, and it creates a heavy workload of intensive human effort and computation.
2.5 Human-Powered search engine
Human-powered search engines, better known as Web directories, are popular simply because of the higher quality of the links submitted and the caliber of the sites hand-picked for inclusion in the index. However, due to resource constraints, they can cover only a small portion of the searchable information on the web.
It's worth mentioning that more category-based eCommerce sites and knowledge-based sites, such as Amazon, eBay, Quora, and Pose, are emerging that provide users with more structured data for searching products and services as well as information and knowledge. However, they are not search engines meant to index information in general or across a certain niche.
2.6 Distributed search engine
Also called a peer-to-peer search engine. According to Wikipedia, "A distributed search engine is a search engine where there is no central server. Unlike traditional centralized search engines, work such as crawling, data mining, indexing, and query processing is distributed among several peers in a decentralized manner where there is no single point of control." The P2P-based search engine vendors believe that they can get millions of people to lend them their computers and privacy for the purpose of searching the Web by running a peer instance on their personal computers. It is also doubtful whether such a system can be fast enough, and peers behind firewalls pose a further problem.
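One way the indexing and query work might be divided among peers is sketched below. The partitioning scheme (hashing each term to a single responsible peer), the peer names, and the sample documents are all assumptions made for illustration; real P2P search engines differ in how they split the work, and this sketch runs in one process rather than over a network.

```python
import hashlib


def peer_for_term(term, peers):
    """Pick the peer responsible for a term using a stable hash."""
    digest = hashlib.sha1(term.encode("utf-8")).digest()
    return peers[int.from_bytes(digest[:8], "big") % len(peers)]


def build_index(documents, peers):
    """Give every peer the inverted-index entries for its own terms."""
    index = {peer: {} for peer in peers}
    for doc_id, text in documents.items():
        for term in set(text.lower().split()):
            owner = peer_for_term(term, peers)
            index[owner].setdefault(term, set()).add(doc_id)
    return index


def search(index, peers, term):
    """Route a one-word query to the single peer that owns the term."""
    term = term.lower()
    return index[peer_for_term(term, peers)].get(term, set())


# Hypothetical peers and documents.
peers = ["peer-a", "peer-b", "peer-c"]
documents = {
    "d1": "used bike for sale",
    "d2": "mountain bike rental",
    "d3": "apartment rental downtown",
}
index = build_index(documents, peers)
print(sorted(search(index, peers, "bike")))
print(sorted(search(index, peers, "rental")))
```

No peer holds the whole index, so there is no central server, but every query depends on the responsible peer being reachable and responsive, which is where the speed and firewall concerns arise.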
2.7 Private search engine
2.8 New technologies
Among all the potential solutions to the existing problem, two major groups of technologies have evolved over the years to meet the challenge. Both the Semantic Web and Agent Technology share the same root in the Knowledge Sharing Effort initiative projects sponsored by the Advanced Research Projects Agency (ARPA, later DARPA) within the U.S. Department of Defense for use by its projects at universities and research laboratories in the US. These projects defined the initial concepts of how distributed heterogeneous agents can work together collaboratively and how the Semantic Web is formed.
Agent Technology generalizes the client-server architecture and has become an important technology for distributed systems. It promises to deliver a personalized service for individual needs by creating various types of software agents that roam the internet to collect information and perform tasks on the user's behalf, using various agent frameworks and a common Agent Communication Language (ACL). IBM was also working on combining Java Applet technology with agents into what it calls "aglets". Unfortunately, the deployment of aglets faces major challenges, such as the security of mobile agents.
The Semantic Web stack builds on the W3C's Resource Description Framework (RDF) and provides a way to extend the network of hyperlinked human-readable web pages by inserting machine-readable metadata about pages and how they are related to each other. However, although Berners-Lee originally proposed it in 2001, this has yet to happen: many critics question its feasibility, and it places a heavy burden on webmasters, blog authors, and web publishers to embed additional machine-readable metadata into billions of web pages that are designed to be read by people. This makes creating and publishing content more time-consuming, and the meta tags themselves can be misleading.
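To make the idea of machine-readable metadata concrete, here is a minimal sketch of the RDF data model, in which every statement is a (subject, predicate, object) triple. It uses plain Python tuples rather than an RDF library; the example.org URLs and page titles are made up, and the predicates are taken from the Dublin Core terms vocabulary.

```python
# Dublin Core terms namespace; the page URLs below are made-up examples.
DCT = "http://purl.org/dc/terms/"

# Each entry is one RDF-style statement: (subject, predicate, object).
triples = [
    ("http://example.org/products/42", DCT + "title", "Blue widget"),
    ("http://example.org/products/42", DCT + "isPartOf",
     "http://example.org/catalog"),
    ("http://example.org/catalog", DCT + "title", "Widget catalog"),
]


def objects(triples, subject, predicate):
    """Return every object stated for the given subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


# A program can follow the relationship between pages without parsing prose.
parent = objects(triples, "http://example.org/products/42", DCT + "isPartOf")[0]
parent_title = objects(triples, parent, DCT + "title")
print(parent, parent_title)
```

Even this small example needs three hand-written statements for two pages, which hints at the publishing burden described above.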