Standard Web search services are useful in their own right, but are
far from ideal. Search engines retrieve web pages that contain information
relevant to the subject of the user's query. Meta search, unlike standard
search, uses many different search systems to provide results. A Meta
Search Engine (MSE) is a system that enables meta search.
Meta search engines are Web services that receive user queries and dispatch
them to multiple crawl-based search engines (also called component
engines). Once all the results are returned from the component search engines,
they are merged into a single ranked list. The advantage of MSEs is
their ability to combine coverage across multiple search engines, thus allowing
them to reach the Deep Web. MSEs therefore work at a much higher level
than the standard web search engine.
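
The dispatch-and-merge flow described above can be sketched as follows. This
is a minimal illustration, not the method of any particular MSE: the two
component engines are hypothetical stubs that return fixed URL lists, and the
merge step uses a simple summed reciprocal-rank score chosen only to keep the
example short.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical component engines: each takes a query and returns a ranked
# list of URLs. Real component engines would be queried over HTTP.
def engine_a(query):
    return ["http://a.example/1", "http://shared.example/x"]

def engine_b(query):
    return ["http://shared.example/x", "http://b.example/2"]

def meta_search(query, engines):
    # Dispatch the query to every component engine in parallel and wait
    # until all of them have returned their result lists.
    with ThreadPoolExecutor(max_workers=len(engines)) as pool:
        result_lists = list(pool.map(lambda engine: engine(query), engines))

    # Naive merge (an assumption for illustration): score each URL by the
    # sum of its reciprocal ranks, so a URL returned by several engines
    # rises in the merged list. Ties are broken alphabetically.
    scores = {}
    for results in result_lists:
        for rank, url in enumerate(results, start=1):
            scores[url] = scores.get(url, 0.0) + 1.0 / rank
    return sorted(scores, key=lambda url: (-scores[url], url))

merged = meta_search("deep web", [engine_a, engine_b])
```

Here the URL returned by both engines ends up first in `merged`, which shows
how combining coverage can also reinforce results that several engines agree on.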
There are two distinct steps in creating an MSE: first, searching through
the deep web to collect results, and second, merging and ranking those
results. The algorithm used in the ranking and merging process is
critical because it has a direct impact on the effectiveness of the MSE. This
report focuses on different strategies for integrating the results
received from multiple search engines for use in an MSE.
2 Meta Search Engine System
To better understand result integration strategies, we need to take
a look at the science involved in creating an MSE. Its components can also be
considered strategies which help the MSE return better search results. There are
three components to take into consideration when building an MSE: search engine
selection, result extraction and result integration.
2.1 Search Engine Selection
In order for the MSE to choose which search engines to use, the MSE first collects
the representatives of its component search engines. The representatives
of the component search engines are stored in advance inside the MSE.
A representative is a form of information that summarizes
the contents of the documents within a component search engine.
This enables the search engines to be ranked for a given query, and the engines
to use are chosen based on that rank. The rank also contributes to the weighted
value of each search result of the MSE and can allow for a better ranking system.
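
As a rough sketch of how stored representatives might drive engine selection,
the example below assumes a very simple representative: for each engine, a
mapping from a term to the fraction of that engine's documents containing it.
The engine names, the numbers and the additive scoring rule are all
hypothetical; real representatives and scoring functions vary between systems.

```python
# Hypothetical representatives stored in advance inside the MSE: for each
# component engine, term -> fraction of its documents containing the term.
REPRESENTATIVES = {
    "news_engine": {"election": 0.30, "football": 0.20, "protein": 0.01},
    "science_engine": {"protein": 0.40, "genome": 0.35, "election": 0.02},
    "sports_engine": {"football": 0.50, "tennis": 0.30},
}

def rank_engines(query, representatives):
    # Score each engine by summing the representative weights of the query
    # terms (an assumed, deliberately simple scoring rule).
    terms = query.lower().split()
    scores = {
        name: sum(rep.get(term, 0.0) for term in terms)
        for name, rep in representatives.items()
    }
    # Highest score first; ties are broken by engine name.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))

def select_engines(query, representatives, k=2):
    # Keep the top-k engines, skipping any engine with no matching term.
    ranked = rank_engines(query, representatives)
    return [name for name, score in ranked[:k] if score > 0]

selected = select_engines("protein genome", REPRESENTATIVES)
```

For the query "protein genome", the science engine scores highest and the
sports engine is never contacted, which saves a request that would likely
return nothing useful.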
2.2 Result Extraction
A result page returned by a search engine is a dynamically generated HTML
page. Embedded inside the HTML document are the search result records.
A search result record (SRR) associates a document with its URL, title
and a short description of its content. The MSE analyses the source
code of the HTML document using text strings or tag trees to grab this
information, leaving the unnecessary information behind.
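
A tag-based extraction of SRRs could look like the sketch below, written with
Python's standard html.parser module. The page layout is an assumption made
for illustration: each record is taken to be a div with class "result" that
holds a link (URL and title) and a paragraph (description). Real result pages
differ per engine, so an actual wrapper would need engine-specific rules.

```python
from html.parser import HTMLParser

class SRRExtractor(HTMLParser):
    """Collects (url, title, description) records from an assumed layout."""

    def __init__(self):
        super().__init__()
        self.records = []
        self._current = None   # record being filled, or None outside a result
        self._field = None     # which text field incoming data belongs to

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "div" and attrs.get("class") == "result":
            self._current = {"url": "", "title": "", "description": ""}
        elif self._current is not None and tag == "a":
            self._current["url"] = attrs.get("href") or ""
            self._field = "title"
        elif self._current is not None and tag == "p":
            self._field = "description"

    def handle_data(self, data):
        # Text outside a result record (ads, navigation) is ignored.
        if self._current is not None and self._field is not None:
            self._current[self._field] += data

    def handle_endtag(self, tag):
        if tag in ("a", "p"):
            self._field = None
        elif tag == "div" and self._current is not None:
            for key in ("title", "description"):
                self._current[key] = self._current[key].strip()
            self.records.append(self._current)
            self._current = None

# Hypothetical result page with one non-result block and two SRRs.
PAGE = """
<html><body>
<div class="ad">Sponsored link</div>
<div class="result"><a href="http://example.org/a">First title</a><p>First snippet.</p></div>
<div class="result"><a href="http://example.org/b">Second title</a><p>Second snippet.</p></div>
</body></html>
"""

extractor = SRRExtractor()
extractor.feed(PAGE)
records = extractor.records
```

The advertisement block is skipped because it does not match the assumed
record pattern, which mirrors the idea of leaving unnecessary information behind.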
2.3 Result Merging
Result integration is the process by which the MSE combines results
from various search engines and returns them as a single ordered list. This
is considered to be the biggest obstacle when creating an MSE that produces