Federated search portals are becoming increasingly popular in business. Organisations are beginning to harness their data by using federated search techniques to query multiple data stores at once.
Federated search is a way of concurrently searching multiple data stores, giving end-users a single portal for finding data in files, content systems, internal applications and public knowledge bases. Users enter one search query into the federated search portal and retrieve results from all of the connected stores. Federated search uses algorithms tuned to sort results by relevancy. Some of the first industries to adopt federated search were publishing, medicine and law, which rely on these tools as a critical component of their business. Today, federated search has become a common tool that helps businesses remain competitive.
Federated search tools are now a key component of public and subscription information products widely used by the medical, legal and publishing industries, as well as government and finance. New data stores can be added to the federated search as the business's information grows and changes. The federated search engine remains scalable by implementing a connector layer that sits between the user’s search query and the data stores. The connector layer translates the user query into the format needed by each data store. After the search engine has received the results from all of the data stores, it applies a relevancy algorithm to them and displays a single, ranked list of results to the user.
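The flow described above (one query, a connector per data store, then a single ranked list) can be sketched in a few lines of Python. This is only an illustrative sketch, not the design of any particular product: the names `Result`, `Connector`, `InMemoryConnector` and `federated_search` are hypothetical, the "data stores" are plain dictionaries, and the relevancy score is a simple fraction of matched query terms standing in for a real relevancy algorithm.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Protocol


@dataclass
class Result:
    title: str
    source: str   # name of the data store the result came from
    score: float  # relevancy between 0 and 1 (assumed comparable across stores)


class Connector(Protocol):
    """Hypothetical connector interface: one implementation per data store."""

    def search(self, query: str) -> list[Result]: ...


class InMemoryConnector:
    """Toy connector over a dict of title -> text, standing in for a real store."""

    def __init__(self, name: str, documents: dict[str, str]) -> None:
        self.name = name
        self.documents = documents

    def search(self, query: str) -> list[Result]:
        # "Translate" the query into what this store understands: a term list.
        terms = query.lower().split()
        results = []
        for title, text in self.documents.items():
            words = text.lower().split()
            hits = sum(1 for term in terms if term in words)
            if hits:
                # Score is the fraction of query terms found in the document.
                results.append(Result(title, self.name, hits / len(terms)))
        return results


def federated_search(query: str, connectors: list[Connector]) -> list[Result]:
    """Query every connector concurrently and return one ranked list."""
    with ThreadPoolExecutor(max_workers=max(1, len(connectors))) as pool:
        batches = list(pool.map(lambda c: c.search(query), connectors))
    merged = [result for batch in batches for result in batch]
    # Stand-in relevancy step: sort all results by score, best first.
    return sorted(merged, key=lambda r: r.score, reverse=True)
```

Adding a new data store in this sketch means writing one more class with a `search` method and passing it in the `connectors` list; the portal code itself does not change, which is the scalability point made above.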
The following diagram demonstrates the federated search:

Differentiating federated search from Google and other web search engines
Federated search engines differ from web search engines such as Google [1] in a number of ways:
- Access to content - Web search engines do not have access to the high-quality information held in secure knowledge bases. These data stores have to be reached through federated search technologies. The same is true for businesses seeking a portal to their internal applications.
- Speed of searching - Web search engines use a technique called ‘crawling’ to collect and index, ahead of time, surface information that is readily available in the public domain. Queries against that index return more quickly than a federated search, but the data is superficial and may or may not be relevant. The performance of a federated search engine depends on how quickly the underlying data stores respond, although performance-tuning strategies are available to improve it.
- Relevancy of content - Content retrieved from web search engines may not be relevant, as the web engine only crawls surface data. Depending on when a page was last crawled, the results may be a week or a month out of date. Federated search engines use their own relevancy algorithms, tuned so that results are meaningful and relevant. Searches run against the data stores in real time, so results reflect the current content.
- Merging and ranking of content - Federated search engines and web search engines rank results based on their own sorting algorithms. Additionally, federated search engines can be configured to merge results and remove duplicates during the ranking process [2].
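The merge-and-deduplicate step in the last point can be illustrated with a short Python sketch. It makes several assumptions that are not in the article: results are identified by URL, two results count as duplicates when their URLs match after lower-casing and removing a trailing slash, and each source already supplies a comparable relevancy score. The names `Hit`, `normalise_url` and `merge_and_rank` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    url: str
    title: str
    score: float  # relevancy reported by the source (assumed comparable)
    source: str


def normalise_url(url: str) -> str:
    """Hypothetical duplicate key: lower-cased URL without a trailing slash."""
    return url.strip().lower().rstrip("/")


def merge_and_rank(batches: list[list[Hit]]) -> list[Hit]:
    """Merge per-source result lists, drop duplicates, rank by score."""
    best: dict[str, Hit] = {}
    for batch in batches:
        for hit in batch:
            key = normalise_url(hit.url)
            # Keep only the highest-scoring copy of each duplicate.
            if key not in best or hit.score > best[key].score:
                best[key] = hit
    return sorted(best.values(), key=lambda h: h.score, reverse=True)
```

Keeping the highest-scoring copy is just one possible policy; a real engine might instead prefer a trusted source or combine the scores of the duplicates.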
Moxy thought leadership
Moxy Knowledge Management has been exploring how federated search can allow organisations to retrieve data from all of their internal applications. Moxy uses its own content system, which allows the team to retrieve ranked search results consisting of different file types from a single search. In order to retrieve and rank content with different metadata and file encoding, the system applies a common conversion and indexing method, and then applies its ranking algorithm. A project is currently underway at Moxy to expand this search to include multiple data sources. The Moxy team are also evaluating open-source products that can assist with federated search solutions. Products being investigated include Pazpar2 and Zebra from Index Data, DbWiz by Simon Fraser University and LibraryFind by the Oregon State University Library.
Keep an eye on the website for updates on this and other projects going on at Moxy.
Sources:
1. Google. "WebMaster Tools". Google Basics: Indexing, 2009. http://www.google.com/support/webmasters/bin/answer.py?hl=en&answer=70897
2. Lederman, Sol. "Crawling vs Deep Web Searching?". Deep Web Technologies: Federated Search Blog, December 17, 2007. http://federatedsearchblog.com/2007/12/17/crawling-vs-deep-web-searching/