Search-agent-based information retrieval

Abstract

The ability to semantically link all online information in a given domain, that is, to create a semantic web, may make it possible to achieve high precision and high recall simultaneously. The storage method is semantic hypertext, in which conventional hypertext links are enriched with semantic information. We use emerging standards and tools such as the Extensible Markup Language (XML) and the Resource Description Framework (RDF). To take advantage of the potential speed increase due to scalability, algorithms for high-performance parallel and grid computing are being explored.

Description

Information retrieval is an issue that touches the heart of all discovery and research. The ability to semantically link all online information in a given domain, that is, to create a semantic web, may lead to the holy grail of information retrieval: the simultaneous achievement of high precision and high recall. High speed may follow as well, through the deployment of large numbers of software scouts capable of cooperatively traversing the semantic web in parallel. During this year in Padova, I will extend previous research into methods that link chunks of information within and among documents based on semantic relationships, and that use those connections to efficiently retrieve all the information that closely matches the user's request. The storage method is semantic hypertext, in which conventional hypertext links are enriched with semantic information that includes the strength and type of the relationship between the chunks of information being linked. A set of cooperating software agents, called scouts, then traverses the connections simultaneously, searching for the requested information. By communicating with each other and with a central controller to coordinate the search, the scouts are able to achieve high recall and high precision and to perform very efficiently.
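
As an illustration of the idea, the following is a minimal sketch of semantic hypertext and a single scout, not the project's actual implementation. The chunk names, relation labels, strength values, and the `scout_search` function are all hypothetical. Each link carries a type and a strength, and the scout follows only links whose strength meets a threshold.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class SemanticLink:
    """A hypertext link enriched with a relation type and a strength."""
    target: str      # id of the linked chunk
    relation: str    # type of relationship (hypothetical labels)
    strength: float  # assumed to lie between 0.0 and 1.0


# Hypothetical toy collection of chunks and their outgoing links.
GRAPH = {
    "intro": [
        SemanticLink("methods", "elaborates", 0.9),
        SemanticLink("history", "background", 0.3),
    ],
    "methods": [SemanticLink("results", "supports", 0.8)],
    "history": [SemanticLink("results", "mentions", 0.2)],
    "results": [],
}


def scout_search(graph, start, min_strength):
    """Breadth-first walk that follows only links at or above min_strength."""
    seen = {start}
    queue = deque([start])
    order = []
    while queue:
        chunk = queue.popleft()
        order.append(chunk)
        for link in graph.get(chunk, []):
            if link.strength >= min_strength and link.target not in seen:
                seen.add(link.target)
                queue.append(link.target)
    return order


print(scout_search(GRAPH, "intro", 0.5))  # ['intro', 'methods', 'results']
```

Raising the threshold favors precision, since weakly related chunks are skipped; lowering it favors recall.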

The quality of the delivered information depends entirely on the quality of the semantic links. Because linking a large number of documents manually would be prohibitively laborious, the construction of these links must be automatic. It would be helpful if, during the creation of electronic documents, semantic information describing the various segments of the documents were inserted. This information would not be displayed to someone reading the document, but would be available to assist in the construction of enriched hypertext links with other similarly enhanced documents.

Techniques for specifying semantic relationships that go beyond simple link types and weights will be investigated. The goal here is to give information retrieval agents as much meta-information as possible to enhance their information-gathering ability. Emerging standards and tools such as the Extensible Markup Language (XML) and the Resource Description Framework (RDF) are potentially useful methods for describing the content and content relationships available within documents. Intelligent software agents could use these descriptions to facilitate knowledge sharing and exchange and to create semantic hypertext links.
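
RDF describes resources as subject, predicate, object statements. The sketch below shows how such statements might be held and queried by an agent; it uses plain Python tuples rather than an RDF library, and the chunk identifiers, predicate names, and the `match` helper are assumptions made for illustration only.

```python
# Hypothetical statements about document segments, in (subject, predicate, object) form.
TRIPLES = [
    ("doc1#sec2", "elaborates", "doc1#sec1"),
    ("doc2#sec1", "contradicts", "doc1#sec2"),
    ("doc1#sec2", "hasTopic", "information retrieval"),
]


def match(triples, subject=None, predicate=None, obj=None):
    """Return the triples that agree with every field that is not None."""
    return [
        t for t in triples
        if (subject is None or t[0] == subject)
        and (predicate is None or t[1] == predicate)
        and (obj is None or t[2] == obj)
    ]


# All statements made about one segment.
print(match(TRIPLES, subject="doc1#sec2"))
# All segments that contradict another segment.
print(match(TRIPLES, predicate="contradicts"))
```

A predicate such as `contradicts` is one example of a relationship that says more than a bare link type and weight.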

Another goal of the current research is to further develop the capability of the link-traversing scouts. As the amount of accessible information reaches staggering proportions, it may be necessary to deploy large numbers of scouts for some applications. High network traffic rates cause communication bottlenecks that can degrade the performance of large numbers of scouts. To take advantage of the potential speed increase due to scalability, algorithms for high-performance parallel and grid computing are being explored.
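
One way to picture the coordination problem is the sketch below, which is an assumption-laden toy and not the algorithm under study. A hypothetical `controller` starts one scout per entry point in a thread pool, and each scout reports its matches as a single batch at the end rather than one message per chunk, which is one simple way to limit traffic. The link table, keyword table, and function names are invented for the example.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical link structure and per-chunk keywords.
LINKS = {"a": ["b", "c"], "b": ["d"], "c": ["d", "e"], "d": [], "e": []}
KEYWORDS = {
    "a": {"web"},
    "b": {"rdf"},
    "c": {"xml"},
    "d": {"rdf", "grid"},
    "e": {"grid"},
}


def scout(start, term):
    """One scout: depth-first walk from start, returning all hits as one batch."""
    seen, stack, hits = set(), [start], set()
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        if term in KEYWORDS.get(node, ()):
            hits.add(node)
        stack.extend(LINKS.get(node, []))
    return hits


def controller(starts, term, max_scouts=4):
    """Run one scout per start chunk in parallel and merge their batches."""
    with ThreadPoolExecutor(max_workers=max_scouts) as pool:
        batches = pool.map(lambda s: scout(s, term), starts)
        return sorted(set().union(*batches))


print(controller(["b", "c"], "grid"))  # ['d', 'e']
```

In this toy both scouts visit chunk `d`; avoiding that kind of duplicated work without flooding the network with coordination messages is the trade-off described above.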

Joe Rehder
Last modified: Thu Oct 17 10:46:17 CEST 2002