The Information Management Systems research group

Web Link Analysis for Information Retrieval

Abstract

Link analysis techniques are recent techniques used to try to increase the performance of Web search engines. They all use the information that can be inferred from the directed graph representing the Web and their final goal is generally that of giving a better ranking for the documents retrieved after a query. We are particularly interested in a mathematical analysis of these algorithms, in their convergence and general properties. This kind of general study could also give information on the possibility of using these algorithms in different fields, that is in fields far removed from the one of Web ranking. We are also interested in using these algorithms for the purposes of personalization and to deduce clustering properties on the Web.

Description

Link analysis techniques are recent techniques used to try to increase the performance of Web search engines. They all use the information that can be inferred from the directed graph representing the Web (i.e. a node for each Web page, and an arc for each link) and their final goal is generally that of giving a better ranking for the documents retrieved after a query. It is possible to consider two classes of link analysis algorithms for ranking Web pages:

Query-independent algorithms

For this class of algorithms the importance of Web pages is deduced from the structure of the whole Web graph, before any retrieval is effected. This information is then used after each retrieval to rank the Web pages retrieved. The most famous query-independent algorithm is PageRank, used by the Google search engine.

Query-dependent algorithms

All the algorithms belonging to this class use a small subgraph of the Web graph that is obtained after each retrieval, and then try to infer the importance to the current query of each retrieved Web page from the structure of this subgraph. The most famous query-dependent algorithms are Kleinberg's algorithm and SALSA.

We are particularly interested in a mathematical analysis of these algorithms, in their convergence and general properties. This kind of general study could also give information on the possibility of using these algorithms in different fields, that is in fields far removed from the one of Web ranking. We are also interested in using these algorithms for the purposes of personalization and to deduce clustering properties on the Web.

Essential Bibliography

[1] M. Agosti and L. Pretto. A theoretical study of a generalized version of Kleinberg's HITS algorithm. Information Retrieval, 8:219-243, 2005. Special topic issue: Advances in Mathematical/Formal Methods in Information Retrieval.

[2] M. Agosti and L. Pretto. Structural properties of Kleinberg's HITS algorithm (extended abstract). In S. Flesca, S. Greco, D. Saccà and E. Zumpano, editors, SEBD' 2003 - Proceedings of the Eleventh Italian Symposium on Advanced Database Systems, pages 449-458, Cosenza, June 2003. Rubbettino.

[3] L. Pretto. A Theoretical Approach to Link Analysis Algorithms. Ph.D. Thesis, University of Padua, Department of Information Engineering, Dec. 2002.

[4] L. Pretto. A theoretical analysis of Google's PageRank. In A.H.F. Laender and A.L. Oliveira, editors, String Processing and Information Retrieval, number 2476 in LNCS, pages 131-144, Berlin, Sept. 2002. Springer.

Luca Pretto

Last modified: Sat May 21 2005