Web Link Analysis for Information
Retrieval
Abstract
Link analysis techniques are recent techniques used to try to
increase the performance of Web search engines. They all use the
information that can be inferred from the directed graph
representing the Web and their final goal is generally that of
giving a better ranking for the documents retrieved after a
query. We are particularly interested in a mathematical analysis
of these algorithms, in their convergence and general
properties. This kind of general study could also give information
on the possibility of using these algorithms in different fields,
that is in fields far removed from the one of Web ranking. We are
also interested in using these algorithms for the purposes of
personalization and to deduce clustering properties on the Web.
Description
Link analysis techniques are recent techniques used to try to
increase the performance of Web search engines. They all use the
information that can be inferred from the directed graph
representing the Web (i.e. a node for each Web page, and an arc
for each link) and their final goal is generally that of giving a
better ranking for the documents retrieved after a query. It is
possible to consider two classes of link analysis algorithms
for ranking Web pages:
Query-independent algorithms
For this class of algorithms the importance of Web pages is
deduced from the structure of the whole Web graph, before any
retrieval is effected. This information is then used after each
retrieval to rank the Web pages retrieved. The most famous
query-independent algorithm is PageRank, used by the Google search
engine.
Query-dependent algorithms
All the algorithms belonging to this class use a small subgraph of
the Web graph that is obtained after each retrieval, and then try
to infer the importance to the current query of each retrieved Web
page from the structure of this subgraph. The most famous
query-dependent algorithms are Kleinberg's algorithm and SALSA.
We are particularly interested in a mathematical analysis of these
algorithms, in their convergence and general properties. This kind
of general study could also give information on the possibility of
using these algorithms in different fields, that is in fields far
removed from the one of Web ranking. We are also interested in
using these algorithms for the purposes of personalization and to
deduce clustering properties on the Web.
Essential Bibliography
[1] M. Agosti and L. Pretto. A theoretical study of a generalized version of
Kleinberg's HITS algorithm. Information Retrieval, 8:219-243, 2005. Special
topic issue: Advances in Mathematical/Formal Methods in Information Retrieval.
[2] M. Agosti and L. Pretto. Structural properties of Kleinberg's HITS algorithm
(extended abstract). In S. Flesca, S. Greco, D. Saccà and E. Zumpano,
editors, SEBD' 2003 - Proceedings of the Eleventh Italian Symposium on Advanced
Database Systems, pages 449-458, Cosenza, June 2003. Rubbettino.
[3] L. Pretto. A Theoretical Approach to Link Analysis Algorithms.
Ph.D. Thesis, University of Padua, Department of Information Engineering, Dec. 2002.
[4] L. Pretto. A theoretical analysis of Google's PageRank. In A.H.F. Laender
and A.L. Oliveira, editors, String Processing and Information Retrieval,
number 2476 in LNCS, pages 131-144, Berlin, Sept. 2002. Springer.
Luca Pretto
Last modified: Sat May 21 2005