pt.tumba.links
Class HITS

java.lang.Object
  extended by pt.tumba.links.HITS
Direct Known Subclasses:
Companion

public class HITS
extends java.lang.Object

Kleinberg's hypertext-induced topic selection (HITS) algorithm is a popular and effective algorithm for ranking documents based on the link information among a set of documents. The algorithm presumes that a good hub is a document that points to many others, and a good authority is a document that many documents point to. Hubs and authorities exhibit a mutually reinforcing relationship: a better hub points to many good authorities, and a better authority is pointed to by many good hubs. Because HITS ranks documents depending only on their in-links and out-links, it gives poor results in some cases. For example, the paper "Improved Algorithms for Topic Distillation in a Hyperlinked Environment" describes two problems: mutually reinforcing relationships between hosts, and topic drift. Both can be solved or alleviated by adding weights. The first problem can be addressed by giving documents from the same host much less weight, and the second can be alleviated by adding weights to edges based on the text in the documents or their anchors.
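The host-based weighting mentioned above can be illustrated with a short sketch. It assumes one possible scheme: when k documents on the same host all link to one target, each of those links contributes an authority weight of 1/k. The class HostWeightSketch and its methods are hypothetical names introduced for illustration; nothing on this page states that HITS or WebGraph applies such a weighting.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper, not part of pt.tumba.links. */
final class HostWeightSketch {

    /**
     * Given the URLs of all documents that link to one fixed target,
     * returns a weight per source URL: 1/k, where k is the number of
     * sources that share that source's host.
     */
    static Map<String, Double> authorityWeights(List<String> sources) {
        // Count how many linking documents come from each host.
        Map<String, Integer> perHost = new HashMap<>();
        for (String src : sources) {
            perHost.merge(hostOf(src), 1, Integer::sum);
        }
        // Each source's link is weighted by 1 / (documents on its host).
        Map<String, Double> weights = new HashMap<>();
        for (String src : sources) {
            weights.put(src, 1.0 / perHost.get(hostOf(src)));
        }
        return weights;
    }

    /** Extracts the host part of a URL; falls back to the whole string. */
    static String hostOf(String url) {
        String host = URI.create(url).getHost();
        return host == null ? url : host;
    }
}
```

With this scheme, a host that contributes many linking pages has the same total influence on a target as a host that contributes one.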

Author:
Bruno Martins

Field Summary
private  java.util.Map authorityScores
          A Map containing the Authority score for each page
private  WebGraph graph
          The data structure containing the Web linkage graph
private  java.util.Map hubScores
          A Map containing the Hub score for each page
 
Constructor Summary
HITS(WebGraph graph)
          Constructor for HITS
 
Method Summary
private  java.lang.Double authorityScore(java.lang.Integer id)
          Returns the Authority score value associated with a given link identifier.
 java.lang.Double authorityScore(java.lang.String link)
          Returns the Authority score associated with a given link
 void computeHITS()
          Computes the Hub and Authority scores for all the nodes in the Web Graph.
 void computeHITS(int numIterations)
          Computes the Hub and Authority scores for all the nodes in the Web Graph.
private  java.lang.Double hubScore(java.lang.Integer id)
          Returns the Hub score value associated with a given link identifier.
 java.lang.Double hubScore(java.lang.String link)
          Returns the Hub score associated with a given link
 void initializeAuthorityScore(java.lang.Integer id, double value)
          Initializes the Authority score associated with a given link identifier.
 void initializeAuthorityScore(java.lang.String link, double value)
          Initializes the Authority score associated with a given link.
 void initializeHubScore(java.lang.Integer id, double value)
          Initializes the Hub score associated with a given link identifier.
 void initializeHubScore(java.lang.String link, double value)
          Initializes the Hub score associated with a given link.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

graph

private WebGraph graph
The data structure containing the Web linkage graph


hubScores

private java.util.Map hubScores
A Map containing the Hub score for each page


authorityScores

private java.util.Map authorityScores
A Map containing the Authority score for each page

Constructor Detail

HITS

public HITS(WebGraph graph)
Constructor for HITS

Parameters:
graph - The data structure containing the Web linkage graph
Method Detail

computeHITS

public void computeHITS()
Computes the Hub and Authority scores for all the nodes in the Web Graph. This method sets the maximum number of iterations of the algorithm to 25.


computeHITS

public void computeHITS(int numIterations)
Computes the Hub and Authority scores for all the nodes in the Web Graph. Given a Web graph, the authority and hub values are calculated iteratively. For each page p, the authority value of p is the sum of the hub scores of all the pages that point to p, and the hub value of p is the sum of the authority scores of all the pages that p points to. Iteration proceeds on the neighborhood graph until the values converge or the maximum number of iterations is reached.

Parameters:
numIterations - The maximum number of iterations for the algorithm
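The update rule described for computeHITS(int) can be sketched as follows. This is a minimal illustration and not the code of this class: HitsSketch, Scores, compute and the tolerance parameter are hypothetical names, the graph is a plain map from each page to the pages it links to rather than a WebGraph, and the normalization of the scores to unit Euclidean length after each step is an assumption (the description above does not say how, or whether, scores are normalized).

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Illustrative stand-in, not the pt.tumba.links.HITS implementation. */
final class HitsSketch {

    /** Hub and authority scores keyed by page URL. */
    static final class Scores {
        final Map<String, Double> hub = new HashMap<>();
        final Map<String, Double> authority = new HashMap<>();
    }

    static Scores compute(Map<String, List<String>> outLinks,
                          int maxIterations, double tolerance) {
        Scores s = new Scores();
        // Every page, whether link source or target, starts with score 1.
        for (Map.Entry<String, List<String>> e : outLinks.entrySet()) {
            s.hub.putIfAbsent(e.getKey(), 1.0);
            s.authority.putIfAbsent(e.getKey(), 1.0);
            for (String target : e.getValue()) {
                s.hub.putIfAbsent(target, 1.0);
                s.authority.putIfAbsent(target, 1.0);
            }
        }
        for (int i = 0; i < maxIterations; i++) {
            Map<String, Double> newAuth = new HashMap<>();
            Map<String, Double> newHub = new HashMap<>();
            for (String p : s.hub.keySet()) {
                newAuth.put(p, 0.0);
                newHub.put(p, 0.0);
            }
            // authority(p) = sum of hub(q) over all pages q that point to p
            for (Map.Entry<String, List<String>> e : outLinks.entrySet()) {
                for (String target : e.getValue()) {
                    newAuth.merge(target, s.hub.get(e.getKey()), Double::sum);
                }
            }
            normalize(newAuth);
            // hub(p) = sum of authority(q) over all pages q that p points to,
            // using the authority values just computed
            for (Map.Entry<String, List<String>> e : outLinks.entrySet()) {
                for (String target : e.getValue()) {
                    newHub.merge(e.getKey(), newAuth.get(target), Double::sum);
                }
            }
            normalize(newHub);
            double change = delta(s.authority, newAuth) + delta(s.hub, newHub);
            s.authority.putAll(newAuth);
            s.hub.putAll(newHub);
            if (change < tolerance) {
                break; // scores have stopped moving
            }
        }
        return s;
    }

    /** Scales the scores to unit Euclidean length (assumed normalization). */
    private static void normalize(Map<String, Double> scores) {
        double sumSq = 0.0;
        for (double v : scores.values()) {
            sumSq += v * v;
        }
        if (sumSq == 0.0) {
            return; // graph without links: leave all scores at zero
        }
        final double norm = Math.sqrt(sumSq);
        scores.replaceAll((k, v) -> v / norm);
    }

    /** Sum of absolute differences between two score maps with equal keys. */
    private static double delta(Map<String, Double> a, Map<String, Double> b) {
        double d = 0.0;
        for (Map.Entry<String, Double> e : a.entrySet()) {
            d += Math.abs(e.getValue() - b.get(e.getKey()));
        }
        return d;
    }
}
```

Without some normalization the raw sums grow on every pass, so a convergence test on the score values only makes sense once they are rescaled; the iteration cap bounds the work when the tolerance is never reached.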

hubScore

public java.lang.Double hubScore(java.lang.String link)
Returns the Hub score associated with a given link.

Parameters:
link - The URL for the link
Returns:
The Hub score associated with the given link

hubScore

private java.lang.Double hubScore(java.lang.Integer id)
Returns the Hub score value associated with a given link identifier. Identifiers are Integer numbers, used in WebGraph to represent the Web graph for efficiency reasons.

Parameters:
id - An identifier for the link
Returns:
The Hub score associated with the given link
See Also:
WebGraph.IdentifyerToURL()
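The pairing of a public String-based method with a private Integer-based one can be sketched as follows. This is an illustration under assumptions, not the WebGraph or HITS code: UrlIdsSketch and all of its members are hypothetical, and returning null for an unknown URL is a choice made for the sketch, not documented behavior.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration of URL-to-Integer identifier mapping. */
final class UrlIdsSketch {
    private final Map<String, Integer> idByUrl = new HashMap<>();
    private final List<String> urlById = new ArrayList<>();
    // Scores are keyed by the compact Integer identifier, not the URL.
    private final Map<Integer, Double> hubScores = new HashMap<>();

    /** Returns the identifier for a URL, assigning the next free one if new. */
    Integer idFor(String url) {
        Integer id = idByUrl.get(url);
        if (id == null) {
            id = urlById.size();
            idByUrl.put(url, id);
            urlById.add(url);
        }
        return id;
    }

    /** Reverse lookup from identifier to URL. */
    String urlFor(Integer id) {
        return urlById.get(id);
    }

    /** Stores a score under the identifier of the given URL. */
    void setHubScore(String url, double value) {
        hubScores.put(idFor(url), value);
    }

    /** String-based lookup that delegates to the Integer-keyed map. */
    Double hubScore(String url) {
        Integer id = idByUrl.get(url);
        return id == null ? null : hubScores.get(id); // null: unknown URL (assumed)
    }
}
```

Keying the score maps by small integers avoids hashing and comparing full URL strings on every step of the iteration, which is one plausible reading of the "efficiency reasons" given above.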

initializeHubScore

public void initializeHubScore(java.lang.String link,
                               double value)
Initializes the Hub score associated with a given link.

Parameters:
link - The URL for the link
value - The Hub score to assign

initializeHubScore

public void initializeHubScore(java.lang.Integer id,
                               double value)
Initializes the Hub score associated with a given link identifier. Identifiers are Integer numbers, used in WebGraph to represent the Web graph for efficiency reasons.

Parameters:
id - An identifier for the link
value - The Hub score to assign
See Also:
WebGraph.IdentifyerToURL()

authorityScore

public java.lang.Double authorityScore(java.lang.String link)
Returns the Authority score associated with a given link.

Parameters:
link - The URL for the link
Returns:
The Authority score associated with the given link

authorityScore

private java.lang.Double authorityScore(java.lang.Integer id)
Returns the Authority score value associated with a given link identifier. Identifiers are Integer numbers, used in WebGraph to represent the Web graph for efficiency reasons.

Parameters:
id - An identifier for the link
Returns:
The Authority score associated with the given link
See Also:
WebGraph.IdentifyerToURL()

initializeAuthorityScore

public void initializeAuthorityScore(java.lang.String link,
                                     double value)
Initializes the Authority score associated with a given link.

Parameters:
link - The URL for the link
value - The Authority score to assign

initializeAuthorityScore

public void initializeAuthorityScore(java.lang.Integer id,
                                     double value)
Initializes the Authority score associated with a given link identifier. Identifiers are Integer numbers, used in WebGraph to represent the Web graph for efficiency reasons.

Parameters:
id - An identifier for the link
value - The Authority score to assign
See Also:
WebGraph.IdentifyerToURL()