What is WebLA?
WebLA is a Java package for handling Web Graphs, implementing popular algorithms such as PageRank, HITS, CoCitation Similarity and SimRank. It is of particular interest for research in Information Retrieval, since it provides a set of APIs (Application Programming Interfaces) that allow one to easily experiment with such algorithms.
The package implements efficient methods for handling relatively large Web graphs in memory. It also includes many utility methods, for example for detecting non-nepotistic links and for extracting linkage information from a relational database.
The software is employed at the tumba! Portuguese Web search engine, where it is used as the basis of a "link-based" related pages algorithm, as well as in other web mining tools.
WebLA was written by Bruno Martins.
Documents on the Web contain links to other pages. By analyzing this link structure it is possible to improve the results of typical ranking and classification algorithms.
Notable early successes in this area include the PageRank algorithm, which analyzes the entire Web graph to derive a global measure of page quality, and the HITS algorithm, which analyzes a local neighborhood of the Web graph to find "hub" and "authority" pages. More recent proposals include algorithms that measure document similarity using this linkage information, such as direct applications of concepts from the area of Bibliometrics (co-citation, bibliographic coupling) or SimRank (a recursive similarity measure that, like PageRank, is computed iteratively over the graph).
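As a rough illustration of the kind of computation PageRank performs, here is a minimal power-iteration sketch over a tiny in-memory graph. It is not WebLA's API: the class name PageRankSketch, the method pageRank, the adjacency-array representation, the damping factor of 0.85 and the toy graph are all assumptions made for this example.

```java
import java.util.Arrays;

// Hypothetical illustration only; this is not part of the WebLA package.
public class PageRankSketch {

    /**
     * Computes PageRank scores by power iteration.
     * outLinks[i] lists the indices of the pages that page i links to.
     */
    static double[] pageRank(int[][] outLinks, double damping, int iterations) {
        int n = outLinks.length;
        double[] rank = new double[n];
        Arrays.fill(rank, 1.0 / n); // start from a uniform distribution

        for (int it = 0; it < iterations; it++) {
            double[] next = new double[n];
            double danglingMass = 0.0;

            for (int i = 0; i < n; i++) {
                if (outLinks[i].length == 0) {
                    // Pages without out-links spread their score over all pages.
                    danglingMass += rank[i];
                } else {
                    // Each page splits its score evenly among its out-links.
                    double share = rank[i] / outLinks[i].length;
                    for (int target : outLinks[i]) {
                        next[target] += share;
                    }
                }
            }

            // Random-jump term plus the redistributed dangling mass.
            double base = (1.0 - damping) / n + damping * danglingMass / n;
            for (int i = 0; i < n; i++) {
                next[i] = base + damping * next[i];
            }
            rank = next;
        }
        return rank;
    }

    public static void main(String[] args) {
        // Toy graph (an assumption): 0 -> 1,2   1 -> 2   2 -> 0   3 -> 2
        int[][] graph = { {1, 2}, {2}, {0}, {2} };
        double[] scores = pageRank(graph, 0.85, 100);
        for (int i = 0; i < scores.length; i++) {
            System.out.printf("page %d: %.4f%n", i, scores[i]);
        }
    }
}
```

In this toy graph, page 2 receives the most in-links and should end up with the highest score, while page 3, which no page links to, keeps only the random-jump share. A fixed iteration count keeps the sketch short; a real implementation would more likely stop once the scores change by less than some tolerance.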
WebLA is a Java package for handling Web Graphs, implementing some of these popular algorithms.
WebLA is released under the BSD License, which basically states that you can do anything you like with it as long as you mention the authors and make it clear that the library is covered by the BSD License. It also exempts us from any liability, should this library eat your hard disc or kill your cat.
The package is relatively simple to install and run. We encourage you to try it out and let us know of any problems you find. We would also be very happy to hear from people who are using this software package.