[Corpora-List] CLAIRLIB Release
mtjoseph at umich.edu
Wed Oct 18 21:51:05 UTC 2006
Clairlib, the Clair Library, is now available:
http://tangra.si.umich.edu/clair/clairlib
INTRODUCTION
The University of Michigan's CLAIR (Computational Linguistics And
Information Retrieval) group (http://tangra.si.umich.edu/clair) is
happy to present the second release of clairlib, the Clair library.
The Clair library is written in Perl and is intended to simplify a
number of generic tasks in Natural Language Processing (NLP),
Information Retrieval (IR), and Lexical Network Analysis. Its
architecture also allows for external software to be plugged in with
very little effort.
Clairlib features a tiered architecture with a core shared by all
applications and subject-specific libraries (currently in political
science and bioinformatics).
FUNCTIONALITY
Native: Tokenization, Summarization, LexRank, Biased LexRank, Document
Clustering, Document Indexing, PageRank, Biased PageRank, Web Graph
Analysis, Bioinformatics Text Analysis, Political Science Text
Analysis, Network Building, Power Law Distribution Analysis, Network
Analysis and Computation (Watts-Strogatz Clustering Coefficient,
Cosines, Random Walks), TF, IDF
Imported: Stemming, Sentence Segmentation, Web Page Download, Web
Crawling, XML Parsing, XML Tree Building, XML Writing
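As a rough illustration of one measure named in the Native list above,
the Watts-Strogatz clustering coefficient can be sketched in a few
lines. This is not Clairlib code and does not use Clairlib's Perl API;
it is a standalone Python sketch, and the function names and the toy
graph are hypothetical, chosen only for the example. It assumes an
undirected graph stored as a mapping from each node to the set of its
neighbours, and it reports 0 for nodes with fewer than two neighbours,
which is one common convention.

```python
from itertools import combinations


def local_clustering(adj, node):
    """Local clustering coefficient of `node` in an undirected graph.

    `adj` maps every node to the set of its neighbours.
    """
    neighbours = adj[node] - {node}  # ignore self-loops
    k = len(neighbours)
    if k < 2:
        # The coefficient is undefined here; 0 is used by convention.
        return 0.0
    # Count the edges that actually exist among the neighbours.
    links = sum(1 for u, v in combinations(neighbours, 2) if v in adj[u])
    # Divide by the number of possible neighbour pairs, k * (k - 1) / 2.
    return 2.0 * links / (k * (k - 1))


def average_clustering(adj):
    """Mean of the local coefficients over all nodes."""
    if not adj:
        return 0.0
    return sum(local_clustering(adj, n) for n in adj) / len(adj)


# Hypothetical toy graph: a triangle a-b-c with a pendant node d on a.
graph = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a"},
}

if __name__ == "__main__":
    for name in sorted(graph):
        print(name, local_clustering(graph, name))
    print("average", average_clustering(graph))
```

In the toy graph, only one of the three possible pairs among a's
neighbours is connected, so a scores 1/3, while b and c each score 1.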
FUNDING
This work has been supported in part by grants R01 LM008106
"Representing and Acquiring Knowledge of Genome Regulation" and U54
DA021519 "National Center for Integrative Bioinformatics," both from
the National Institutes of Health, as well as grants IDM 0329043
"Probabilistic and Link-based Methods for Exploiting Very Large Textual
Repositories" and DHB 0527513 "The Dynamics of Political Representation
and Political Rhetoric," both from the National Science Foundation.
ABOUT
The Clair Library is developed by the CLAIR group at the University of
Michigan. It encompasses the functionality of MEAD and perltree, two
of CLAIR's earlier releases.
Project design: Dragomir R. Radev
Main implementers: Anthony Fader, Mark Hodges, and Dragomir R. Radev
Additional code by: Timothy Allison, Michael Dagitses, Aaron Elkiss,
Gunes Erkan, Scott Gifford, Mark Joseph, Samuela Pollack, and Adam
Winkel