18.1167, Software: CLAIRLIB 1.0 Release

LINGUIST Network linguist at LINGUISTLIST.ORG
Tue Apr 17 15:20:15 UTC 2007


LINGUIST List: Vol-18-1167. Tue Apr 17 2007. ISSN: 1068-4875.

Moderators: Anthony Aristar, Eastern Michigan U <aristar at linguistlist.org>
            Helen Aristar-Dry, Eastern Michigan U <hdry at linguistlist.org>
 
Reviews: Laura Welcher, Rosetta Project  
       <reviews at linguistlist.org> 

Homepage: http://linguistlist.org/

The LINGUIST List is funded by Eastern Michigan University 
and by donations from subscribers and publishers.

Editor for this issue: Hannah Morales <hannah at linguistlist.org>
================================================================  

To post to LINGUIST, use our convenient web form at
http://linguistlist.org/LL/posttolinguist.html.

===========================Directory==============================  

1)
Date: 17-Apr-2007
From: Mark Joseph <mtjoseph at umich.edu>
Subject: CLAIRLIB 1.0 Release

 

	
-------------------------Message 1 ---------------------------------- 
Date: Tue, 17 Apr 2007 11:18:50
From: Mark Joseph <mtjoseph at umich.edu>
Subject: CLAIRLIB 1.0 Release 
 


Clairlib, The Clair Library

version 1.0 is now available

http://tangra.si.umich.edu/clair/clairlib

Introduction

The University of Michigan's CLAIR (Computational Linguistics And
Information Retrieval) group is happy to present version 1.0 of 
clairlib, the Clair library. 

The Clair library is intended to simplify a number of generic tasks in
Natural Language Processing (NLP), Information Retrieval (IR), and
Network Analysis. Its architecture also allows for external software
to be plugged in with very little effort.

Two distributions of the Clair library are available: Clairlib-core, 
with essential functionality and minimal dependence on external 
software, and Clairlib-ext, with extended functionality that may be 
of interest to a smaller audience. Work is underway on Clairlib-bio 
and Clairlib-polisci, extensions aimed at researchers working in 
bioinformatics and political science, respectively.

Functionality

Native in Clairlib-core: Tokenization, Summarization, LexRank, 
Biased LexRank, Document Clustering, Document Indexing, PageRank, 
Biased PageRank, Web Graph Analysis, Network Generation*, Power 
Law Distribution Analysis*, Network Analysis* (clustering 
coefficient, degree distribution plotting, average shortest path, 
diameter, triangles, shortest path matrices, connected components), 
Cosine Similarity*, Random Walks on Graphs*, Statistics* 
(distributions, tests), Tf, Idf

Functionality imported into Clairlib-core: Stemming, Sentence 
Segmentation, Web Page Download, Web Crawling, XML Parsing*, 
XML Tree Building*, XML Writing* 

Clairlib-ext features: Sentence Segmentation using MxTerminator, 
Sentence Parsing using the Charniak Parser and Chunklink

* New and expanded functionality available for the first time in this
latest release.
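
For readers unfamiliar with the Tf, Idf, and Cosine Similarity items
listed above, the following is a minimal, generic sketch of what those
measures compute. It is not Clairlib's API: the function names
(tf_idf_vectors, cosine_similarity), the smoothed idf formula, the
whitespace tokenization, and the toy documents are all assumptions
made for this illustration only.

```python
import math
from collections import Counter


def tf_idf_vectors(docs):
    """Return one {term: weight} dict per tokenized document.

    Hypothetical helper for illustration; not part of Clairlib.
    """
    n_docs = len(docs)
    # Document frequency: the number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)  # raw term frequency within this document
        # Smoothed idf (an assumption of this sketch):
        #   idf(t) = log((1 + N) / (1 + df(t))) + 1
        vectors.append({
            term: count * (math.log((1 + n_docs) / (1 + df[term])) + 1.0)
            for term, count in tf.items()
        })
    return vectors


def cosine_similarity(u, v):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention: an empty vector is similar to nothing
    return dot / (norm_u * norm_v)


# Toy documents, tokenized on whitespace (illustrative data only).
docs = [
    "the cat sat on the mat".split(),
    "the cat lay on the rug".split(),
    "stock prices fell sharply today".split(),
]
vectors = tf_idf_vectors(docs)
sim_related = cosine_similarity(vectors[0], vectors[1])
sim_unrelated = cosine_similarity(vectors[0], vectors[2])
```

In this sketch the first two documents share several terms and so
receive a similarity strictly between 0 and 1, while the third shares
no terms with the first and receives 0.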

Download

Visit http://tangra.si.umich.edu/clair/clairlib/ or write to 
radev at umich.edu to get a copy. Researchers working in 
bioinformatics or political science can write to the same 
address to request beta versions of Clairlib-bio or
Clairlib-polisci.


Funding

This work has been supported in part by National Institutes of Health 
grants R01 LM008106 'Representing and Acquiring Knowledge of Genome 
Regulation' and U54 DA021519 'National center for integrative 
bioinformatics', as well as by grants IDM 0329043 'Probabilistic and 
link-based Methods for Exploiting Very Large Textual Repositories,' 
0534323 'Collaborative Research: BlogoCenter - Infrastructure 
for Collecting, Mining and Accessing Blogs,' and DHB 0527513 'The 
Dynamics of Political Representation and Political Rhetoric,' from 
the National Science Foundation.

About

The Clair Library is developed by the Clair group at the University 
of Michigan.

Project design: Dragomir R. Radev

Main implementers: Anthony Fader, Joshua Gerrish, Mark Hodges,
Dragomir Radev, and Mark Schaller

Additional code by: Timothy Allison, Michael Dagitses, Aaron Elkiss,
Gunes Erkan, Scott Gifford, Patrick Jordan, Mark Joseph, Samuela 
Pollack, and Adam Winkel 

Linguistic Field(s): Computational Linguistics




