18.2650, Software: The Clair Library version 1.03 is now available

LINGUIST Network linguist at LINGUISTLIST.ORG
Wed Sep 12 15:50:34 UTC 2007


LINGUIST List: Vol-18-2650. Wed Sep 12 2007. ISSN: 1068 - 4875.

Subject: 18.2650, Software: The Clair Library version 1.03 is now available

Moderators: Anthony Aristar, Eastern Michigan U <aristar at linguistlist.org>
            Helen Aristar-Dry, Eastern Michigan U <hdry at linguistlist.org>
 
Reviews: Randall Eggert, U of Utah  
         <reviews at linguistlist.org> 

Homepage: http://linguistlist.org/

The LINGUIST List is funded by Eastern Michigan University, 
and donations from subscribers and publishers.

Editor for this issue: Hannah Morales <hannah at linguistlist.org>
================================================================  

To post to LINGUIST, use our convenient web form at
http://linguistlist.org/LL/posttolinguist.html.

===========================Directory==============================  

1)
Date: 11-Sep-2007
From: Bryan Gibson < gibsonb at umich.edu >
Subject: The Clair Library version 1.03 is now available

 

	
-------------------------Message 1 ---------------------------------- 
Date: Wed, 12 Sep 2007 11:46:17
From: Bryan Gibson [gibsonb at umich.edu]
Subject: The Clair Library version 1.03 is now available

Clairlib, The Clair Library

version 1.03 now available

http://belobog.si.umich.edu/clair/clairlib

Introduction

The University of Michigan's CLAIR (Computational Linguistics And Information
Retrieval) group is happy to present version 1.03 of Clairlib, the Clair
library. The Clair library is intended to simplify a number of generic tasks in
Natural Language Processing (NLP), Information Retrieval (IR), and Network
Analysis. Its architecture allows for external software to be plugged in with
very little effort. Two distributions of the Clair library are available:
Clairlib-core, with essential functionality and minimal dependence on external
software, and Clairlib-ext, with extended functionality. Work is underway on
Clairlib-bio and Clairlib-polisci, extensions that will be of interest to people
working on Bioinformatics and Political Science.

Functionality

Native in Clairlib-core: Tokenization, Summarization, LexRank, Biased LexRank,
Document Clustering, Document Indexing, PageRank, Biased PageRank, Web Graph
Analysis, Network Generation, Power Law Distribution Analysis, Network Analysis
(clustering coefficient, degree distribution plotting, average shortest path,
diameter, triangles, shortest path matrices, connected components), Cosine
Similarity, Random Walks on Graphs, Statistics (distributions, tests), Tf, Idf,
Community Finding*, Phrase-Based Queries*, Fuzzy OR Queries*

Functionality imported into Clairlib-core: Stemming, Sentence Segmentation, Web
Page Download, Web Crawling, XML Parsing, XML Tree Building, XML Writing

Clairlib-ext features: Sentence Segmentation using MxTerminator, Sentence
Parsing using the Charniak Parser and Chunklink 

*New and expanded functionality available in this latest release
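
As a rough illustration of what LexRank computes (this is not Clairlib's Perl
implementation or its API), the Python sketch below builds tf-idf vectors for
a few sentences, links sentences whose cosine similarity exceeds a threshold,
and runs a damped power iteration over the resulting graph. The function
names, the whitespace tokenizer, the threshold of 0.1, and the damping factor
of 0.85 are all assumptions chosen for the example.

```python
import math
from collections import Counter


def tfidf_vectors(sentences):
    """Return one sparse tf-idf vector (term -> weight) per sentence."""
    # Crude whitespace tokenizer; an assumption made for this sketch.
    tokenized = [s.lower().split() for s in sentences]
    n = len(tokenized)
    doc_freq = Counter(t for tokens in tokenized for t in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({t: tf[t] * math.log(n / doc_freq[t]) for t in tf})
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either is all zero."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def lexrank_scores(sentences, threshold=0.1, damping=0.85, iterations=100):
    """Score sentences by centrality in a thresholded similarity graph."""
    n = len(sentences)
    if n == 0:
        return []
    vectors = tfidf_vectors(sentences)
    rows = []
    for i in range(n):
        # Unweighted edges; a sentence with a non-zero vector links to itself.
        row = [1.0 if cosine(vectors[i], vectors[j]) > threshold else 0.0
               for j in range(n)]
        total = sum(row)
        # A row without edges (all-zero vector) jumps uniformly instead.
        rows.append([x / total for x in row] if total else [1.0 / n] * n)
    scores = [1.0 / n] * n
    for _ in range(iterations):
        scores = [(1.0 - damping) / n
                  + damping * sum(scores[i] * rows[i][j] for i in range(n))
                  for j in range(n)]
    return scores


# The first sentence overlaps with both others, so it ranks highest.
sentences = ["alpha beta gamma delta", "alpha beta", "gamma delta"]
scores = lexrank_scores(sentences)
```

The scores form a probability distribution over the sentences; a summarizer
built on this idea would typically keep the highest-scoring sentences.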

Changes

1.03 August 2007
*Added functionality to perform community finding within weighted, undirected
networks
*Added util/chunk_document.pl - breaks documents into smaller files by word count
*Added option to retain punctuation for idf and tf queries 
*Added option to print out full lists of idf and tf values for corpus
*LexRank moved from Clair::Network to Clair::Network::Centrality::LexRank
*LexRank usage now follows the same pattern as the other centrality modules
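
The idea behind splitting a document by number of words can be sketched in a
few lines of Python. This is only an illustration of the technique, not the
code of util/chunk_document.pl; the function name chunk_words and its
words_per_chunk parameter are hypothetical, and writing each chunk to its own
file is left out.

```python
def chunk_words(text, words_per_chunk):
    """Split text into chunks of at most words_per_chunk whitespace-separated words."""
    if words_per_chunk < 1:
        raise ValueError("words_per_chunk must be at least 1")
    words = text.split()
    # Step through the word list in fixed-size slices; the last may be shorter.
    return [" ".join(words[i:i + words_per_chunk])
            for i in range(0, len(words), words_per_chunk)]


chunks = chunk_words("one two three four five", 2)
```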

1.02 July 2007
*Distribution reorganized in standard format
*Updated overall documentation

1.01 May 2007
*Added Phrase-based Retrieval and Fuzzy OR Queries
*Extended Clairlib-ext with interfaces from the Cluster and Document classes to
the Weka machine learning toolkit
*Added LSI functionality
*Extended parsing of strings/files into Documents
*Added perceptron learning and classification
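
For readers unfamiliar with perceptron learning, a minimal Python sketch of
the classic mistake-driven update rule follows. It does not reflect
Clairlib's own interface; the function names, the labels in {-1, +1}, and the
toy data set are assumptions made for the example.

```python
def train_perceptron(examples, epochs=100, learning_rate=1.0):
    """Train on (features, label) pairs, where label is -1 or +1."""
    weights = [0.0] * len(examples[0][0])
    bias = 0.0
    for _ in range(epochs):
        mistakes = 0
        for features, label in examples:
            activation = sum(w * x for w, x in zip(weights, features)) + bias
            # Update only when the example is misclassified (or on the boundary).
            if label * activation <= 0:
                weights = [w + learning_rate * label * x
                           for w, x in zip(weights, features)]
                bias += learning_rate * label
                mistakes += 1
        if mistakes == 0:
            break  # every training example is classified correctly
    return weights, bias


def predict(weights, bias, features):
    """Return +1 or -1 for a feature vector."""
    activation = sum(w * x for w, x in zip(weights, features)) + bias
    return 1 if activation > 0 else -1


# Toy, linearly separable data: the label is +1 only when both features are 1.
data = [([0, 0], -1), ([0, 1], -1), ([1, 0], -1), ([1, 1], 1)]
weights, bias = train_perceptron(data)
```

Training stops early once a full pass makes no mistakes, which is guaranteed
to happen eventually only when the data are linearly separable.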

Download

Visit http://belobog.si.umich.edu/clair/clairlib/ or write to radev at umich.edu to
get a copy. Write to the same address to receive beta versions of Clairlib-bio
or Clairlib-polisci.

Funding

This work has been supported in part by National Institutes of Health grants R01
LM008106 "Representing and Acquiring Knowledge of Genome Regulation" and U54
DA021519 "National center for integrative bioinformatics", as well as by grants
IDM 0329043 "Probabilistic and link-based Methods for Exploiting Very Large
Textual Repositories,"  0534323 "Collaborative Research: BlogoCenter -
Infrastructure for Collecting, Mining and Accessing Blogs," and 0527513 "The
Dynamics of Political Representation and Political Rhetoric," from the National
Science Foundation.

About

The Clair Library is developed by the Clair group at the University of Michigan.

Project design: Dragomir R. Radev

Main implementers: Jonathan dePeri, Anthony Fader, Joshua Gerrish, Bryan Gibson,
Mark Hodges, Mark Joseph, Dragomir Radev, and Mark Schaller

Additional code by: Timothy Allison, Michael Dagitses, Aaron Elkiss, Gunes
Erkan, Scott Gifford, Patrick Jordan, Samuela Pollack, and Adam Winkel 

Linguistic Field(s): Computational Linguistics





