19.3048, Software: Computational Ling/Text&Corpus Ling/Software for automatic text..

LINGUIST Network linguist at LINGUISTLIST.ORG
Wed Oct 8 15:52:28 UTC 2008


LINGUIST List: Vol-19-3048. Wed Oct 08 2008. ISSN: 1068 - 4875.

Moderators: Anthony Aristar, Eastern Michigan U <aristar at linguistlist.org>
            Helen Aristar-Dry, Eastern Michigan U <hdry at linguistlist.org>
 
Reviews: Randall Eggert, U of Utah  
         <reviews at linguistlist.org> 

Homepage: http://linguistlist.org/

The LINGUIST List is funded by Eastern Michigan University, 
and donations from subscribers and publishers.

Editor for this issue: Susanne Vejdemo <susanne at linguistlist.org>
================================================================  


===========================Directory==============================  

1)
Date: 08-Oct-2008
From: Slava Yatsko < iatsko at gmail.com >
Subject: Software for automatic text processing

 

	
-------------------------Message 1 ---------------------------------- 
Date: Wed, 08 Oct 2008 11:51:08
From: Slava Yatsko [iatsko at gmail.com]
Subject: Software for automatic text processing

Dear Colleagues,
The Computational Linguistics Laboratory at Katanov State University of
Khakasia (CLL at KSU) is pleased to announce the release of Linguistic
Toolbox (LIT), a package of programs for automatic text processing. LIT is
a concordancer that differs from comparable tools in the following
respects.
- It has an integrated part-of-speech tagger, which allows users to create
their own annotated corpora. In-depth linguistic research is often based on
a specific text genre (e.g. fiction, scientific text), linguistic category
(e.g. possession), or the works of a particular author (e.g. Maugham).
Publicly available annotated national corpora with evenly distributed
genres often fail to meet the demands of such research, and LIT has been
designed to fill this gap. With LIT, users can run various searches on
their own corpora and obtain statistics on the distribution of words,
patterns, and phrases.
- It supports union, subtraction, and intersection operations. In set
theory these operations construct new sets from existing ones. Why not
apply them to texts, so as to construct new texts from existing ones? For
example, with the subtraction operation the user can remove stopwords from
a text, and with the intersection operation he/she can get a list of the
words that occur in two or more texts, with raw counts assigned to each
word. These functions may be useful for computing distances between texts
for the purposes of text classification and categorization.
- LIT has an integrated spreadsheet. Having obtained statistical
information with LIT, the user can perform computations in LIT itself
without resorting to commercial products such as MS Excel.
- LIT has an integrated WordNet module, which lets the user search not only
for a given word but also for words semantically related to it.
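
To make the set operations on texts more concrete, here is a minimal Python
sketch. It is not LIT's implementation, which this announcement does not
describe. The function names, the regex tokenizer, and the tiny stopword
list are all assumptions made for illustration. Intersection is read here
as "words present in every text supplied", and Jaccard distance stands in
as one possible distance between texts.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase a text and split it into word tokens (a simplifying assumption)."""
    return re.findall(r"[a-z']+", text.lower())


def subtract(tokens, stopwords):
    """Subtraction: keep only the tokens that are not in the stopword set."""
    return [t for t in tokens if t not in stopwords]


def intersect(*token_lists):
    """Intersection: words present in every text, with raw counts per text."""
    counts = [Counter(tokens) for tokens in token_lists]
    shared = set(counts[0]).intersection(*counts[1:])
    return {word: [c[word] for c in counts] for word in sorted(shared)}


def union(*token_lists):
    """Union: every word from any text, with raw counts summed over all texts."""
    total = Counter()
    for tokens in token_lists:
        total.update(tokens)
    return total


def jaccard_distance(tokens_a, tokens_b):
    """One possible text distance: 1 - |A & B| / |A | B| over word types."""
    a, b = set(tokens_a), set(tokens_b)
    if not a and not b:
        return 0.0
    return 1 - len(a & b) / len(a | b)


# Illustrative stopword list and texts (hypothetical, not shipped with LIT).
STOPWORDS = {"the", "a", "on", "and"}
tokens_a = subtract(tokenize("The cat sat on the mat and the cat slept"), STOPWORDS)
tokens_b = subtract(tokenize("A dog chased the cat"), STOPWORDS)

print(intersect(tokens_a, tokens_b))          # shared words with per-text counts
print(union(tokens_a, tokens_b))              # combined word counts
print(jaccard_distance(tokens_a, tokens_b))   # 0 = same vocabulary, 1 = disjoint
```

Keeping raw counts per text in the intersection, rather than a bare word
list, leaves room for count-based distances later on.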

LIT is distributed as freeware and can be downloaded from the CLL site at
http://www.cll.khsu.ru/cll/products.aspx?productid=5
The current version supports English and runs on Windows machines.

V. Yatsko, Head of the CLL at KSU

Linguistic Field(s): Computational Linguistics
                     Text/Corpus Linguistics





