[Corpora-List] Open source language analysis package

Lluís Padró padro at lsi.upc.edu
Mon Mar 27 09:00:06 UTC 2006


--- We apologize if you have received multiple copies of this 
announcement. ---

  Dear list members,

  We are pleased to announce the release of FreeLing version 1.3, which 
improves the suite's existing functionality and adds new features such 
as WordNet-based semantic annotation, named entity (NE) classification, 
and dependency parsing. This version also adds two new languages, 
Italian and Galician, thanks to researchers who agreed to share their 
data under open-source or Creative Commons licences (see the "thanks" 
section on the FreeLing web page).
  FreeLing is an open-source C++ library providing language analysis 
services. It is free software, released under the GNU LGPL. FreeLing 1.3 
will be presented and demonstrated this May at LREC-2006 in Genoa, Italy.

FreeLing is developed at the TALP Research Center <http://www.talp.upc.es>, 
at Universitat Politècnica de Catalunya <http://www.upc.es>. 
The morphological dictionaries and grammars were initially developed by 
the Centre de Llenguatge i Computació <http://clic.fil.ub.es>, at 
Universitat de Barcelona <http://www.ub.es>.

  More information, an online demo, and download links are available at 
http://www.lsi.upc.edu/~nlp (under the "resources" menu).

--Extra information--

  FreeLing is designed to be used as an external library by any 
application requiring language analysis services. A simple main program 
is also provided as a basic interface to the library, so that users can 
analyze text files from the command line.

The named entity classification module requires some essential machine 
learning (ML) services, such as feature extraction and 
training/classification using AdaBoost models. These services are 
accessible to any program linking the library, so FreeLing can also be 
used as a (very) basic ML-oriented NLP development toolkit.
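
For readers unfamiliar with the technique, the following is a minimal 
sketch of AdaBoost training and classification over binary features. It 
is illustrative only and does not reproduce FreeLing's API: every name in 
it (Example, Stump, train_adaboost, classify) is hypothetical, and the 
use of single-feature decision stumps as weak learners is an assumption 
made to keep the example short.

```cpp
#include <cmath>
#include <cstddef>
#include <initializer_list>
#include <limits>
#include <vector>

// Hypothetical types for illustration; not part of FreeLing.
struct Example {
    std::vector<int> features;  // binary features, each 0 or 1
    int label;                  // +1 or -1
};

struct Stump {
    std::size_t feature;  // index of the feature this stump tests
    int polarity;         // prediction when the feature is 1 (negated otherwise)
    double alpha;         // weight of this stump in the final vote
};

int stump_predict(const Stump& s, const std::vector<int>& x) {
    return x[s.feature] == 1 ? s.polarity : -s.polarity;
}

// Discrete AdaBoost: each round picks the stump with the lowest weighted
// error, then increases the weight of the examples it misclassified.
std::vector<Stump> train_adaboost(const std::vector<Example>& data, int rounds) {
    std::vector<Stump> model;
    if (data.empty()) return model;
    const std::size_t n = data.size();
    const std::size_t dim = data[0].features.size();
    const double eps = 1e-10;  // avoids division by zero when the error is 0
    std::vector<double> w(n, 1.0 / n);

    for (int t = 0; t < rounds; ++t) {
        Stump best{0, 1, 0.0};
        double best_err = std::numeric_limits<double>::max();
        for (std::size_t f = 0; f < dim; ++f) {
            for (int pol : {1, -1}) {
                Stump cand{f, pol, 0.0};
                double err = 0.0;
                for (std::size_t i = 0; i < n; ++i)
                    if (stump_predict(cand, data[i].features) != data[i].label)
                        err += w[i];
                if (err < best_err) { best_err = err; best = cand; }
            }
        }
        if (best_err >= 0.5) break;  // no stump does better than chance

        best.alpha = 0.5 * std::log((1.0 - best_err + eps) / (best_err + eps));
        model.push_back(best);
        if (best_err < eps) break;   // training data already separated

        // Reweight the examples and renormalize so the weights sum to 1.
        double z = 0.0;
        for (std::size_t i = 0; i < n; ++i) {
            const int h = stump_predict(best, data[i].features);
            w[i] *= std::exp(-best.alpha * data[i].label * h);
            z += w[i];
        }
        for (std::size_t i = 0; i < n; ++i) w[i] /= z;
    }
    return model;
}

// Final classifier: sign of the alpha-weighted vote of all stumps.
int classify(const std::vector<Stump>& model, const std::vector<int>& x) {
    double score = 0.0;
    for (const Stump& s : model) score += s.alpha * stump_predict(s, x);
    return score >= 0.0 ? 1 : -1;
}
```

In an NE classification setting, each binary feature would typically 
encode a property of the candidate entity or its context (for instance, 
"the first word is capitalized"), and one such binary classifier could 
be trained per entity class. That mapping is also an assumption of this 
sketch, not a description of FreeLing's implementation.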

Features already in previous versions:

    * Text tokenization.
    * Sentence splitting.
    * Morphological analysis.
    * Named entity detection.
    * Date, number, currency, and ratio recognition.
    * PoS tagging.
    * Chart-based shallow parsing.

New features in version 1.3:

    * New languages: Italian and Galician.
    * Improved and debugged linguistic data for Spanish and Catalan.
    * Contraction splitting.
    * Improved suffix treatment, retokenization of clitic pronouns.
    * Physical magnitude detection (speed, weight, temperature,
      density, etc.).
    * Named entity classification.
    * WordNet-based sense annotation.
    * Dependency parsing.


