[Corpora-List] Talismane : robust dependency parser for French

Assaf Urieli assaf.urieli at univ-tlse2.fr
Wed Feb 19 08:54:48 UTC 2014


*Talismane : robust dependency parser for French*
*******************************************************
Talismane is a complete open-source linguistic toolkit (sentence detector,
tokeniser, pos-tagger and syntax parser), with an off-the-shelf
implementation for French.

Talismane may be downloaded at:
http://redac.univ-tlse2.fr/applications/talismane.html

It was developed by Assaf Urieli within the framework of his thesis in
the NLP group of the CLLE-ERSS laboratory (UMR 5263), Université de
Toulouse II le Mirail.

   - Thesis :
   http://w3.erss.univ-tlse2.fr/textes/pagespersos/urieli/URIELI-thesis-2013.pdf
   - CLLE-ERSS laboratory : http://w3.erss.univ-tlse2.fr/

It is an open source toolkit written in Java, and distributed under the
Affero GPL v3 licence: http://www.gnu.org/licenses/agpl-3.0.html

Talismane performs, for French:

   - sentence boundary detection;
   - tokenisation;
   - pos-tagging and lemmatisation;
   - transition-based dependency parsing.
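
For readers unfamiliar with transition-based parsing, here is a minimal
sketch in Python of how a sequence of transitions builds a dependency tree.
It assumes an arc-standard transition system purely for illustration; this
announcement does not say which transition system Talismane uses, and the
function name, the example sentence and the labels are hypothetical. In a
real parser a trained classifier would choose each transition; here the
sequence is given by hand.

```python
# Illustrative arc-standard transition system (an assumption for this sketch,
# not Talismane's actual code or API).

def apply_transitions(words, transitions):
    """Apply SHIFT / LEFT-ARC / RIGHT-ARC moves.

    Returns a list of (dependent, head, label) arcs, where tokens are
    numbered from 1 and index 0 is an artificial ROOT node.
    """
    stack = [0]                               # starts with ROOT only
    buffer = list(range(1, len(words) + 1))   # tokens still to be read
    arcs = []
    for move in transitions:
        if move == "SHIFT":
            # Move the next token from the buffer onto the stack.
            stack.append(buffer.pop(0))
            continue
        action, label = move.split(":")
        if action == "LEFT-ARC":
            # Second-from-top of the stack becomes a dependent of the top.
            dependent = stack.pop(-2)
            head = stack[-1]
        elif action == "RIGHT-ARC":
            # Top of the stack becomes a dependent of the item below it.
            dependent = stack.pop()
            head = stack[-1]
        else:
            raise ValueError(f"unknown transition: {move}")
        arcs.append((dependent, head, label))
    return arcs


# Hand-written transition sequence for a three-word example sentence.
words = ["Le", "chat", "dort"]
transitions = [
    "SHIFT", "SHIFT", "LEFT-ARC:det",
    "SHIFT", "LEFT-ARC:suj", "RIGHT-ARC:root",
]
arcs = apply_transitions(words, transitions)
```

Each arc records one attachment decision, so a sentence of n tokens is
parsed in a number of steps linear in n.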

Talismane was trained on the French Treebank (Abeillé et al., 2003) for
sentence boundary detection, tokenisation and pos-tagging, and on the
French Treebank converted to dependencies (Candito et al., 2010) for
parsing. It uses the LeFFF as the default lexicon (Sagot, 2010). It is fully
configurable (machine learning parameters, features, rules, tagset,
lexicon, etc.) and can be trained for other languages with dependency
treebanks available.

Furthermore, Talismane makes it possible to:

   - analyse quickly (measured at 2 million words an hour in the base
   configuration);
   - parse XML and HTML simply, by adding filters indicating which parts
   should be analysed;
   - add rules to override the statistical model's decisions during
   analysis, imposing or prohibiting a local decision based on the context;
   - choose between a better-quality analysis (wide beam) and faster
   analysis (narrow beam);
   - propagate ambiguity from one level of analysis to the next (e.g.
   pos-tagging to parsing), to allow a higher-level module to correct the
   errors of a lower-level module;
   - keep a trace of the exact position of each analysed token in the
   original file;
   - output system confidence in each decision taken.
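
The trade-off between a narrow and a wide beam, and the idea of passing
several scored analyses on to the next module, can be illustrated with a
small beam search in Python. This is only a sketch under stated assumptions:
the probabilities in TOY_MODEL are made up, the two-token input is
hypothetical, and nothing here reflects Talismane's actual models or API.

```python
import heapq

# Made-up local probabilities P(tag | previous tag, token).
# They are invented for this sketch and do not come from any real model.
TOY_MODEL = {
    (None, "le"): {"DET": 0.6, "CLO": 0.4},
    ("DET", "ferme"): {"NC": 0.55, "ADJ": 0.45},
    ("CLO", "ferme"): {"V": 0.9, "NC": 0.1},
}


def beam_search(tokens, beam_width):
    """Return up to beam_width (probability, tag sequence) pairs, best first."""
    beam = [(1.0, [])]  # one empty analysis with probability 1
    for token in tokens:
        candidates = []
        for prob, tags in beam:
            prev = tags[-1] if tags else None
            # Extend every surviving analysis with every possible tag.
            for tag, p in TOY_MODEL[(prev, token)].items():
                candidates.append((prob * p, tags + [tag]))
        # Keep only the beam_width most probable analyses.
        beam = heapq.nlargest(beam_width, candidates, key=lambda c: c[0])
    return beam


narrow = beam_search(["le", "ferme"], beam_width=1)
wide = beam_search(["le", "ferme"], beam_width=2)
```

With these invented numbers, a beam of 1 commits to DET for the first token
and ends with DET NC, whereas a beam of 2 keeps the CLO reading alive and
ends with CLO V as the higher-scoring sequence. The wide beam also returns a
second analysis with its own score, which is the kind of ambiguity and
confidence information a later module could use.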


A user's guide is available on the website.

Assaf URIELI
CLLE-ERSS Laboratory, UMR 5263
Université de Toulouse II le Mirail