<div dir="ltr"><b>Talismane: robust dependency parser for French</b><br>*******************************************************<br>Talismane is a complete open-source linguistic toolkit (sentence detector, tokeniser, pos-tagger and syntax parser), with an off-the-shelf implementation for French.<br>
<br>Talismane may be downloaded at:<br><a href="http://redac.univ-tlse2.fr/applications/talismane.html">http://redac.univ-tlse2.fr/applications/talismane.html</a><br><br>It was developed by Assaf Urieli as part of his doctoral thesis in the NLP group of the CLLE-ERSS laboratory (UMR 5263), Université de Toulouse II le Mirail.<br>
<ul><li>Thesis: <a href="http://w3.erss.univ-tlse2.fr/textes/pagespersos/urieli/URIELI-thesis-2013.pdf">http://w3.erss.univ-tlse2.fr/textes/pagespersos/urieli/URIELI-thesis-2013.pdf</a></li><li>CLLE-ERSS laboratory: <a href="http://w3.erss.univ-tlse2.fr/">http://w3.erss.univ-tlse2.fr/</a></li>
</ul>It is written in Java and distributed under the GNU Affero General Public License v3 (AGPL v3): <a href="http://www.gnu.org/licenses/agpl-3.0.html">http://www.gnu.org/licenses/agpl-3.0.html</a><br><br>Talismane performs, for French:<br>
<ul><li>sentence boundary detection;</li><li>tokenisation;</li><li>pos-tagging and lemmatisation;</li><li>transition-based dependency parsing.</li></ul>Talismane was trained on the French Treebank (Abeillé et al., 2003) for sentence boundary detection, tokenisation and pos-tagging, and on the French Treebank converted to dependencies (Candito et al., 2010) for parsing. It uses the LeFFF as its default lexicon (Sagot, 2010). It is fully configurable (machine learning parameters, features, rules, tagset, lexicon, etc.) and can be trained for any other language for which a dependency treebank is available.<br>
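To make "transition-based dependency parsing" concrete, here is a minimal Java sketch of the arc-standard transition system, one common textbook variant. It is an illustration under our own assumptions, not Talismane's code or API, and Talismane may use a different transition system. The parser keeps a stack and a buffer of token indices and applies SHIFT, LEFT-ARC or RIGHT-ARC until every token has a head. In a trained parser the next transition is predicted by a statistical classifier; here the hypothetical `oracle` method reads it off gold-standard heads, which is also a typical way of deriving training examples from a treebank. All class and method names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

// Hypothetical sketch of arc-standard transition-based parsing (not Talismane's API).
class ArcStandardSketch {

    enum Transition { SHIFT, LEFT_ARC, RIGHT_ARC }

    // Chooses the next transition; a real parser would use a trained classifier here.
    interface Decider {
        Transition next(Deque<Integer> stack, Deque<Integer> buffer, int[] heads);
    }

    // Parses tokens 1..n (index 0 is an artificial root); returns heads[i] for each token.
    static int[] parse(int n, Decider decider, List<Transition> trace) {
        Deque<Integer> stack = new ArrayDeque<>();
        Deque<Integer> buffer = new ArrayDeque<>();
        int[] heads = new int[n + 1];
        Arrays.fill(heads, -1);                 // -1 = no head assigned yet
        stack.push(0);                          // start with only the root on the stack
        for (int i = 1; i <= n; i++) buffer.addLast(i);

        // Stop when the buffer is empty and only the root remains on the stack.
        while (!buffer.isEmpty() || stack.size() > 1) {
            Transition t = decider.next(stack, buffer, heads);
            trace.add(t);
            switch (t) {
                case SHIFT -> stack.push(buffer.removeFirst());
                case LEFT_ARC -> {              // second-from-top becomes a dependent of the top
                    int s0 = stack.pop();
                    int s1 = stack.pop();
                    heads[s1] = s0;
                    stack.push(s0);
                }
                case RIGHT_ARC -> {             // top becomes a dependent of second-from-top
                    int s0 = stack.pop();
                    heads[s0] = stack.peek();
                }
            }
        }
        return heads;
    }

    // Static oracle: derives the correct transition from gold-standard heads.
    static Decider oracle(int[] gold) {
        return (stack, buffer, heads) -> {
            if (stack.size() >= 2) {
                Integer[] top = stack.toArray(new Integer[0]);  // top[0] = s0, top[1] = s1
                int s0 = top[0];
                int s1 = top[1];
                if (s1 != 0 && gold[s1] == s0) return Transition.LEFT_ARC;
                if (gold[s0] == s1 && allDependentsAttached(s0, gold, heads)) return Transition.RIGHT_ARC;
            }
            return Transition.SHIFT;
        };
    }

    // True if every gold dependent of token h has already been attached to h.
    static boolean allDependentsAttached(int h, int[] gold, int[] heads) {
        for (int d = 1; d < gold.length; d++) {
            if (gold[d] == h && heads[d] != h) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        String[] words = {"<root>", "Le", "chat", "dort"};  // "The cat sleeps"
        int[] gold = {-1, 2, 3, 0};                         // Le->chat, chat->dort, dort->root
        List<Transition> trace = new ArrayList<>();
        int[] heads = parse(3, oracle(gold), trace);
        System.out.println(trace);
        for (int i = 1; i < words.length; i++) {
            System.out.println(words[i] + " -> " + words[heads[i]]);
        }
    }
}
```

For "Le chat dort", `main` prints the transition sequence [SHIFT, SHIFT, LEFT_ARC, SHIFT, LEFT_ARC, RIGHT_ARC], then one line per word giving its head (Le -> chat, chat -> dort, dort -> &lt;root&gt;).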
<br>Furthermore, Talismane makes it possible to:<br><ul><li>analyse text quickly (measured at 2 million words per hour in the base configuration);</li><li>parse XML and HTML simply, by adding filters indicating which parts should be analysed;</li>
<li>add rules that override the statistical model's decisions during analysis, imposing or prohibiting a local decision based on the context;</li><li>choose between higher-quality analysis (wide beam) and faster analysis (narrow beam);</li>
<li>propagate ambiguity from one level of analysis to the next (e.g. from pos-tagging to parsing), so that a higher-level module can correct the errors of a lower-level module;</li><li>keep track of the exact position of each analysed token in the original file;</li>
<li>output the system's confidence in each decision it takes.</li></ul><br>A user's guide is available on the website.<br><br>Assaf URIELI<br>CLLE-ERSS Laboratory, UMR 5263<br>Université de Toulouse II le Mirail<br></div>
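The wide-beam versus narrow-beam trade-off mentioned in the feature list can be illustrated with a generic beam search. The Java sketch below is hypothetical: the two-step toy model, its labels (A, B, X, Y) and its probabilities are invented for illustration and do not come from Talismane or any trained model. At each step, every hypothesis in the beam is extended with every label the model allows, and only the `width` best hypotheses by cumulative log-probability survive; a width of 1 amounts to greedy decoding.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Hypothetical beam-search sketch; the toy model below is invented, not Talismane code.
class BeamSketch {

    // A partial hypothesis: labels chosen so far and their cumulative log-probability.
    static final class Hypothesis {
        final List<String> labels;
        final double logProb;

        Hypothesis(List<String> labels, double logProb) {
            this.labels = labels;
            this.logProb = logProb;
        }
    }

    // Toy model: P(label | previous label), with "" marking the start of the sequence.
    static final Map<String, Map<String, Double>> MODEL = Map.of(
            "", Map.of("A", 0.6, "B", 0.4),
            "A", Map.of("X", 0.55, "Y", 0.45),
            "B", Map.of("X", 0.9, "Y", 0.1));

    // Returns the best complete hypothesis after `steps` steps with the given beam width.
    static Hypothesis search(int steps, int width) {
        List<Hypothesis> beam = List.of(new Hypothesis(List.of(), 0.0));
        for (int step = 0; step < steps; step++) {
            List<Hypothesis> candidates = new ArrayList<>();
            for (Hypothesis h : beam) {
                String prev = h.labels.isEmpty() ? "" : h.labels.get(h.labels.size() - 1);
                // Extend each surviving hypothesis with every label the model allows.
                for (Map.Entry<String, Double> e : MODEL.get(prev).entrySet()) {
                    List<String> labels = new ArrayList<>(h.labels);
                    labels.add(e.getKey());
                    candidates.add(new Hypothesis(labels, h.logProb + Math.log(e.getValue())));
                }
            }
            // Prune: keep only the `width` best hypotheses (width 1 = greedy decoding).
            candidates.sort(Comparator.comparingDouble((Hypothesis c) -> c.logProb).reversed());
            beam = candidates.subList(0, Math.min(width, candidates.size()));
        }
        return beam.get(0);
    }

    public static void main(String[] args) {
        for (int width : new int[] {1, 2}) {
            Hypothesis best = search(2, width);
            System.out.printf("width=%d -> %s (p=%.2f)%n", width, best.labels, Math.exp(best.logProb));
        }
    }
}
```

With these toy numbers, width 1 commits early to A and returns [A, X] (p = 0.33), while width 2 keeps B alive and finds the better sequence [B, X] (p = 0.36). A wider beam scores more candidates at every step, which is why it is slower.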