<html><head><style type="text/css"><!-- DIV {margin:0px;} --></style></head><body><div style="font-family:times new roman,new york,times,serif;font-size:12pt"><div>Hi,<br><br>I am testing the current POS taggers for French. With TreeTagger I get an error in some cases.<br>For a sentence containing an accented character (for example: "l' <span style="color: rgb(255, 0, 0); font-weight: bold;">é</span>tiqueteur se bloque"), I get this error:<br><br><span style="color: rgb(0, 0, 0); font-weight: bold;">Invalid UTF8 character encountered! </span><br>which is triggered by the accented character <span style="color: rgb(255, 0, 0); font-weight: bold;">é</span>.<br>If the sentence contains no accented characters, the tagger works well.<br><br>I use the French parameter file at ftp://ftp.ims.uni-stuttgart.de/pub/corpora/french-par-linux-3.2-utf8.bin.gz .<br>My OS is Windows XP.<br><br>Can anybody help me?<br><br>Regards<br>Samir<br><br><br></div><div style="font-family: times
new roman,new york,times,serif; font-size: 12pt;"><br><div style="font-family: arial,helvetica,sans-serif; font-size: 13px;"><font face="Tahoma" size="2"><hr size="1"><b><span style="font-weight: bold;">From:</span></b> DJamé Seddah &lt;djame.seddah@free.fr&gt;<br><b><span style="font-weight: bold;">To:</span></b> Samir Bilal &lt;samirbilal2@yahoo.fr&gt;<br><b><span style="font-weight: bold;">Sent:</span></b> Sun 26 December 2010, 00:47:27<br><b><span style="font-weight: bold;">Subject:</span></b> Re: Re : [Corpora-List] Looking for free french POS tagger.<br></font><br>Hi, in that case I'd recommend using <br>Morfette, as it provides Windows binaries and pretrained models.<br><br>Input format (Unix line separators):<br>one word per line<br>one blank line to separate sentences<br>all in UTF-8<br><br>Use this command:<br>c:|whereverver/morfette predict MODELNAME &lt; input &gt;
output.tagged<br><br><br>Djamé<br><br><br><br>On 25 Dec 2010, at 23:43, Samir Bilal wrote:<br><br>> Hi,<br>> <br>> Thank you very much. My operating system is Windows XP. I have not managed to run <br>> MeLT on it yet. Could you please help me?<br>> It would be wonderful if I could also use it from a Python program.<br>> <br>> <br>> Many thanks<br>> Samir<br>> <br>> <br>> <br>> <br>> ________________________________<br>> From: DJamé Seddah <<a ymailto="mailto:djame.seddah@free.fr" href="mailto:djame.seddah@free.fr">djame.seddah@free.fr</a>><br>> To: <a ymailto="mailto:corpora@uib.no" href="mailto:corpora@uib.no">corpora@uib.no</a><br>> Sent: Sat 25 December 2010, 22:54:42<br>> Subject: Re: [Corpora-List] Looking for free french POS tagger.<br>> <br>> Hi,<br>> There are also two state-of-the-art data-driven POS taggers available:<br>> <br>> MeLT<br>> <a
href="https://gforge.inria.fr/frs/download.php/27240/melt-0.6.tar.gz" target="_blank">https://gforge.inria.fr/frs/download.php/27240/melt-0.6.tar.gz</a><br>> and<br>> Morfette (which also provides a data-driven lemmatizer)<br>> <a href="http://sites.google.com/site/morfetteweb/" target="_blank">http://sites.google.com/site/morfetteweb/</a><br>> <br>> Both provide models trained on the French Treebank (tagset CC, around 97.6-98% <br>> accuracy, the one to use for statistical parsing) and models for a richer tagset <br>> (tagset max, around 92-94%).<br>> <br>> <br>> Best,<br>> <br>> Djamé<br>> <br>> <br>> <br>> On 25 Dec 2010, at 19:35, Samir Bilal wrote:<br>> <br>>> Hi everybody,<br>>> <br>>> I am looking for a free French POS tagger. <br>>> <br>>> Thank you<br>>> Samir<br>>> <br>>> <br>>> 
_______________________________________________<br>>> Corpora mailing list<br>>> <a ymailto="mailto:Corpora@uib.no" href="mailto:Corpora@uib.no">Corpora@uib.no</a><br>>> <a href="http://mailman.uib.no/listinfo/corpora" target="_blank">http://mailman.uib.no/listinfo/corpora</a><br>> <br>> <br>> <br><br></div></div>
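<div><br>The "Invalid UTF8 character" error suggests that the input file may not actually be stored as UTF-8; on Windows XP, text files are often saved in a legacy code page instead. Below is a minimal Python sketch, not part of any of the taggers mentioned, that re-encodes text as UTF-8 and writes it in the input format described above (one word per line, one blank line between sentences, Unix line separators). The function names and the cp1252 fallback encoding are assumptions made for illustration, and the sentences are assumed to be tokenized already.<br><br></div>
<pre>
```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if the byte string decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def to_utf8(data: bytes, fallback: str = "cp1252") -> bytes:
    """Return the text as UTF-8 bytes.

    If the input is already valid UTF-8 it is returned unchanged.
    Otherwise it is decoded with the fallback encoding (cp1252 is only
    an assumption about how the file was saved on Windows).
    """
    if is_valid_utf8(data):
        return data
    return data.decode(fallback).encode("utf-8")


def write_one_token_per_line(sentences, path):
    """Write tokenized sentences: one token per line, blank line after each sentence.

    newline="\n" keeps Unix line separators even when run on Windows.
    """
    with open(path, "w", encoding="utf-8", newline="\n") as out:
        for tokens in sentences:
            for token in tokens:
                out.write(token + "\n")
            out.write("\n")
```
</pre>
<div>For example, <code>write_one_token_per_line([["l'", "étiqueteur", "se", "bloque"]], "input")</code> would produce a file named <code>input</code> in that layout, which could then be passed to the morfette command quoted above. Whether the same re-encoding resolves the TreeTagger error depends on how the original file was saved.<br><br></div>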
</div><br>
</body></html>