[Corpora-List] Re : Invalid UTF8 character encountered! with Treetagger french parameter file

Samir Bilal samirbilal2 at yahoo.fr
Mon Dec 27 22:15:26 UTC 2010


Hi,

I opened the file with Notepad++, and it detects ANSI encoding.
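
If the file really is ANSI, the tagger's UTF-8 parameter file will reject any accented byte, so the input has to be re-encoded first. Below is a minimal Python sketch of that conversion. It assumes that "ANSI" here means Windows-1252 (the usual default on a Western European Windows XP install, but not confirmed in this thread); the function name and the file names in the usage note are hypothetical.

```python
def convert_to_utf8(src_path, dst_path, source_encoding="cp1252"):
    """Re-encode a text file to UTF-8.

    source_encoding is an assumption: cp1252 is what Notepad++ typically
    labels "ANSI" on Western European Windows systems.
    """
    # Work on raw bytes so that line endings are left exactly as they were.
    with open(src_path, "rb") as src:
        raw = src.read()
    # Raises UnicodeDecodeError if a byte is undefined in the source encoding.
    text = raw.decode(source_encoding)
    with open(dst_path, "wb") as dst:
        dst.write(text.encode("utf-8"))
```

Usage would look like `convert_to_utf8("input.txt", "input-utf8.txt")`, after which the UTF-8 copy is the one passed to the tagger.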

Regards




________________________________
From: Alberto Simões <albie at alfarrabio.di.uminho.pt>
To: corpora at uib.no
Sent: Mon 27 December 2010, 22:53:12
Subject: Re: [Corpora-List] Invalid UTF8 character encountered! with Treetagger
french parameter file

A first suggestion would be to check again whether your input file is
really in UTF-8 encoding.

Try opening the text file in an editor like Notepad++ and check what 
encoding it detects.
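
The same check can also be scripted. The following is a small Python sketch, not part of TreeTagger: it tries to decode the raw bytes as UTF-8 and reports the offset of the first byte that does not fit. The function name is hypothetical.

```python
def first_invalid_utf8_offset(data):
    """Return the byte offset of the first invalid UTF-8 sequence, or None."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        # exc.start is the index of the first byte that could not be decoded.
        return exc.start
    return None


# The example sentence from this thread, encoded two ways.
sentence = "l' étiqueteur se bloque"
print(first_invalid_utf8_offset(sentence.encode("utf-8")))
print(first_invalid_utf8_offset(sentence.encode("cp1252")))
```

For a file on disk, read it in binary mode (`open(path, "rb").read()`) and pass the bytes to the function. A result other than `None` means the file is not valid UTF-8.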

cheers

On 27/12/2010 21:43, Samir Bilal wrote:
> Hi,
>
> I am testing the current POS taggers for the French language. With
> TreeTagger I get an error in some cases.
> For a sentence with an accent (for example: "l' étiqueteur se bloque."), I
> encounter this error:
>
> Invalid UTF8 character encountered!
> because of the accent with é.
> But if the sentence has no accent character, the tagger works well.
>
> I use the french parameter file at
> ftp://ftp.ims.uni-stuttgart.de/pub/corpora/french-par-linux-3.2-utf8.bin.gz
> .
> My OS is Windows XP.
>
> Can anybody help me?
>
> Regards
> Samir
>
>
>
> ------------------------------------------------------------------------
> *From:* DJamé Seddah <djame.seddah at free.fr>
> *To:* Samir Bilal <samirbilal2 at yahoo.fr>
> *Sent:* Sun 26 December 2010, 00:47:27
> *Subject:* Re: Re : [Corpora-List] Looking for free french POS tagger.
>
> Hi, in that case I'd recommend using
> Morfette, as it provides Windows binaries and pretrained models.
>
> input format (unix line separator)
> one word per line
> one blank line to separate sentences
> and all in utf8
>
> use this command:
> c:\wherever\morfette predict MODELNAME < input > output.tagged
>
>
> Djamé
>
>
>
> On 25 Dec. 2010 at 23:43, Samir Bilal wrote:
>
>  > Hi,
>  >
>  > Thank you very much. My operating system is Windows XP. I have not yet
>  > managed to run MeLT on it. Could you please help me?
>  > It would be wonderful if I could also use it from a Python program.
>  >
>  >
>  > Many thanks
>  > Samir
>  >
>  >
>  >
>  >
>  > ________________________________
>  > From: DJamé Seddah <djame.seddah at free.fr>
>  > To: corpora at uib.no
>  > Sent: Sat 25 December 2010, 22:54:42
>  > Subject: Re: [Corpora-List] Looking for free french POS tagger.
>  >
>  > Hi,
>  > There are also two state-of-the-art data-driven POS taggers available:
>  >
>  > MeLT
>  > https://gforge.inria.fr/frs/download.php/27240/melt-0.6.tar.gz
>  > and
>  > Morfette (which also provides a data driven lemmatizer)
>  > http://sites.google.com/site/morfetteweb/
>  >
>  > Both provide models trained on the French Treebank: one for the CC tagset
>  > (around 97.6-98% accuracy, the one to use for statistical parsing) and one
>  > for a richer tagset (tagset max, around 92-94%).
>  >
>  >
>  > Best,
>  >
>  > Djamé
>  >
>  >
>  >
>  > On 25 Dec. 2010 at 19:35, Samir Bilal wrote:
>  >
>  >> Hi everybody,
>  >>
>  >> I am looking for a free French POS tagger.
>  >>
>  >> Thank you
>  >> Samir
>  >>
>  >>
>  >> _______________________________________________
>  >> Corpora mailing list
>  >> Corpora at uib.no <mailto:Corpora at uib.no>
>  >> http://mailman.uib.no/listinfo/corpora

-- 
Alberto Simões
