[Corpora-List] Invalid UTF8 character encountered! with Treetagger french parameter file
Alberto Simões
albie at alfarrabio.di.uminho.pt
Mon Dec 27 21:53:12 UTC 2010
A first suggestion: double-check that your input file really is encoded in
UTF-8.
Try opening the text file in an editor such as Notepad++ and check which
encoding it detects.
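
The same check can also be scripted. Below is a minimal sketch in Python; it
assumes the file was saved in Windows-1252 (a common default for Western
European text on Windows, but only a guess here, nothing in this thread
confirms it). The function names are made up for illustration.

```python
def find_invalid_utf8(data: bytes):
    """Return the byte offset of the first invalid UTF-8 sequence, or None."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        return err.start
    return None


def to_utf8(data: bytes, fallback: str = "cp1252") -> bytes:
    """Return data unchanged if it is valid UTF-8, else re-encode it.

    The fallback encoding is an assumption about how the file was saved.
    """
    if find_invalid_utf8(data) is None:
        return data
    return data.decode(fallback).encode("utf-8")


# Simulate the example sentence saved in Windows-1252 instead of UTF-8.
sample = "l' étiqueteur se bloque".encode("cp1252")
offset = find_invalid_utf8(sample)  # points at the byte for the accented letter
fixed = to_utf8(sample)             # valid UTF-8 bytes for the same sentence
```

For a real file, read it with open(path, "rb").read() and write the converted
bytes back with open(path, "wb"). If the file turns out to be in some other
encoding, the fallback has to be changed accordingly.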
cheers
On 27/12/2010 21:43, Samir Bilal wrote:
> Hi,
>
> I am testing the current POS taggers for French. With TreeTagger I
> get an error in some cases.
> For a sentence with an accented character (for example: "l' étiqueteur
> se bloque."), I encounter this error:
>
> Invalid UTF8 character encountered!
> It is caused by the accented character é.
> If the sentence has no accented characters, the tagger works fine.
>
> I use the French parameter file at
> ftp://ftp.ims.uni-stuttgart.de/pub/corpora/french-par-linux-3.2-utf8.bin.gz
> My OS is Windows XP.
>
> Can anybody help me?
>
> Regards
> Samir
>
>
>
> ------------------------------------------------------------------------
> *From:* DJamé Seddah <djame.seddah at free.fr>
> *To:* Samir Bilal <samirbilal2 at yahoo.fr>
> *Sent:* Sun 26 December 2010, 00:47:27
> *Subject:* Re: Re : [Corpora-List] Looking for free french POS tagger.
>
> Hi, in that case I'd recommend using
> Morfette, as it provides Windows binaries and pretrained models.
>
> Input format (Unix line separators):
> one word per line,
> one blank line to separate sentences,
> and everything in UTF-8.
>
> Use this command:
> c:|whereverver/morfette predict MODELNAME < input > output.tagged
>
>
> Djamé
>
>
>
> On 25 Dec 2010, at 23:43, Samir Bilal wrote:
>
> > Hi,
> >
> > Thank you very much. My operating system is Windows XP. I have not
> > managed to run MeLT on it yet. Could you please help me?
> > It would be wonderful if I could also use it from a Python program.
> >
> >
> > Many thanks
> > Samir
> >
> >
> >
> >
> > ________________________________
> > From: DJamé Seddah <djame.seddah at free.fr <mailto:djame.seddah at free.fr>>
> > To: corpora at uib.no <mailto:corpora at uib.no>
> > Sent: Sat 25 December 2010, 22:54:42
> > Subject: Re: [Corpora-List] Looking for free french POS tagger.
> >
> > Hi,
> > There are also two state-of-the-art data-driven POS taggers available:
> >
> > MeLT
> > https://gforge.inria.fr/frs/download.php/27240/melt-0.6.tar.gz
> > and
> > Morfette (which also provides a data driven lemmatizer)
> > http://sites.google.com/site/morfetteweb/
> >
> > Both provide models trained on the French Treebank: one for the CC
> > tagset (around 97.6-98% accuracy, the one to use for statistical
> > parsing) and one for a richer tagset (tagset max, around 92-94%).
> >
> >
> > Best,
> >
> > Djamé
> >
> >
> >
> > On 25 Dec 2010, at 19:35, Samir Bilal wrote:
> >
> >> Hi everybody,
> >>
> >> I am looking for a free French POS tagger.
> >>
> >> Thank you
> >> Samir
> >>
> >>
> >> _______________________________________________
> >> Corpora mailing list
> >> Corpora at uib.no <mailto:Corpora at uib.no>
> >> http://mailman.uib.no/listinfo/corpora
--
Alberto Simões