[Corpora-List] Re : Invalid UTF8 character encountered! with Treetagger french parameter file

Alberto Simões albie at alfarrabio.di.uminho.pt
Mon Dec 27 22:17:04 UTC 2010


On 27/12/2010 22:15, Samir Bilal wrote:
> Hi,
>
> I opened the file with Notepad++; it detects ANSI encoding.

Then try the other French parameter file available on the TreeTagger website 
(the one that does not include 'utf8' in its name).

Or force Notepad++ to save the file as UTF-8 (use "Save As"; as a precaution, 
save it under a different name).
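
If you would rather do the check and the conversion from a script, here is a minimal Python sketch. It assumes that the "ANSI" reported by Notepad++ means Windows-1252 (usual on a Western European Windows XP, but that is an assumption), and the function names are only illustrative; they are not part of TreeTagger.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def ansi_to_utf8(data: bytes, source_encoding: str = "cp1252") -> bytes:
    """Re-encode bytes from the assumed ANSI code page to UTF-8."""
    return data.decode(source_encoding).encode("utf-8")


def convert_file(src_path: str, dst_path: str,
                 source_encoding: str = "cp1252") -> None:
    """Write a UTF-8 copy of src_path to dst_path (the original is kept)."""
    with open(src_path, "rb") as src:
        raw = src.read()
    # Only re-encode when the input is not already valid UTF-8.
    if not is_valid_utf8(raw):
        raw = ansi_to_utf8(raw, source_encoding)
    with open(dst_path, "wb") as dst:
        dst.write(raw)


# "\u00e9" is the accented e; in Windows-1252 it is the single byte 0xE9,
# which is not a valid UTF-8 sequence when followed by a plain letter.
ansi_bytes = "l' \u00e9tiqueteur se bloque".encode("cp1252")
print(is_valid_utf8(ansi_bytes))
print(is_valid_utf8(ansi_to_utf8(ansi_bytes)))
```

The conversion writes to a separate output path, so the original file stays untouched, same as the "save with another name" precaution.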

Cheers

>
> Regards
>
> ------------------------------------------------------------------------
> *From:* Alberto Simões <albie at alfarrabio.di.uminho.pt>
> *To:* corpora at uib.no
> *Sent:* Mon 27 December 2010, 22:53:12
> *Subject:* Re: [Corpora-List] Invalid UTF8 character encountered! with
> Treetagger french parameter file
>
> A first suggestion would be to check whether your input file is in UTF-8
> encoding.
>
> Try opening the text file in an editor like Notepad++ and check what
> encoding it detects.
>
> cheers
>
> On 27/12/2010 21:43, Samir Bilal wrote:
>  > Hi,
>  >
>  > I am testing the current POS taggers for the French language. With
>  > TreeTagger I get an error in some cases.
>  > For a sentence with an accent (for example: "l' étiqueteur se bloque."), I
>  > encounter this error:
>  >
>  > Invalid UTF8 character encountered!
>  > because of the accented é.
>  > If the sentence has no accented characters, the tagger works well.
>  >
>  > I use the French parameter file at
>  > ftp://ftp.ims.uni-stuttgart.de/pub/corpora/french-par-linux-3.2-utf8.bin.gz
>  > My OS is Windows XP.
>  >
>  > Can anybody help me?
>  >
>  > Regards
>  > Samir
>  >
>  >
>  >
>  > ------------------------------------------------------------------------
>  > *From:* DJamé Seddah <djame.seddah at free.fr>
>  > *To:* Samir Bilal <samirbilal2 at yahoo.fr>
>  > *Sent:* Sun 26 December 2010, 00:47:27
>  > *Subject:* Re: Re : [Corpora-List] Looking for free french POS tagger.
>  >
>  > Hi, in that case I'd recommend using
>  > Morfette, as it provides Windows binaries and pretrained models.
>  >
>  > input format (unix line separator)
>  > one word per line
>  > one blank line to separate sentences
>  > and all in utf8
>  >
>  > use this command
>  > c:\wherever\morfette predict MODELNAME < input > output.tagged
>  >
>  >
>  > Djamé
>  >
>  >
>  >
>  > On 25 Dec 2010, at 23:43, Samir Bilal wrote:
>  >
>  > > Hi,
>  > >
>  > > Thank you very much. My operating system is Windows XP. I have not
>  > > succeeded in running MeLT on it yet. Please can you help me?
>  > > It would be wonderful if I could also use it from a Python program.
>  > >
>  > >
>  > > Many thanks
>  > > Samir
>  > >
>  > >
>  > >
>  > >
>  > > ________________________________
>  > > From: DJamé Seddah <djame.seddah at free.fr>
>  > > To: corpora at uib.no
>  > > Sent: Sat 25 December 2010, 22:54:42
>  > > Subject: Re: [Corpora-List] Looking for free french POS tagger.
>  > >
>  > > Hi,
>  > > There are also two state-of-the-art data-driven POS taggers available:
>  > >
>  > > MeLT
>  > > https://gforge.inria.fr/frs/download.php/27240/melt-0.6.tar.gz
>  > > and
>  > > Morfette (which also provides a data driven lemmatizer)
>  > > http://sites.google.com/site/morfetteweb/
>  > >
>  > > Both provide models trained on the French Treebank (tagset CC,
>  > > around 97.6-98% accuracy, the one to use for statistical parsing) and
>  > > for a richer tagset (tagset max, around 92-94%).
>  > >
>  > >
>  > > Best,
>  > >
>  > > Djamé
>  > >
>  > >
>  > >
>  > > On 25 Dec 2010, at 19:35, Samir Bilal wrote:
>  > >
>  > >> Hi everybody,
>  > >>
>  > >> I am looking for a free French POS tagger.
>  > >>
>  > >> Thank you
>  > >> Samir
>  > >>
>  > >>
>  > >> _______________________________________________
>  > >> Corpora mailing list
>  > >> Corpora at uib.no
>  > >> http://mailman.uib.no/listinfo/corpora
>  > >
>  > >
>  > >
>  > >
>  > >
>  >
>  >
>  >
>  >
>
> --
> Alberto Simões
>
>

-- 
Alberto Simões

_______________________________________________
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora


