[Corpora-List] Re : Invalid UTF8 character encountered! with Treetagger french parameter file
Alberto Simões
albie at alfarrabio.di.uminho.pt
Mon Dec 27 22:17:04 UTC 2010
On 27/12/2010 22:15, Samir Bilal wrote:
> Hi,
>
> I opened the file with Notepad++; it detects ANSI encoding.
Then try the other parameter file available on the TreeTagger website
(the one without 'utf8' in its name).
Or force Notepad++ to save your input file in UTF-8 (use Save As and, as
a precaution, save it under a different name).
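A minimal Python sketch of the same conversion done outside Notepad++. It assumes that the "ANSI" encoding Notepad++ reports is Windows-1252 (cp1252), the usual code page on a French Windows XP system; the function names and file names are illustrative and not part of TreeTagger. In cp1252 the é is the single byte 0xE9, which is not valid UTF-8 when it is followed by an ordinary ASCII character, which would be consistent with the "Invalid UTF8 character encountered!" message.

```python
from pathlib import Path


def is_valid_utf8(raw: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def to_utf8_bytes(raw: bytes, source_encoding: str = "cp1252") -> bytes:
    """Return the text re-encoded as UTF-8.

    Bytes that already decode as UTF-8 are returned unchanged; anything
    else is assumed to be in `source_encoding` (cp1252 is an assumption
    for what Notepad++ calls "ANSI" on a French Windows system).
    """
    if is_valid_utf8(raw):
        return raw
    return raw.decode(source_encoding).encode("utf-8")


def convert_file(src: str, dst: str, source_encoding: str = "cp1252") -> None:
    """Write a UTF-8 copy of src to dst, leaving src untouched."""
    Path(dst).write_bytes(to_utf8_bytes(Path(src).read_bytes(), source_encoding))


if __name__ == "__main__":
    sample = "l' étiqueteur se bloque"
    ansi = sample.encode("cp1252")  # é becomes the single byte 0xE9
    print(is_valid_utf8(ansi))  # False: 0xE9 is followed by 't', not a continuation byte
    print(to_utf8_bytes(ansi) == sample.encode("utf-8"))  # True
```

For example, `convert_file("input.txt", "input-utf8.txt")` writes a UTF-8 copy under a new name, in line with the precaution of not overwriting the original file.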
Cheers
>
> Regards
>
> ------------------------------------------------------------------------
> *From:* Alberto Simões <albie at alfarrabio.di.uminho.pt>
> *To:* corpora at uib.no
> *Sent:* Mon 27 December 2010, 22:53:12
> *Subject:* Re: [Corpora-List] Invalid UTF8 character encountered! with
> Treetagger french parameter file
>
> A first suggestion: double-check that your input file is in UTF-8
> encoding.
>
> Try opening the text file in an editor like Notepad++ and check what
> encoding it detects.
>
> cheers
>
> On 27/12/2010 21:43, Samir Bilal wrote:
> > Hi,
> >
> > I am testing the current POS taggers for the French language. With
> > TreeTagger I get an error in some cases.
> > For a sentence with an accent (for example: "l' étiqueteur se bloque"), I
> > encounter this error:
> >
> > Invalid UTF8 character encountered!
> > because of the accented é.
> > If the sentence has no accented character, the tagger works well.
> >
> > I use the French parameter file at
> > ftp://ftp.ims.uni-stuttgart.de/pub/corpora/french-par-linux-3.2-utf8.bin.gz
> > My OS is Windows XP.
> >
> > Can anybody help me?
> >
> > Regards
> > Samir
> >
> >
> >
> > ------------------------------------------------------------------------
> > *From:* DJamé Seddah <djame.seddah at free.fr>
> > *To:* Samir Bilal <samirbilal2 at yahoo.fr>
> > *Sent:* Sun 26 December 2010, 00:47:27
> > *Subject:* Re: Re: [Corpora-List] Looking for free french POS tagger.
> >
> > Hi, in that case I'd recommend using
> > Morfette, as it provides Windows binaries and pretrained models.
> >
> > Input format (Unix line separators):
> > - one word per line
> > - one blank line to separate sentences
> > - everything in UTF-8
> >
> > Use this command:
> > c:\wherever\morfette predict MODELNAME < input > output.tagged
> >
> >
> > Djamé
> >
> >
> >
> > On 25 Dec 2010, at 23:43, Samir Bilal wrote:
> >
> > > Hi,
> > >
> > > Thank you very much. My operating system is Windows XP. I have not
> > > yet managed to run MeLT on it. Please, can you help me?
> > > It would be wonderful if I could also use it from a Python program.
> > >
> > >
> > > Many thanks
> > > Samir
> > >
> > >
> > >
> > >
> > > ________________________________
> > > From: DJamé Seddah <djame.seddah at free.fr>
> > > To: corpora at uib.no
> > > Sent: Sat 25 December 2010, 22:54:42
> > > Subject: Re: [Corpora-List] Looking for free french POS tagger.
> > >
> > > Hi,
> > > There are also two state-of-the-art data-driven POS taggers available:
> > >
> > > MeLT
> > > https://gforge.inria.fr/frs/download.php/27240/melt-0.6.tar.gz
> > > and
> > > Morfette (which also provides a data driven lemmatizer)
> > > http://sites.google.com/site/morfetteweb/
> > >
> > > Both provide models trained on the French Treebank, for the CC tagset
> > > (around 97.6-98% accuracy; the one to use for statistical parsing) and
> > > for a richer tagset (tagset max, around 92-94%).
> > >
> > >
> > > Best,
> > >
> > > Djamé
> > >
> > >
> > >
> > > On 25 Dec 2010, at 19:35, Samir Bilal wrote:
> > >
> > >> Hi everybody,
> > >>
> > >> I am looking for a free French POS tagger.
> > >>
> > >> Thank you
> > >> Samir
> > >>
> > >>
> > >> _______________________________________________
> > >> Corpora mailing list
> > >> Corpora at uib.no
> > >> http://mailman.uib.no/listinfo/corpora
> > >
> > >
> > >
> > >
> > >
> >
> >
> >
> >
>
> --
> Alberto Simões
>
>
--
Alberto Simões