[Corpora-List] Character encoding headaches
Francis Tyers
ftyers at prompsit.com
Mon Aug 3 12:17:58 UTC 2009
Aha, no problem! Sorry to have jumped the gun a bit there :)
If you need any help with the leaning, please let me know,
Fran
On Mon, 03 Aug 2009 at 14:22 +0200, Josep M. Fontana wrote:
> Sorry. My information might not be totally accurate. I haven't talked to
> Lluís directly about this but somebody who works with him (not on the
> development of Freeling, though) told me that they (Lluís and his
> associates, if any) would work on making it compatible with UTF-8 in future
> versions. I realize in my message I said "they are already working on
> making it compatible with UTF-8", so this might be inaccurate. My
> apologies for being misleading. It was not my intention. I was simply
> reacting to the message that said to "lean hard on the Freeling folks"
> and what I meant to say was really that this is in their road map. I
> should have been more precise.
>
> Josep M.
> > On Mon, 03 Aug 2009 at 12:51 +0200, Josep M. Fontana wrote:
> >
> >> Thanks a lot to everybody that responded. Problem solved!
> >>
> >> In the end the simplest, quickest solution for me was to use the
> >> //TRANSLIT keyword as Lars Nygaard suggested. That might not work with
> >> other kinds of texts but for the Spanish and Catalan texts I'm working
> >> with, I guess finding alternative characters that approximate the
> >> problematic characters in the original document was not too difficult
> >> for iconv.
> >>
> >> In response to Ciarán: if Word saves as ISO-8859-1 by default, it is
> >> strange that the 'file' command does not recognize this encoding. For
> >> most of the documents I'm using that were saved from within Word, 'file'
> >> reports "Non-ISO extended-ASCII text, with CRLF line terminators".
> >>
> >> With respect to Freeling, I'm told that they are already working on
> >> making it compatible with UTF-8.
> >>
> >
> > Really? The last I heard from Lluís was:
> >
> > "No, it doesn't work with Unicode (basically because the STL strings
> > don't support Unicode yet). To process UTF texts, what it does is
> > convert them to Latin, analyse them, and convert the result back to UTF."
> >
> > I would love to hear that FreeLing will be supporting UTF-8!!
> >
> > Fran
> >
> >
> >
> >
>
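
The workaround Lluís describes (convert UTF-8 input to Latin-1, analyse it,
convert the result back) and the approximation step that //TRANSLIT performed
for Josep can be sketched in Python roughly as follows. This is only an
illustration under stated assumptions: the APPROXIMATIONS table is a
hand-picked guess and not iconv's actual transliteration table, and
analyse_latin1 is a hypothetical stand-in for a Latin-1-only analyser, not
FreeLing's API.

```python
import unicodedata

# ASSUMPTION: hand-picked approximations for characters that Word commonly
# inserts and that have no ISO-8859-1 code point. This is NOT the table that
# iconv uses for //TRANSLIT.
APPROXIMATIONS = {
    "\u2018": "'",    # left single quotation mark
    "\u2019": "'",    # right single quotation mark
    "\u201c": '"',    # left double quotation mark
    "\u201d": '"',    # right double quotation mark
    "\u2013": "-",    # en dash
    "\u2014": "-",    # em dash
    "\u2026": "...",  # horizontal ellipsis
}


def to_latin1_approx(text: str) -> bytes:
    """Encode text as ISO-8859-1, approximating characters it lacks."""
    out = []
    for ch in text:
        if ord(ch) < 256:
            # Every code point below 256 exists in ISO-8859-1.
            out.append(ch)
        elif ch in APPROXIMATIONS:
            out.append(APPROXIMATIONS[ch])
        else:
            # Fall back to a compatibility decomposition and keep only the
            # parts that fit in ISO-8859-1; use "?" if nothing survives.
            decomposed = unicodedata.normalize("NFKD", ch)
            kept = "".join(c for c in decomposed if ord(c) < 256)
            out.append(kept or "?")
    return "".join(out).encode("latin-1")


def analyse_latin1(data: bytes) -> list:
    """Hypothetical stand-in for an analyser that only accepts Latin-1.

    Here it merely splits on ASCII whitespace.
    """
    return data.split()


def analyse_utf8(utf8_bytes: bytes) -> bytes:
    """Convert UTF-8 to Latin-1, analyse, and convert the result to UTF-8."""
    text = utf8_bytes.decode("utf-8")
    latin1 = to_latin1_approx(text)
    tokens = analyse_latin1(latin1)
    # One token per line, re-encoded as UTF-8 for the caller.
    return b"\n".join(tokens).decode("latin-1").encode("utf-8")
```

Note that the approximation is lossy: a typographic quote or dash in the
input comes back as its plain substitute, so the output is valid UTF-8 but
not necessarily identical to the original text.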