[Corpora-List] Character encoding headaches

Francis Tyers ftyers at prompsit.com
Mon Aug 3 12:17:58 UTC 2009


Aha, no problem! Sorry to have jumped the gun a bit there :)

If you need any help with the leaning, please let me know,

Fran

On Mon, 03 Aug 2009 at 14:22 +0200, Josep M. Fontana wrote:
> Sorry. My information might not be totally accurate. I haven't talked to
> Lluís directly about this, but somebody who works with him (not on the
> development of FreeLing, though) told me that they (Lluís and his
> associates, if any) would work on making it compatible with UTF-8 in
> future versions. I realize that in my message I said "they are already
> working on making it compatible with UTF-8", so this might be
> inaccurate. My apologies for being misleading; it was not my intention.
> I was simply reacting to the message that said to "lean hard on the
> Freeling folks", and what I really meant to say was that this is on
> their road map. I should have been more precise.
> 
> Josep M.
> > On Mon, 03 Aug 2009 at 12:51 +0200, Josep M. Fontana wrote:
> >   
> >> Thanks a lot to everybody that responded. Problem solved!
> >>
> >> In the end, the simplest and quickest solution for me was to use the
> >> //TRANSLIT suffix, as Lars Nygaard suggested. That might not work with
> >> other kinds of texts, but for the Spanish and Catalan texts I'm working
> >> with, I guess finding alternative characters that approximate the
> >> problematic characters in the original documents was not too difficult
> >> for iconv.
> >>
> >> In response to Ciarán: if Word saves as ISO-8859-1 by default, it is
> >> strange that 'file' does not recognize this encoding. Running the
> >> 'file' command on most of the documents I saved from within Word gives
> >> "Non-ISO extended-ASCII text, with CRLF line terminators".
> >>
> >> With respect to FreeLing, I'm told that they are already working on
> >> making it compatible with UTF-8.
> >>     
> >
> > Really? The last I heard from Lluís was:
> >
> > "No, it doesn't work with Unicode (basically because STL strings
> > don't support Unicode yet). To process UTF texts, what it does is
> > convert them to Latin-1, analyze them, and convert the result back
> > to UTF."
> >
> > I would love to hear that FreeLing will be supporting UTF-8!!
> >
> > Fran
> >
> 
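For reference, the iconv fix Josep describes is typically run with GNU iconv as something like `iconv -f CP1252 -t ISO-8859-1//TRANSLIT input.txt > output.txt`. The thread does not say which source and target encodings were used, so CP1252 and ISO-8859-1 are assumptions. Python's codecs have no //TRANSLIT equivalent, so this sketch swaps in a custom error handler registered with `codecs.register_error`; the table `APPROXIMATIONS` and the handler name `translit_sketch` are hypothetical and far smaller than iconv's own tables.

```python
import codecs

# Hypothetical approximation table: a tiny stand-in for iconv's //TRANSLIT
# tables, covering typical Word "smart" punctuation absent from ISO-8859-1.
APPROXIMATIONS = {
    "\u2018": "'", "\u2019": "'",    # curly single quotes
    "\u201c": '"', "\u201d": '"',    # curly double quotes
    "\u2013": "-", "\u2014": "-",    # en dash, em dash
    "\u2026": "...",                 # horizontal ellipsis
    "\u20ac": "EUR",                 # euro sign
}


def _translit_handler(err):
    """Replace each character the target codec cannot encode with an
    approximation from APPROXIMATIONS, or '?' if none is listed."""
    if not isinstance(err, UnicodeEncodeError):
        raise err
    bad = err.object[err.start:err.end]
    replacement = "".join(APPROXIMATIONS.get(ch, "?") for ch in bad)
    return replacement, err.end  # resume encoding after the bad span


codecs.register_error("translit_sketch", _translit_handler)


def cp1252_to_latin1_translit(data: bytes) -> bytes:
    """Decode CP1252 bytes (assumed Word output) and re-encode them as
    ISO-8859-1, approximating characters that have no Latin-1 code point."""
    text = data.decode("cp1252")  # raises on the few undefined CP1252 bytes
    return text.encode("latin-1", errors="translit_sketch")
```

For example, `cp1252_to_latin1_translit(b"\x93Hola\x94 \x96 adi\xf3s")` returns `b'"Hola" - adi\xf3s'`: the curly quotes and en dash are approximated, while ó (0xF3) has the same byte in both encodings and passes through unchanged.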

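One plausible reading of the 'file' output (not confirmed in the thread): Word's plain-text export on Western European Windows systems commonly uses Windows-1252 (CP1252) rather than strict ISO-8859-1. CP1252 puts curly quotes, dashes and the ellipsis at bytes 0x80 to 0x9F, which ISO-8859-1 reserves for control codes, and 'file' labels text containing such bytes "Non-ISO extended-ASCII". The helper `guess_byte_encoding` below is hypothetical and only approximates that classification; it is not libmagic's actual algorithm.

```python
def guess_byte_encoding(data: bytes) -> str:
    """Hypothetical heuristic loosely mirroring how `file` labels text:
    ASCII, UTF-8, ISO-8859, or non-ISO extended ASCII (e.g. CP1252)."""
    if all(b < 0x80 for b in data):
        return "ascii"
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        pass
    # Bytes 0x80-0x9F are C1 control codes in ISO-8859-1 but printable
    # punctuation in CP1252 (curly quotes, dashes, ellipsis).
    if any(0x80 <= b <= 0x9F for b in data):
        return "non-iso-extended-ascii"
    return "iso-8859"
```

With this sketch, `guess_byte_encoding(b"adi\xf3s")` gives `"iso-8859"`, while `guess_byte_encoding(b"\x93Hola\x94")` (curly quotes in CP1252) gives `"non-iso-extended-ascii"`.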

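The workaround Lluís describes (convert UTF-8 input to Latin-1, analyze it, convert the output back) can be sketched as a small wrapper. `analyze_latin1` and `toy_tokenizer` are hypothetical placeholders for any byte-oriented analyzer; they are not FreeLing's actual interface, which the thread does not describe. The design has an inherent limit: characters with no Latin-1 code point (Cyrillic letters, curly quotes, the euro sign) cannot make the round trip, so this sketch raises an error rather than silently altering them.

```python
from typing import Callable


def analyze_utf8_via_latin1(
    utf8_input: bytes, analyze_latin1: Callable[[bytes], bytes]
) -> bytes:
    """Run a Latin-1-only analyzer on UTF-8 input by transcoding both ways."""
    text = utf8_input.decode("utf-8")
    # Strict encoding: raises UnicodeEncodeError for characters outside
    # Latin-1 instead of silently replacing them.
    latin1_input = text.encode("latin-1")
    latin1_output = analyze_latin1(latin1_input)
    # Convert the analyzer's Latin-1 output back to UTF-8.
    return latin1_output.decode("latin-1").encode("utf-8")


def toy_tokenizer(data: bytes) -> bytes:
    """Hypothetical stand-in analyzer: one whitespace-separated token per line."""
    return b"\n".join(data.split())
```

For instance, `analyze_utf8_via_latin1("l'àvia dorm".encode("utf-8"), toy_tokenizer)` returns `"l'àvia\ndorm".encode("utf-8")`, whereas Cyrillic input raises `UnicodeEncodeError` at the Latin-1 step.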
_______________________________________________
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora


