[Corpora-List] Character encoding headaches

Dale Gerdemann dg at sfs.uni-tuebingen.de
Mon Aug 3 09:37:04 UTC 2009


No matter what ready-made tools you use, there will be errors and
corruptions. There is no substitute for learning about character
encodings and writing the fix-up programs yourself. Start by reading the
Wikipedia article on UTF-8.
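
As one illustration of what such a fix-up program can look like, here
is a minimal sketch in Python (the language is a choice made for this
example, not something named in the post). It repairs one common
corruption: UTF-8 text that was mistakenly decoded as Latin-1, so that
"ü" shows up as "Ã¼". The function name fix_utf8_read_as_latin1 is
hypothetical, and the sketch assumes exactly one layer of mis-decoding.

```python
def fix_utf8_read_as_latin1(text: str) -> str:
    """Undo one layer of 'UTF-8 bytes decoded as Latin-1' corruption.

    Assumption: the input is either clean text or text damaged in
    exactly this one way. Anything else is returned unchanged.
    """
    try:
        # Recover the original bytes, then decode them as UTF-8.
        return text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Not representable in Latin-1, or the bytes are not valid
        # UTF-8: most likely the text was not corrupted this way.
        return text


if __name__ == "__main__":
    # Simulate the corruption, then repair it.
    damaged = "T\u00fcbingen".encode("utf-8").decode("latin-1")
    print(damaged)
    print(fix_utf8_read_as_latin1(damaged))
```

A real corpus will usually need more than this: the round trip can
also "repair" a clean Latin-1 string that happens to form valid UTF-8,
so it is worth checking the results against known-good samples.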

Dale Gerdemann


