[Corpora-List] Arabic encoding conversion
Abdusalam F Ahmad Nwesri
a.nwesri at student.rmit.edu.au
Fri Oct 26 06:23:59 UTC 2007
Hi,
I am trying to convert the Arabic Gigaword corpus, prepared by the LDC, from UTF-8 to the Windows CP1256 encoding. The collection is plain text with XML tags.
I tried "iconv", but it seems to fail with errors on some files. I am not sure what the problem is.
My last resort is to write a script that reads the files and converts them word by word, but before I do, I want to know whether anyone has experienced the same problem.
If you are aware of another tool that I can use, please let me know.
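A minimal sketch of such a fallback script, assuming Python 3 and its built-in "cp1256" codec, might look like the following. It works line by line rather than word by word, and it counts every character that has no CP1256 equivalent instead of stopping at the first one, which is one possible cause of the conversion errors (an assumption, not something confirmed for this corpus). The function names, the "?" placeholder, and the command-line usage are all hypothetical choices made for illustration.

```python
import sys
from collections import Counter


def convert_line(line, unmappable):
    """Encode one line as CP1256, replacing unmappable characters with '?'."""
    try:
        # Fast path: the whole line is representable in CP1256.
        return line.encode("cp1256")
    except UnicodeEncodeError:
        # Slow path: encode character by character and record the offenders.
        pieces = []
        for ch in line:
            try:
                pieces.append(ch.encode("cp1256"))
            except UnicodeEncodeError:
                unmappable[ch] += 1
                pieces.append(b"?")  # placeholder; an assumption, pick what suits you
        return b"".join(pieces)


def convert_file(src, dst):
    """Convert a UTF-8 file to CP1256 and return a Counter of unmappable characters."""
    unmappable = Counter()
    # newline="" keeps the original line endings untouched.
    with open(src, "r", encoding="utf-8", newline="") as fin, open(dst, "wb") as fout:
        for line in fin:
            fout.write(convert_line(line, unmappable))
    return unmappable


if __name__ == "__main__":
    if len(sys.argv) != 3:
        sys.exit("usage: convert.py INPUT_UTF8 OUTPUT_CP1256")
    report = convert_file(sys.argv[1], sys.argv[2])
    # List the characters that could not be converted, most frequent first.
    for ch, count in report.most_common():
        print(f"U+{ord(ch):04X}\t{count}", file=sys.stderr)
```

Saved under a name such as convert.py (hypothetical), it would be run as "python convert.py input.xml output.xml". The report on standard error shows which code points are causing trouble, which may help decide whether to drop, replace, or normalise them before conversion. A file that is not valid UTF-8 to begin with would still raise a UnicodeDecodeError here, which at least identifies the file.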
Thanks
Abdusalam Nwesri
PhD Candidate,
School of Computer Science and IT,
RMIT University,
Melbourne,
Australia.
_______________________________________________
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora