[Corpora-List] Wordgram generator
Paul Johnston
paul.a.johnston at manchester.ac.uk
Tue Mar 11 11:04:07 UTC 2008
Can anyone recommend a wordgram generator similar to text2wngram in the
CMU-Toolkit that can handle Unicode-encoded texts, preferably UTF-8 or
UCS-2?
I've been using the CMU-Toolkit successfully on English text files,
especially from the BNC, but I run into problems when using a UTF-8
file:
Error reading temp file count /usr/tmp/text2wngram.tmp.hb-0021205.4217.1
It seems to have problems reading its temp files (see above). Permissions
are fine, and it works with ASCII texts.
I've tried this on a couple of Linux systems (Fedora and SUSE) with
clean builds, and in both cases text2wfreq works fine but text2wngram
does not.
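To make clear what kind of output is needed, here is a minimal Python
sketch of word n-gram counting over UTF-8 text. It is only an
illustration under stated assumptions, not what text2wngram does
internally: the function name count_word_ngrams is hypothetical, tokens
are split on whitespace only, and the whole input is treated as one
token stream.

```python
from collections import Counter, deque


def count_word_ngrams(lines, n):
    """Count word n-grams over an iterable of text lines.

    Assumption: tokens are whitespace-separated, and n-grams may span
    line breaks (the input is treated as a single token stream).
    """
    counts = Counter()
    window = deque(maxlen=n)  # sliding window holding the last n tokens
    for line in lines:
        for token in line.split():
            window.append(token)
            if len(window) == n:
                counts[tuple(window)] += 1
    return counts


def format_ngrams(counts):
    """Return sorted 'w1 w2 ... wn count' lines, one per n-gram."""
    return [" ".join(gram) + " " + str(c) for gram, c in sorted(counts.items())]
```

With a file opened as open(path, encoding="utf-8"), the file object can
be passed directly as the lines argument, so Python does the decoding
and non-ASCII words are handled like any other token.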
Any suggestions?
Cheers Paul
Paul Johnston
Humanities Development
Room 2.12
Bridgeford Building
Manchester University
0161 275 1396
Programmers are in a race with the Universe to create bigger and better
idiot-proof programs,
while the Universe is trying to create bigger and better idiots.
So far the Universe is winning.
Rich Cook