[Corpora-List] Wordgram generator

Paul Johnston paul.a.johnston at manchester.ac.uk
Tue Mar 11 11:04:07 UTC 2008


Can anyone recommend a wordgram generator similar to text2wngram in the
CMU-Toolkit that can handle Unicode-encoded texts, preferably UTF-8 or
UCS-2?
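
To make the requirement concrete, the behaviour I am after can be
sketched in a few lines of Python 3. This is only an illustration under
stated assumptions, not a replacement for text2wngram: the names
word_ngrams and print_ngrams are made up for this example, tokens are
simply whitespace-separated, and the output (one n-gram and its count
per line, sorted) is an assumed format rather than the toolkit's.

```python
from collections import Counter


def word_ngrams(lines, n=3):
    """Count n-grams of whitespace-separated tokens.

    `lines` is any iterable of already-decoded text lines (for example an
    open text file). N-grams are allowed to span line boundaries, which is
    an assumption about the desired behaviour.
    """
    counts = Counter()
    window = []  # the most recent tokens, at most n of them
    for line in lines:
        for token in line.split():  # str.split() also splits on Unicode whitespace
            window.append(token)
            if len(window) > n:
                window.pop(0)
            if len(window) == n:
                counts[tuple(window)] += 1
    return counts


def print_ngrams(path, n=3, encoding="utf-8"):
    """Hypothetical helper: print each n-gram in `path` with its count."""
    # Decoding happens here, so the counting code only ever sees text.
    with open(path, encoding=encoding) as handle:
        for gram, count in sorted(word_ngrams(handle, n).items()):
            print(" ".join(gram), count)
```

For a UCS-2 file, passing encoding="utf-16" to print_ngrams should be
close enough in most cases, since UTF-16 covers everything UCS-2 can
represent.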

I've been using the CMU-Toolkit successfully on English text files,
especially from the BNC, but I run into problems when using a UTF-8
file:

Error reading temp file count /usr/tmp/text2wngram.tmp.hb-0021205.4217.1

It seems to have problems reading its tmp files (see above), although
permissions are fine and it works with ASCII texts.

I've tried this on a couple of Linux systems (Fedora and SUSE) with
clean builds, and in both cases text2wfreq works fine but text2wngram
does not.

Any suggestions?

Cheers,
Paul

Paul Johnston
Humanities Development
Room 2.12
Bridgeford Building
Manchester University
0161 275 1396

 

Programmers are in a race with the Universe to create bigger and better
idiot-proof programs, while the Universe is trying to create bigger and
better idiots. So far the Universe is winning.

Rich Cook

 

_______________________________________________
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora

