[Corpora-List] help with n-grams

William Fletcher fletcher at usna.edu
Mon Oct 27 17:03:33 UTC 2008


Hello Marc,

My free Windows program kfNgram 
http://www.kwicfinder.com/kfNgram/
can generate character n-grams ("chargrams") in addition to word n-grams
("wordgrams").  You can specify length (n), position (initial, final,
medial) etc.  It was used to produce chargrams from the BNC at
http://pie.usna.edu/explorec.html or
http://phrasesinenglish.org/explorec.html
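
To illustrate what position-tagged chargrams look like, here is a minimal Python sketch. It is not kfNgram's implementation. The function names and the rule that a word exactly n characters long counts as "initial" are assumptions made for this example.

```python
from collections import Counter

def chargrams(word, n):
    """Yield (gram, position) pairs for every character n-gram in word.

    Assumption for this sketch: when the word is exactly n characters long,
    its single n-gram is tagged "initial" (it is of course also final).
    """
    last = len(word) - n          # start index of the last possible n-gram
    for i in range(last + 1):     # yields nothing if the word is shorter than n
        if i == 0:
            position = "initial"
        elif i == last:
            position = "final"
        else:
            position = "medial"
        yield word[i:i + n], position

def count_chargrams(words, n):
    """Count (gram, position) pairs over an iterable of words."""
    counts = Counter()
    for word in words:
        counts.update(chargrams(word, n))
    return counts

# Toy input, invented for illustration only.
counts = count_chargrams(["string", "ring"], 3)
for (gram, position), freq in sorted(counts.items()):
    print(gram, position, freq)
```

On the two toy words, "ing" is counted twice as a final trigram, while "rin" is counted once as medial (in "string") and once as initial (in "ring").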

Let me know if you have any questions about how to use it on your data, e.g.
how to define a custom character set for the phonemic data.
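
For the task described in the quoted message below (how often a grapheme is realised as a given phoneme in a given graphemic context), the counting step can be sketched in Python as follows. This is independent of kfNgram and only a rough sketch. It assumes each word is stored as a list of (grapheme, phoneme) pairs, uses "#" as a hypothetical word-boundary marker, and looks at one grapheme of context on each side. The toy data are invented.

```python
from collections import Counter, defaultdict

def realisation_rates(aligned_words):
    """Map (previous grapheme, grapheme, next grapheme) to phoneme rates.

    aligned_words: iterable of words, each a list of (grapheme, phoneme)
    pairs. This data layout is an assumption made for the sketch.
    """
    counts = defaultdict(Counter)
    for pairs in aligned_words:
        # Pad with "#" so word-initial and word-final graphemes get a context.
        graphemes = ["#"] + [g for g, _ in pairs] + ["#"]
        for i, (grapheme, phoneme) in enumerate(pairs, start=1):
            context = (graphemes[i - 1], grapheme, graphemes[i + 1])
            counts[context][phoneme] += 1

    rates = {}
    for context, phoneme_counts in counts.items():
        total = sum(phoneme_counts.values())
        rates[context] = {p: c / total for p, c in phoneme_counts.items()}
    return rates

# Invented toy data: "cat", "city", and "read" with two pronunciations.
words = [
    [("c", "k"), ("a", "æ"), ("t", "t")],
    [("c", "s"), ("i", "ɪ"), ("t", "t"), ("y", "i")],
    [("r", "r"), ("ea", "iː"), ("d", "d")],
    [("r", "r"), ("ea", "ɛ"), ("d", "d")],
]
rates = realisation_rates(words)
print(rates[("#", "c", "a")])
print(rates[("r", "ea", "d")])
```

With only 3000 or so words, many contexts will occur once or twice, so it is worth keeping the raw counts alongside the rates before reading much into them.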

Regards,
Bill Fletcher 

-----Original Message-----
From: corpora-bounces at uib.no [mailto:corpora-bounces at uib.no] On Behalf Of
Marc FRYD
Sent: Sunday, October 26, 2008 4:19 AM
To: corpora at uib.no
Subject: [Corpora-List] help with n-grams

Hi all,
I wonder if anyone could help a linguist with moderate programming abilities
with the following task.
I am currently working on a corpus of isolated words with aligned
grapheme-to-phoneme transcriptions.
I would like to produce an n-gram analysis of both levels of data (the
graphemic and the phonemic) with a view to extracting trends in how graphemes
are realised (i.e. this grapheme will be realised as that phoneme at a rate of
x if preceded/followed by such and such graphemes). The database currently
holds c. 3000 words, but it will keep growing.
Cheers,
Marc



--
Dr. Marc FRYD
Senior Lecturer in English Linguistics

Faculté des Lettres et des Langues
Université de Poitiers
95 avenue du Recteur Pineau
86022, Poitiers, France

Office: 05 49 45 48 11
Cell: 06 76 28 18 50




_______________________________________________
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora

