[Lexicog] help with N-grams
Marc FRYD
marc.fryd at UNIV-POITIERS.FR
Sun Oct 26 17:55:06 UTC 2008
Hi J.L.,
Thank you for your kind interest.
Please find attached a .txt sample. Words are presented in sequences of
three lines, in each case:
- Line 1 provides the full graphemic string (English surnames in this
corpus), followed by its phonemic transcription, in which phonemes are
separated by a space.
- Lines 2 and 3 provide the graphemic and phonemic alignments. Note that
conjoined letters are indicated with the "+" sign in graphemic clusters:
---- aagaard = "eI g A: d ----
a+a g a+a+r d
"eI g A: d
In the example above (the English surname Aagaard), the clusters are
<a+a> and <a+a+r>.
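
A minimal Python sketch of how one such three-line record could be read into aligned pairs. The function name parse_record is hypothetical, and the header layout (dashes, word, "=", transcription, dashes) is assumed from the single example above rather than from a full description of the file:

```python
def parse_record(lines):
    """Parse one three-line record into (word, [(cluster, phoneme), ...])."""
    header, graphemes, phonemes = (line.strip() for line in lines)
    # Assumed header layout, inferred from one example:
    #   ---- word = transcription ----
    word = header.strip("- ").split("=", 1)[0].strip()
    clusters = graphemes.split()   # e.g. "a+a", "g", "a+a+r", "d"
    phones = phonemes.split()      # symbols are kept exactly as written
    if len(clusters) != len(phones):
        raise ValueError(f"misaligned record for {word!r}")
    return word, list(zip(clusters, phones))


record = [
    '---- aagaard = "eI g A: d ----',
    'a+a g a+a+r d',
    '"eI g A: d',
]
word, pairs = parse_record(record)
print(word)
print(pairs)
```

Splitting both alignment lines on whitespace and pairing them position by position relies on the two lines always having the same number of items, hence the length check.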
Note that primary stress is marked with a double quote < " >, and
secondary stress with the percent sign < % >, placed directly before the
stressed vowels.
Do note that the phonemic symbols are case-sensitive.
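
As a rough illustration of the kind of count the N-gram task calls for, the Python sketch below tallies how each grapheme cluster is realised given its left and right neighbours. The function name realisation_rates, the "#" boundary marker and the strip_stress option are all hypothetical; a three-item window is only one possible choice of context. Symbols are never lower-cased, since case is distinctive:

```python
from collections import Counter, defaultdict

BOUNDARY = "#"  # hypothetical word-boundary marker, not part of the corpus notation


def realisation_rates(words, strip_stress=False):
    """Map each (previous, current, next) grapheme context to phoneme rates."""
    counts = defaultdict(Counter)
    for pairs in words:
        clusters = [BOUNDARY] + [g for g, _ in pairs] + [BOUNDARY]
        for i, (_, phoneme) in enumerate(pairs, start=1):
            if strip_stress:
                # Drop leading primary (") and secondary (%) stress marks.
                phoneme = phoneme.lstrip('"%')
            context = (clusters[i - 1], clusters[i], clusters[i + 1])
            counts[context][phoneme] += 1
    # Turn raw counts into relative frequencies per context.
    return {
        context: {p: n / sum(c.values()) for p, n in c.items()}
        for context, c in counts.items()
    }


# The one aligned word given in this message, as (cluster, phoneme) pairs.
aagaard = [("a+a", '"eI'), ("g", "g"), ("a+a+r", "A:"), ("d", "d")]
rates = realisation_rates([aagaard])
for context, dist in rates.items():
    print(context, dist)
```

With a single word every rate is trivially 1.0; the figures only become informative once the whole corpus is passed in as a list of such words.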
Just ask if there is anything else you need to know.
With kind regards,
Marc
J.L. DeLucca wrote:
>
> Hi Marc,
>
> I have a software tool for doing n-grams (bi-, tri-, tetra- and
> pentagrams), but I know you are looking for something more precise.
> Could you send me a short piece of your database or your text?
>
> Best for now,
>
> J. L. De Lucca
> Universidad Politécnica de Valencia
> Departamento de Linguistica Aplicada
>
> --- On Sat, 10/25/08, Marc FRYD <marc.fryd at univ-poitiers.fr> wrote:
>
> From: Marc FRYD <marc.fryd at univ-poitiers.fr>
> Subject: [Lexicog] help with N-grams
> To: lexicographylist at yahoogroups.com
> Date: Saturday, October 25, 2008, 12:49 AM
>
> Hi all,
> I wonder if anyone could help a linguist with moderate programming
> abilities with the following task.
> I am currently working on a corpus of aligned grapheme-to-phoneme
> isolated words.
> I would like to produce an N-gram parsing of both levels of data (the
> graphemic and the phonemic) with a view to extracting trends favouring
> realisations (i.e. this grapheme will be realised as that phoneme with
> an x rate of occurrence if preceded/followed by such and such
> graphemes). The db is currently c. 3000 words, but it will keep growing.
> Cheers,
> Marc
>
> --
> Dr. Marc FRYD
> Senior Lecturer in English Linguistics
>
> Faculté des Lettres et des Langues
> Université de Poitiers
> 95 avenue du Recteur Pineau
> 86022, Poitiers, France
>
> Office: 05 49 45 48 11
> Cell: 06 76 28 18 50
>
>
>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: surnames.alignment_sample.txt
URL: <http://listserv.linguistlist.org/pipermail/lexicography/attachments/20081026/1882cfab/attachment.txt>