[Corpora-List] What proportion of letter ngrams occur in English?
Simon King
Simon.King at ed.ac.uk
Mon Jan 26 09:30:34 UTC 2004
Bruce L. Lambert, Ph.D. wrote:
> I am revisiting an issue I brought up to this list several years ago,
> that is, how many legal/pronounceable strings can be generated from a
> fixed alphabet for a string of a given length.
One approach to this might be to consider legal syllables; there are
strong phonotactic constraints on valid onsets and codas, both on
allowed sequences and on total number of segments, which mean there are
only a few thousand allowable syllables in English out of hundreds of
thousands of possible phoneme sequences.
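To see the size of that reduction, here is a rough count. Model a syllable as an optional onset, a vowel nucleus and an optional coda. Let O, V and C be the sets of allowed onsets (including the empty onset), nuclei and codas. Let P be the phoneme inventory and L the longest possible syllable. The symbols and the toy numbers below are illustrative assumptions, not measured English figures:

$$N_{\text{syll}} \le |O| \cdot |V| \cdot |C| \qquad \text{versus} \qquad N_{\text{seq}} = \sum_{k=1}^{L} |P|^{k}$$

The left side is an upper bound because further co-occurrence restrictions (e.g. between nucleus and coda) remove some combinations. As a hypothetical toy case, take 12 onsets, 4 nuclei and 7 codas. That gives at most 12 · 4 · 7 = 336 syllables. An 11-phoneme inventory with L = 6 gives 11 + 121 + 1331 + 14641 + 161051 + 1771561 = 1,948,716 unrestricted sequences.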
Of course, this is not in terms of character strings. But for made-up
words like drug names, I would guess the letter-to-sound correspondence
would be much more regular than for real words, so the approach should still work.
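A minimal sketch of how this could answer the original question ("how many strings of a given length"). It assumes a perfectly regular one-letter-per-phoneme spelling and a tiny hypothetical syllable inventory, not real English phonotactics. The names `ONSETS`, `NUCLEI`, `CODAS`, `SYLLABLES` and `pronounceable_strings` are all made up for illustration. The sketch counts a letter string as pronounceable if it can be split into a sequence of legal syllables:

```python
from itertools import product

# Hypothetical toy inventory (NOT real English), spelled with an assumed
# perfectly regular one-letter-per-phoneme orthography. "" = empty onset/coda.
ONSETS = ["", "p", "t", "k", "s", "l", "r", "pr", "tr", "kl", "st", "str"]
NUCLEI = ["a", "e", "i", "u"]
CODAS = ["", "t", "k", "s", "n", "st", "nt"]

# Every onset + nucleus + coda combination is treated as a legal syllable.
SYLLABLES = {o + v + c for o, v, c in product(ONSETS, NUCLEI, CODAS)}


def pronounceable_strings(max_len):
    """Return {length: set of distinct letter strings} for lengths 0..max_len.

    A string is included if it is a concatenation of toy syllables. Sets are
    used so a string with several possible syllabifications is counted once.
    """
    by_len = {0: {""}}
    for n in range(1, max_len + 1):
        found = set()
        for syl in SYLLABLES:
            # Extend every shorter string whose length leaves room for syl;
            # negative lengths are not in the dict, so they yield nothing.
            for prefix in by_len.get(n - len(syl), set()):
                found.add(prefix + syl)
        by_len[n] = found
    return by_len


if __name__ == "__main__":
    counts = pronounceable_strings(4)
    for n in range(1, 5):
        print(n, len(counts[n]))
```

A set is used rather than a dynamic program that counts segmentations, because some strings split into syllables in more than one way. For example, "atra" is both "a" + "tra" and "at" + "ra", so counting segmentations would overcount. Exhaustive enumeration grows exponentially with length, so the sketch is only practical for short strings. It also ignores constraints that span syllable boundaries.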
Simon
--
Dr. Simon King Simon.King at ed.ac.uk
Centre for Speech Technology Research www.cstr.ed.ac.uk
For MSc/PhD info, visit www.hcrc.ed.ac.uk/language-at-edinburgh