[Corpora-List] english lexicon

Brierley, Claire C.Brierley at bolton.ac.uk
Fri Apr 3 21:36:59 UTC 2009


On Thu, 2 Apr 2009, Tine Lassen wrote:
> I am looking for a - preferably - freely available lexicon of English words and their inflectional forms.

Hello Tine,
 
You might also like to look at ProPOSEL, a prosody and part-of-speech English lexicon. It comes as a text file of 104,049 word forms (including separate entries for inflected forms) and merges information from CELEX-2, CUV2/CUVPlus and CMU, the Carnegie-Mellon Pronouncing Dictionary.
 
In ProPOSEL, each word form is mapped to four variant PoS-tagging schemes (C5; Penn Treebank; LOB; C7); default closed and open-class word categories; canonical phonetic transcriptions (SAM-PA and DISC); syllable counts; consonant-vowel (CV) patterns; and lexical stress patterns, i.e. abstract representations of rhythmic structure. So, for example, a selection of fields {word; C5 tag; lexical stress pattern; Penn Treebank tag; default content-function word tag; LOB tag; C7 tag; and DISC phonetic transcription mapped to stress weightings} for the word "secure" looks like this:
 
secure|VVI|01|VB|C|VB|VVI|sI:0 'kj9R:1
secure|AJ0|01|JJ|C|JJ,JJB,JNP|JJ,JK|sI:0 'kj9R:1
secures|VVZ|01|VBZ|C|VBZ|VVZ|sI:0 'kj9z:1
secured|VVD|01|VBD|C|VBD|VVD|sI:0 'kj9d:1
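
For anyone wanting to read entries like these programmatically, here is a minimal Python sketch. It assumes only what the four sample lines show: eight pipe-separated fields in the order listed above, comma-separated alternatives in the LOB and C7 fields, and space-separated syllables that each end in ":" plus a stress weighting. The names Entry and parse_entry are hypothetical and are not part of ProPOSEL; the full file may contain cases this sketch does not handle.

```python
from typing import List, NamedTuple, Tuple


class Entry(NamedTuple):
    """One entry in the eight-field selection shown above (hypothetical type)."""
    word: str
    c5: str
    stress: str                       # lexical stress pattern, e.g. "01"
    penn: str
    content_function: str             # default content/function word tag
    lob: List[str]                    # may hold several alternative tags
    c7: List[str]                     # may hold several alternative tags
    syllables: List[Tuple[str, str]]  # (DISC syllable, stress weighting)


def parse_entry(line: str) -> Entry:
    """Split one pipe-delimited line into its eight fields."""
    fields = line.rstrip("\n").split("|")
    if len(fields) != 8:
        raise ValueError(f"expected 8 fields, got {len(fields)}: {line!r}")
    word, c5, stress, penn, cf, lob, c7, disc = fields

    # Assumption: each syllable has the form "<DISC symbols>:<weighting>".
    syllables = []
    for syllable in disc.split():
        symbols, _, weight = syllable.rpartition(":")
        syllables.append((symbols, weight))

    return Entry(word, c5, stress, penn, cf,
                 lob.split(","), c7.split(","), syllables)


entry = parse_entry("secure|AJ0|01|JJ|C|JJ,JJB,JNP|JJ,JK|sI:0 'kj9R:1")
print(entry.word, entry.penn, entry.lob, entry.syllables)
```

In the sample lines the weightings read off the transcription ("0" then "1") spell out the lexical stress pattern field ("01"), so comparing the two is a cheap sanity check when loading entries.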
 
There is a paper available here:  
http://www.lrec-conf.org/proceedings/lrec2008/summaries/724.html
 
For further information, just contact me.
 
Claire Brierley
Games Computing and Creative Technologies
University of Bolton, UK

_______________________________________________
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora


