[Corpora-List] Dictionaries/Lexical Databases

Peter Adolphs peter.adolphs at student.hu-berlin.de
Mon Nov 27 21:57:47 UTC 2006


Shane Axtell wrote:
> I'm looking for lexical databases (a.k.a. dictionaries) that are freely
> available and contain at least the part of speech information for each
> entry. This database will be connected to an NLP system that will take
> in unstructured corpora as input and output the data in a structured
> manner. Any leads along these lines would be greatly appreciated.

XTAG provides a package for English inflectional morphology:
http://www.cis.upenn.edu/~xtag/

If it hasn't been replaced since then, it should be the one described by
Karp et al. (1992), which is based on the 1979 edition of the Collins
Dictionary.

Daniel Karp, Yves Schabes, Martin Zaidel & Dania Egedi: "A Freely
Available Wide Coverage Morphological Analyzer for English". In:
Proceedings of the 14th International Conference on Computational
Linguistics (COLING '92). Nantes (France), August 1992.
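
For the use case in the question (looking up the part of speech for each
entry), a lookup over such a morphological database might be sketched as
below. The flat-file layout assumed here (inflected form, a tab, then
one or more "lemma POS features" analyses separated by "#") and the
sample entries are illustrative assumptions, not taken from the XTAG
distribution, so check the actual file format before relying on it. The
function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical sample lines. The layout is an assumption for illustration,
# not the documented format of the XTAG morphology package.
SAMPLE = """\
walks\twalk V 3sg PRES#walk N 3pl
walked\twalk V PAST#walk V PPART
dog\tdog N 3sg
"""


def load_lexicon(text):
    """Map each inflected form to a list of (lemma, pos, features) analyses."""
    lexicon = defaultdict(list)
    for line in text.splitlines():
        line = line.strip()
        if not line or "\t" not in line:
            continue  # skip blank or malformed lines
        form, analyses = line.split("\t", 1)
        for analysis in analyses.split("#"):
            fields = analysis.split()
            if len(fields) < 2:
                continue  # an analysis needs at least a lemma and a POS
            lemma, pos, *features = fields
            lexicon[form].append((lemma, pos, tuple(features)))
    return dict(lexicon)


def pos_tags(lexicon, form):
    """Return the sorted POS labels recorded for a form (empty if unknown)."""
    return sorted({pos for _, pos, _ in lexicon.get(form, [])})


lexicon = load_lexicon(SAMPLE)
print(pos_tags(lexicon, "walks"))
```

Note that a lookup like this returns every possible part of speech for a
form; it does not choose among them in context.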

If you need disambiguated POS tags, you could use TreeTagger. There are
English parameter files available on the website, based on the XTAG
lexicon and trained on the Penn Treebank:
http://www.ims.uni-stuttgart.de/projekte/corplex/TreeTagger/
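
Tagger output can be read back into an NLP system with a few lines. The
sketch below assumes the tagger was run so that it prints one token per
line as tab-separated token, tag, and lemma; the sample output is made
up for illustration and the function name is hypothetical.

```python
# Made-up sample of tab-separated tagger output: token, tag, lemma.
SAMPLE_OUTPUT = (
    "The\tDT\tthe\n"
    "dogs\tNNS\tdog\n"
    "bark\tVVP\tbark\n"
    ".\tSENT\t.\n"
)


def parse_tagged(text):
    """Turn tab-separated tagger output into (token, tag, lemma) tuples."""
    rows = []
    for line in text.splitlines():
        parts = line.split("\t")
        if len(parts) != 3:
            continue  # skip blank lines and anything not in the assumed shape
        rows.append(tuple(parts))
    return rows


for token, tag, lemma in parse_tagged(SAMPLE_OUTPUT):
    print(f"{token:<6} {tag:<5} {lemma}")
```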

-- 
Peter Adolphs    peter.adolphs at student.hu-berlin.de    gpg/pgp welcome!


