[Corpora-List] in need of a specialized lexicon (summary)
Joel Tetreault
tetreaul at cs.rochester.edu
Mon Sep 27 16:12:04 UTC 2004
Hi, I'd like to thank everyone who emailed me about my request for a
comprehensive lexicon containing semantic (or quasi-semantic) noun
features such as mass/count, abstract/concrete,
object/measure/event/state/process/etc, part/whole,
etc., on top of verb frames with argument-type preferences.
Here's a summary of the information provided by listmembers:
1. Oxford Advanced Learner's Dictionary of current English (text number
0710 in the Oxford Text Archive, or at
http://www.gtoal.com/wordgames/ota/710/ ) was prepared by Roger Mitton,
and includes noun features such as countable/uncountable/proper and an
interesting but very non-standard verb frame structure. Note that the
data was produced in 1986 and updated in 1992.
(thanks to Jonathan Young <jonathan_young at comcast.net>)
2. The Specialist Lexicon of the Unified Medical Language System (lexical
needs for the medical community). This
lexicon contains over 220,000 terms and was developed to provide the
lexical information needed for the SPECIALIST Natural Language
Processing System. It is intended to be a general English lexicon that
includes many biomedical terms. Coverage includes commonly occurring
English words and biomedical vocabulary. The data elements in the
lexicon describe syntactic characteristics of each entry, including
inflection codes, case, gender, syntactic category, complements for
verbs and nouns, modification types for adverbs, and more. This
lexicon was developed as a free, publicly available resource, with only
moderate restrictions (e.g., you can't claim it as your own).
3. http://www.clres.com/lexdata.html - links to lexicon data
(previous two thanks to Ken Litkowski <ken at clres.com>)
4. Longman Dictionaries:
* Longman Dictionary of Contemporary English, Lisp version (LDOCE Lisp -
1978):
http://www.longman.com/dictionaries/research/reslisp.html
* Longman Dictionary of Contemporary English, NLP version (LDOCE NLP -
2000):
http://www.longman.com/dictionaries/research/resnlapp.html#4
(thanks to "Crowdy, Steve" <Steve.Crowdy at pearson.com>)
5. Unitex: http://www-igm.univ-mlv.fr/~unitex/ has features such as
animate, concrete, abstract, unit of measure, collective, etc., for
English and French
(thanks to Sebastian Nagel <wastl at cis.uni-muenchen.de>)
6. Comprehensive lexicon for Italian (7000 entries) and a smaller one for
English (3300 entries) - see Rodolfo Delmonte (1995), "Lexical
Representations: Syntax-Semantics interface and World Knowledge," in
Rivista dell'AI*IA (Associazione Italiana di Intelligenza Artificiale),
Roma, pp. 11-16, for a summary of his group's work.
Thanks to all who emailed me; it was a great help.
Joel