[Corpora-List] Dictionaries/Lexical Databases
maxwell at ldc.upenn.edu
Mon Nov 27 20:36:50 UTC 2006
Quoting Shane Axtell <shane.axtell at gmail.com>:
> I'm looking for lexical databases (a.k.a. dictionaries) that are freely
> available and contain at least the part of speech information for each
> entry.
I presume you're speaking of languages other than (or in addition to)
English. You might have a look at the links at
http://www.netvouz.com/mcswell/folder/7773878411777326817/Dictionaries
These are mostly links to _on-line_ dictionaries, which are not
necessarily downloadable. If you automatically submit a large number of
queries to them, some of them might overload or cut you off, for all I
know.
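If you do script queries against one of these sites, pausing between
requests is the polite thing to do. Here is a minimal sketch of that idea,
not a tested client: `throttled_lookups`, the `lookup` callable, and the
one-second default delay are all my own assumptions, and you would supply
whatever function actually fetches an entry from the site in question.

```python
import time


def throttled_lookups(words, lookup, delay_seconds=1.0, sleep=time.sleep):
    """Call lookup(word) for each word, pausing between requests.

    `lookup` is a hypothetical caller-supplied function that queries
    one dictionary site; nothing here knows about any real site's API.
    """
    results = {}
    for i, word in enumerate(words):
        if i > 0:
            # Wait before every request except the first, so the
            # server sees at most one query per delay_seconds.
            sleep(delay_seconds)
        results[word] = lookup(word)
    return results
```

The right delay depends on the site, so check its terms of use before
choosing one.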
> This database will be connected to an NLP system that will take in
> unstructured corpora as input and output the data in a structured manner. Any
> leads along these lines would be greatly appreciated
I'm not quite sure what you're trying to do--add POS tags to text? Of
course that will be ambiguous in many languages (like English), if you
just take the POS from a dictionary. And it won't work at all if the
language has much in the way of inflectional morphology (without a
morphological parser or stemmer, or at least some heuristics). Not to
mention named entities.
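To make those problems concrete, here is a toy sketch of tagging by
dictionary lookup alone. Everything in it is assumed for illustration: the
four-entry `LEXICON`, the tag names, and the crude `naive_stem` suffix
stripper, which is only a stand-in for a real morphological parser.

```python
# Toy lexicon: every entry and tag name is made up for illustration.
LEXICON = {
    "can": {"NOUN", "VERB", "AUX"},
    "fish": {"NOUN", "VERB"},
    "walk": {"NOUN", "VERB"},
    "the": {"DET"},
}


def naive_stem(word):
    """Strip a few English suffixes; not a real morphological analysis."""
    for suffix in ("ing", "ed", "s"):
        # Require a stem of at least three letters after stripping.
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def lookup_pos(token, use_stemmer=False):
    """Return the set of POS tags the lexicon lists for a token."""
    word = token.lower()
    if word in LEXICON:
        return LEXICON[word]
    if use_stemmer:
        stem = naive_stem(word)
        if stem in LEXICON:
            return LEXICON[stem]
    return set()  # unknown word: no tag at all
```

With this lexicon, "can" comes back with three candidate tags and nothing to
choose among them, "walked" comes back empty unless the stemmer is switched
on, and a name like "Maxwell" comes back empty either way.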
But you probably have a plan for dealing with these sorts of problems.
Mike Maxwell
CASL/ U MD