[Corpora-List] Dictionaries/Lexical Databases

maxwell at ldc.upenn.edu
Mon Nov 27 20:36:50 UTC 2006


Quoting Shane Axtell <shane.axtell at gmail.com>:
> I'm looking for lexical databases (a.k.a. dictionaries) that are freely
> available and contain at least the part of speech information for each
> entry.

I presume you're speaking of languages other than (or in addition to) 
English.  You might have a look at the links at
http://www.netvouz.com/mcswell/folder/7773878411777326817/Dictionaries
These are mostly links to _on-line_ dictionaries, which are not 
necessarily downloadable.  If you automatically submit a large number of 
queries to them, some of them might overload or cut you off, for all I 
know.
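If you do script queries against sites like these, one simple courtesy is to space the requests out. Below is a minimal sketch, assuming a hypothetical `query(word)` function that fetches one entry. `Throttle` and `lookup_all` are illustrative names, not part of any library, and the right interval for a given site is an assumption you would have to check against that site's terms.

```python
import time


class Throttle:
    """Hypothetical helper: enforce a minimum interval between successive calls."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval  # seconds between requests (assumed value)
        self.clock = clock                # injectable so the timing can be tested
        self.sleep = sleep
        self._last = None                 # time of the previous request, if any

    def wait(self):
        """Block until min_interval has passed since the last call; return seconds slept."""
        now = self.clock()
        slept = 0.0
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self.sleep(remaining)
                slept = remaining
                now = self.clock()
        self._last = now
        return slept


def lookup_all(words, query, throttle):
    """Call the (hypothetical) query(word) for each word, pausing between requests."""
    results = {}
    for w in words:
        throttle.wait()
        results[w] = query(w)
    return results
```

Injecting the clock and sleep functions keeps the pacing logic separate from any particular site or HTTP library.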

> This database will be connected to an NLP system that will take in
> unstructured corpora as input and output the data in a structured manner. Any
> leads along these lines would be greatly appreciated.

I'm not quite sure what you're trying to do--add POS tags to text?  Of 
course that will be ambiguous in many languages (like English), if you 
just take the POS from a dictionary.  And it won't work at all if the 
language has much in the way of inflectional morphology (without a 
morphological parser or stemmer, or at least some heuristics).  Not to 
mention named entities.
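To make those problems concrete, here is a minimal sketch of naive dictionary lookup, assuming a tiny hypothetical lexicon (`LEXICON`) and made-up tag names. A word like "run" comes back with more than one tag, an inflected form like "runs" is not found at all, and a crude suffix-stripping heuristic (`SUFFIXES`, also hypothetical) recovers some forms but misses others such as "running". A name like "Axtell" gets nothing.

```python
# Toy lexicon: a hypothetical stand-in for a downloaded dictionary with POS info.
LEXICON = {
    "run": {"NOUN", "VERB"},
    "book": {"NOUN", "VERB"},
    "the": {"DET"},
    "quickly": {"ADV"},
    "dog": {"NOUN"},
}

# Crude inflectional suffixes to try stripping; a real system would need
# a morphological parser or stemmer instead.
SUFFIXES = ("ing", "ed", "s")


def candidate_tags(word, lexicon=LEXICON, use_heuristics=False):
    """Return the set of POS tags the lexicon allows for word (empty if unknown)."""
    w = word.lower()
    if w in lexicon:
        return set(lexicon[w])  # may hold several tags: lookup alone does not disambiguate
    if use_heuristics:
        for suf in SUFFIXES:
            stem = w[: -len(suf)]
            if w.endswith(suf) and stem in lexicon:
                return set(lexicon[stem])
    return set()  # unknown word: inflected form, named entity, or gap in the lexicon
```

Returning a set rather than a single tag leaves the ambiguity visible; choosing among the candidates is a separate problem for a tagger.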

But you probably have a plan for dealing with these sorts of problems.

   Mike Maxwell
   CASL/ U MD
