[Corpora-List] Downloadable English-language resources
Martin Reynaert
Reynaert at uvt.nl
Mon Jan 29 09:06:33 UTC 2007
Hi,
I am sure your search will be aided by using the right terminology.
You are looking for:
1/ a POS-tagger (POS = Part Of Speech). POS-taggers come with different
tag-sets, offering varying levels of detail.
2/ a lemmatizer, which, given an inflected or derived word form, returns
its lemma.
These two programmes often form a pair, since the lemma of a word form
can depend on its part of speech.
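
To make the two terms concrete, here is a minimal sketch of what such a
pair returns for a word form. It assumes a toy, hand-made lexicon: the
entries, the coarse tags and the function names analyse and lemmas are
invented for illustration and are not taken from any real tagger or
lemmatizer.

```python
# Toy lexicon (hypothetical data): word form -> list of (POS tag, lemma) analyses.
# A real resource would hold many thousands of entries and a richer tag-set.
TOY_LEXICON = {
    "child": [("NOUN", "child")],
    "children": [("NOUN", "child")],
    "run": [("VERB", "run"), ("NOUN", "run")],
    "ran": [("VERB", "run")],
    "saw": [("VERB", "see"), ("NOUN", "saw"), ("VERB", "saw")],
}


def analyse(word):
    """Return every (POS tag, lemma) analysis of a word form, or [] if unknown."""
    return TOY_LEXICON.get(word.lower(), [])


def lemmas(word):
    """Return the distinct lemmas of a word form, in first-seen order."""
    seen = []
    for _tag, lemma in analyse(word):
        if lemma not in seen:
            seen.append(lemma)
    return seen


if __name__ == "__main__":
    for form in ["children", "ran", "saw"]:
        print(form, analyse(form), lemmas(form))
```

Note that "saw" has more than one lemma: out of context the lookup is
ambiguous, which is why a tagger that looks at the surrounding words is
normally run before the lemmatizer.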
Greetings,
Martin Reynaert
Postdoc
ILK
Tilburg University
The Netherlands
Gordana Ilic Holen wrote:
> Dear list members,
>
> We are looking for software/data that help in performing the following
> task programmatically, i.e., we want to use the described capability
> from a piece of software we are writing.
>
> The task is to look up an English word in order to determine its
> class.
>
> We would also like to be informed if the word is a derived form of
> another "main entry" or form. In the latter case we would like to be
> told what the main form is: e.g., "children" has main form "child",
> "ran" has main form "run". (Of course, these main forms need not be
> unique, so the lookup might result in several main forms.)
>
> Note: it is essential that lookup can be performed locally (offline).
> The reason is that we want to look up a lot of words. (The
> software/data does not need to be free, but we would prefer it to be.)
>
> Thanks in advance for any pointers.
>
>
> Gordana Ilic Holen and Bjarte M. Østvold
>
>