[Corpora-List] Downloadable English-language resources

Martin Reynaert Reynaert at uvt.nl
Mon Jan 29 09:06:33 UTC 2007


Hi,

I am sure your search will be aided by using the right terminology.

You are looking for:

1/ a POS-tagger (POS = part of speech). POS-taggers come with different
tagsets, offering varying levels of detail (e.g. some only say "verb", others
distinguish past tense from past participle).

2/ a lemmatizer, which, given an inflected or derived word form, returns its
lemma (base form).

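These two programmes often form a pair, since the correct lemma can depend on
the part of speech (e.g. "saw" as a noun vs. as the past tense of "see").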
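To illustrate what the combined output of a tagger and a lemmatizer looks like for the lookup you describe, here is a minimal offline sketch. The tiny LEXICON, the COARSE table and the analyse() function are hypothetical, made up for this example; the fine-grained tags are from the Penn Treebank tagset, one common English tagset. A real system would load a full lexicon or use a trained tagger and lemmatizer instead.

```python
from typing import Dict, List, Tuple

# Hypothetical mini-lexicon: surface form -> list of (Penn Treebank tag, lemma).
# A real system would load a full lexicon or call a trained tagger/lemmatizer.
LEXICON: Dict[str, List[Tuple[str, str]]] = {
    "children": [("NNS", "child")],
    "ran": [("VBD", "run")],
    "saw": [("NN", "saw"), ("VBD", "see")],
    "leaves": [("NNS", "leaf"), ("VBZ", "leave")],
    "run": [("NN", "run"), ("VB", "run"), ("VBN", "run"), ("VBP", "run")],
}

# Collapse fine-grained Penn Treebank tags into coarse word classes.
COARSE: Dict[str, str] = {
    "NN": "noun", "NNS": "noun", "NNP": "noun", "NNPS": "noun",
    "VB": "verb", "VBD": "verb", "VBG": "verb", "VBN": "verb",
    "VBP": "verb", "VBZ": "verb",
    "JJ": "adjective", "JJR": "adjective", "JJS": "adjective",
    "RB": "adverb", "RBR": "adverb", "RBS": "adverb",
}


def analyse(word: str) -> List[Tuple[str, str]]:
    """Return every distinct (coarse word class, lemma) pair for `word`."""
    results: List[Tuple[str, str]] = []
    for tag, lemma in LEXICON.get(word.lower(), []):
        # Unknown tags are passed through unchanged.
        pair = (COARSE.get(tag, tag), lemma)
        if pair not in results:  # keep first-seen order, drop duplicates
            results.append(pair)
    return results


if __name__ == "__main__":
    print(analyse("leaves"))  # → [('noun', 'leaf'), ('verb', 'leave')]
```

The lookup returns one pair per reading, so an ambiguous form such as "saw" or "leaves" yields several main forms, and unknown words return an empty list.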

Greetings,

Martin Reynaert
Postdoc
ILK
Tilburg University
The Netherlands


Gordana Ilic Holen wrote:
> Dear list members,
>
> We are looking for software/data that help in performing the following
> task programmatically, i.e., we want to use the described capability
> from a piece of software we are writing.
>
> The task is to look up an English word in order to determine its
> word class.
>
> We would also like to be informed if the word is a derived form of
> another "main entry" or form.  In the latter case we would like to be
> told what the main form is: e.g., "children" has main form "child",
> "ran" has main form "run".  (Of course, these main forms need not be
> unique, so the lookup might result in several main forms.)
>
> Note: it is essential that lookup can be performed locally (offline).
> The reason is that we want to look up a lot of words.  (The
> software/data does not need to be free, but we would prefer it to be.)
>
> Thanks in advance for any pointers.
>
>
> Gordana Ilic Holen and Bjarte M. Østvold
>
>
