[Corpora-List] Annotation without lexicons

Mark Davies mdavies at ilstu.edu
Tue Jan 28 10:22:07 UTC 2003


Corpus annotation is of course usually done with the aid of a lexicon
containing POS and lemma information.  But imagine that you need to tag and
lemmatize a 1-2 million-word corpus of a language for which you do not have
a lexicon.  A variant of this might be the need to annotate a corpus from
an older stage of a language -- e.g. Middle English or Old Spanish --
which is related to a modern language for which you do have a lexicon.  How
is this best done?

I've had to address this issue in creating several different corpora and
have developed my own approach to the problem, but I'm interested in
alternative approaches that others might have taken.  I realize that this
might be a FAQ, but any pointers to relevant literature would be
helpful.  Thanks in advance.

Mark Davies


====================================================
Mark Davies, Associate Professor, Spanish Linguistics
4300 Foreign Languages, Illinois State University, Normal, IL 61790-4300
309-438-7975 (voice) / 309-438-8083 (fax)
http://mdavies.for.ilstu.edu
** Historical and dialectal Spanish and Portuguese syntax **
** Corpus design and use / Web-database scripting / Distance education **
=====================================================


