Corpora: Term extraction. How to implement?

Hristo Tanev htanev at yahoo.co.uk
Sat Feb 10 11:54:10 UTC 2001


Dear All,
I am currently working with some students on term
extraction from IT texts in Bulgarian.
We don't have annotated corpora for Bulgarian, so we
intend to use two collections of texts: one of IT
texts and the other of non-IT texts.

We intend to extract terms by taking the words that
appear most frequently in the IT collection but do
not appear frequently in the non-IT texts, thus
skipping prepositions, conjunctions and other
frequently used non-term words.
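
The comparison described above could be sketched roughly
as follows. This is only a minimal sketch under
assumptions, not a finished method: the function name
rank_candidate_terms, the parameters min_count and top_n,
the add-one smoothing, and the use of a ratio of relative
frequencies as the score are all illustrative choices,
and the texts are assumed to be already tokenized and
lowercased.

```python
from collections import Counter


def rank_candidate_terms(domain_tokens, reference_tokens, min_count=2, top_n=10):
    """Rank words that are frequent in the domain (IT) collection
    but comparatively rare in the reference (non-IT) collection.

    Hypothetical helper: the scoring below is one simple choice,
    not an established term-extraction algorithm.
    """
    domain_counts = Counter(domain_tokens)
    reference_counts = Counter(reference_tokens)
    domain_total = sum(domain_counts.values())
    reference_total = sum(reference_counts.values())
    vocab_size = len(set(domain_counts) | set(reference_counts))

    scored = []
    for word, count in domain_counts.items():
        # Ignore words that are too rare in the domain collection.
        if count < min_count:
            continue
        domain_rel = count / domain_total
        # Add-one smoothing (an assumption) so that words absent from
        # the reference collection do not cause a division by zero.
        reference_rel = (reference_counts[word] + 1) / (reference_total + vocab_size)
        scored.append((word, domain_rel / reference_rel))

    # Highest ratio first; ties are broken alphabetically.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:top_n]


# Toy English data, only to show the call; real input would be the
# tokenized IT and non-IT collections.
it_tokens = "the server stores the file on the server and the client reads the file".split()
other_tokens = "the man walked on the road and the dog ran on the grass".split()
for word, score in rank_candidate_terms(it_tokens, other_tokens):
    print(word, round(score, 2))
```

With a score of this kind, function words such as
prepositions and conjunctions should get a low ratio
because they are frequent in both collections, while
domain words absent from the non-IT texts rise to the
top.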

Can someone tell me more about this kind of term
extraction?
Also, could someone propose another method for term
extraction that doesn't require annotated corpora?

Best wishes,
Hristo Tanev




