[Corpora-List] Terminology Extraction

Jakob Halskov jh.id at cbs.dk
Thu Jun 15 09:03:07 UTC 2006


Hi Katya,

TermoStat, developed by Patrick Drouin of the University of Montreal, is a pretty good tool, I think. Here's the link:

http://olst.ling.umontreal.ca/~drouinp/termostat_web/

You simply upload your document, pick a language and a statistical metric for the extraction and get a list of term candidates for free :-)

If you happen to have access to a large general language corpus, such as the British National Corpus, you can build your own term extraction program by comparing the relative frequencies of all the words in your company documents with their relative frequencies in the BNC. Words that are much more frequent in your documents than in the BNC are good term candidates. This measure has been called "weirdness" and is described in a paper by Khurshid Ahmad from 1993, I think.
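
A minimal sketch of that comparison in Python, in case it helps. The tokenizer, the add-one smoothing for words missing from the general corpus, and the min_count cutoff are my own assumptions for illustration, not necessarily what the original paper does, and the function names are made up:

```python
import re
from collections import Counter


def tokenize(text):
    # Crude lowercase word tokenizer (an assumption; use a proper one in practice).
    return re.findall(r"[a-z]+(?:[-'][a-z]+)*", text.lower())


def weirdness(domain_tokens, general_tokens, min_count=2):
    """Rank domain words by relative frequency in the domain corpus
    divided by relative frequency in the general corpus."""
    domain = Counter(domain_tokens)
    general = Counter(general_tokens)
    n_domain = sum(domain.values())
    n_general = sum(general.values())

    scores = {}
    for word, count in domain.items():
        if count < min_count:
            # Skip very rare words; the cutoff is an illustrative assumption.
            continue
        rel_domain = count / n_domain
        # Add-one smoothing (an assumption) avoids division by zero
        # for words that never occur in the general corpus.
        rel_general = (general[word] + 1) / (n_general + 1)
        scores[word] = rel_domain / rel_general

    # Highest ratio first: these are the strongest term candidates.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

You would call it as weirdness(tokenize(company_text), tokenize(bnc_text)) and look at the top of the list. Note that this only scores single words, so multi-word terms need an extra step.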

Good luck!

Jakob Halskov
--
PhD student
Dept. of Computational Linguistics
Copenhagen Business School
Denmark


----- Original Message -----
From: Katya Alahverdzhieva <katya.alahverdzhieva at gmail.com>
Date: Thursday, June 15, 2006 9:58 am
Subject: [Corpora-List] Terminology Extraction

> Hi,
> 
> Our company is about to create a termbase (for company-specific
> terminology) on the basis of our existing documents, and for that
> purpose we are looking for the best tool. We tried SDL MultiTerm
> Extract but are not satisfied with the results. Can you please advise
> me on the issue and, of course, on how useful such tools are?
> 
> Best wishes
> 


