[Corpora-List] From Mikhail Alexandrov (Term Selection) program LexisTerm

Mikhail Alexandrov malexandrov at mail.ru
Wed Oct 19 13:00:42 UTC 2011

Dear Corpora List users,
The following may be useful for those who work on document indexing,
specifically on selecting single-word terms from a document set.
The program LexisTerm selects terms from a document set
on the basis of the so-called “criterion of specificity”.
- Suppose we have a list of words with their absolute or relative frequencies
reflecting word occurrences in a basic document collection. For example, we could take
the general lexis of a given language, based on a national corpus of that language.
- Document word specificity is the ratio of a word's frequency in one given document
to its frequency in the general lexis (that is, the basic lexis).
- Corpus word specificity is the ratio of a word's frequency in the whole document set
to its frequency in the general lexis (that is, the basic lexis).
Note. Here the corpus is considered as one document.
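
To make the two definitions concrete, here is a minimal Python sketch. It is only an illustration of the ratios described above, not the code of LexisTerm itself. The function names, the use of relative frequencies, and the decision to skip words that are absent from the general lexis are all my assumptions.

```python
from collections import Counter


def relative_frequencies(tokens):
    """Relative frequency of each word in a list of tokens."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def word_specificity(tokens, general_freq):
    """Document word specificity: frequency in the document divided by
    frequency in the general lexis.

    Assumption: words missing from the general lexis (or with zero
    frequency there) are skipped, since the ratio is undefined for them.
    """
    doc_freq = relative_frequencies(tokens)
    return {
        word: freq / general_freq[word]
        for word, freq in doc_freq.items()
        if general_freq.get(word, 0) > 0
    }


def corpus_specificity(documents, general_freq):
    """Corpus word specificity: the whole document set is treated as
    one document, as in the note above."""
    pooled = [token for doc in documents for token in doc]
    return word_specificity(pooled, general_freq)


if __name__ == "__main__":
    # Hypothetical general lexis with relative frequencies.
    general = {"the": 0.05, "corpus": 0.0001}
    doc = ["the", "corpus", "the", "corpus"]
    print(word_specificity(doc, general))
```

A word that is much more frequent in the document than in the general lexis gets a large ratio, which is what makes it a candidate term.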
The program LexisTerm lets the user set:
- the level of word specificity K
- the selection option: C (corpus specificity) or D (document specificity)
and can process large document collections.
Note. Option D means that the program extracts words from each document
individually and then groups the selected words together.
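
The effect of the two options can be sketched in Python as follows. This is a hypothetical illustration, not the actual LexisTerm implementation: the name select_terms, the rule "specificity >= K", and the reading of "groups selected words together" as a set union are my assumptions.

```python
from collections import Counter


def specificity(tokens, general_freq):
    """Ratio of each word's relative frequency in tokens to its
    frequency in the general lexis (unknown words are skipped)."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {
        word: (count / total) / general_freq[word]
        for word, count in counts.items()
        if general_freq.get(word, 0) > 0
    }


def select_terms(documents, general_freq, k, option="C"):
    """Select words whose specificity is at least k.

    option "C": the whole document set is scored as one document.
    option "D": each document is scored individually and the selected
                words are merged (assumed here to be a set union).
    """
    if option == "C":
        pooled = [token for doc in documents for token in doc]
        scores = specificity(pooled, general_freq)
        return {word for word, score in scores.items() if score >= k}
    if option == "D":
        selected = set()
        for doc in documents:
            scores = specificity(doc, general_freq)
            selected |= {word for word, score in scores.items() if score >= k}
        return selected
    raise ValueError("option must be 'C' or 'D'")


if __name__ == "__main__":
    # Hypothetical toy data: two short documents and a tiny general lexis.
    docs = [["the", "corpus", "the", "the"], ["the", "the", "the", "term"]]
    general = {"the": 0.5, "corpus": 0.01, "term": 0.01}
    print(sorted(select_terms(docs, general, k=20, option="C")))
    print(sorted(select_terms(docs, general, k=20, option="D")))
```

With the same K, option D can pick up words that are specific to a single document but are diluted when the whole set is pooled, so the two options may return different term lists.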
Of course, this approach is well known, but I did not know of
any easy-to-use program that implements it.
The program was developed by a Peruvian student (Roque Lopez from
the San Agustin National University), and you can easily download it from the site:

Best regards,

Mikhail Alexandrov

Dr. M.Alexandrov
* Department of system analysis and informatics
Russian Presidential Academy of national economy and public administration
* fLexSem research group
Autonomous University of Barcelona
E-mail: MAlexandrov at mail.ru
