[Corpora-List] From Mikhail Alexandrov (Term Selection) program LexisTerm
Mikhail Alexandrov
malexandrov at mail.ru
Wed Oct 19 13:00:42 UTC 2011
Dear Corpora List users,
This information may be useful for those who deal with document indexing,
by which I mean the selection of single terms from a document set.
The program LexisTerm selects terms from a document set
on the basis of the so-called “criterion of specificity”.
- Suppose we have a list of words with their absolute or relative frequencies,
reflecting word occurrences in a basic document collection. For example, we could take
the general lexis of a given language, based on a national corpus of that language.
- Document word specificity is the ratio of a word's frequency in one given document
to its frequency in the general lexis (that is, the basic lexis).
- Corpus word specificity is the ratio of a word's frequency in the whole document set
to its frequency in the general lexis (that is, the basic lexis).
Note. Here the corpus is treated as one document.
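
The two definitions above can be sketched in a few lines of Python. This is only an illustration of the ratio, not LexisTerm's actual code: the function names, the toy frequency list and the two toy documents are all hypothetical, relative frequencies are assumed on both sides of the ratio, and words missing from the general lexis are simply skipped (the post does not say how they are handled).

```python
from collections import Counter


def relative_frequencies(tokens):
    """Map each word to its share of all tokens in `tokens`."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()}


def specificity(tokens, general_freq):
    """Ratio of a word's relative frequency in `tokens` to its relative
    frequency in the general lexis. Words absent from the general lexis
    are skipped (an assumption made for this sketch)."""
    local = relative_frequencies(tokens)
    return {w: f / general_freq[w] for w, f in local.items() if w in general_freq}


# Hypothetical general lexis (relative frequencies) and two toy documents.
general = {"the": 0.05, "of": 0.03, "corpus": 0.0001, "term": 0.0002}
doc1 = "the corpus of the term".split()
doc2 = "the term of the term".split()

# Document specificity: computed on one document.
doc_spec = specificity(doc1, general)

# Corpus specificity: the whole document set is treated as one document.
corpus_spec = specificity(doc1 + doc2, general)
```

With these toy numbers, "corpus" is about 2000 times more frequent in the first document than in the general lexis, but only about 1000 times more frequent in the pooled collection.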
The program LexisTerm lets the user set:
- the level of word specificity K
- the selection option: C (corpus specificity) or D (document specificity)
and it can process large document collections.
Note. Option D means here that the program extracts words from each document
individually and then groups the selected words together.
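
A minimal sketch of how the threshold K and the options C and D might interact is given below. It is not taken from LexisTerm itself: `select_terms`, its parameters and the toy data are hypothetical, and the rule "keep a word when its specificity is at least K" is an assumption, since the post does not state the exact comparison.

```python
from collections import Counter


def specificity(tokens, general_freq):
    """Relative frequency in `tokens` divided by relative frequency in the
    general lexis; words absent from the general lexis are skipped."""
    counts = Counter(tokens)
    total = len(tokens)
    return {w: (n / total) / general_freq[w]
            for w, n in counts.items() if w in general_freq}


def select_terms(documents, general_freq, k, option="C"):
    """Return the set of words whose specificity is at least k (assumed rule).

    option "C": the whole collection is pooled and treated as one document.
    option "D": each document is processed on its own and the selected
                words are then merged into one set.
    """
    if option == "C":
        pooled = [tok for doc in documents for tok in doc]
        spec = specificity(pooled, general_freq)
        return {w for w, s in spec.items() if s >= k}
    if option == "D":
        selected = set()
        for doc in documents:
            spec = specificity(doc, general_freq)
            selected |= {w for w, s in spec.items() if s >= k}
        return selected
    raise ValueError("option must be 'C' or 'D'")


# Hypothetical general lexis and toy collection.
general = {"the": 0.05, "of": 0.03, "corpus": 0.0001, "term": 0.0002}
docs = ["the corpus of the term".split(), "the term of the term".split()]

terms_c = select_terms(docs, general, k=1200, option="C")
terms_d = select_terms(docs, general, k=1200, option="D")
```

On this toy collection, option C keeps only "term", while option D also keeps "corpus", because that word is highly specific to one document even though its specificity in the pooled collection falls below K.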
Of course, this approach is well known, but I did not know of
any easy-to-use program that implements it.
The program was developed by a Peruvian student (Roque Lopez from
the San Agustin National University), and you can easily download it from the site:
www.innegocios.com/LexisTerm/LexisTerm.zip
Best regards,
Mikhail Alexandrov
============================
Dr. M.Alexandrov
* Department of system analysis and informatics
Russian Presidential Academy of national economy and public administration
* fLexSem research group
Autonomous University of Barcelona
E-mail: MAlexandrov at mail.ru