Corpora: PhD Thesis

Leonel Ruiz Miyares leonel at lingapli.ciges.inf.cu
Thu Mar 29 13:57:01 UTC 2001


Dear colleagues,

The PhD thesis in Computational Linguistics
entitled "Development of a computational model
for textual corpora processing on the
basis of automatic tagging" was recently
completed by Leonel Ruiz Miyares of the
Center of Applied Linguistics, Ministry of
Science, Technology and Environment,
Santiago de Cuba, Cuba.

The thesis supervisor was Dr. Jorge Díaz Silvera,
Department of Computation Science, Faculty of
Natural Sciences and Mathematics, University
of Oriente, Santiago de Cuba, Cuba.

The thesis describes the components, characteristics
and functional structure of the first Cuban
tagger, which showed high effectiveness when
processing two different corpora: newspaper texts
(97.16%) and texts by secondary-level students
(98.15%).

The tagger uses an HMM (Hidden Markov Model)
to disambiguate words.
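
For readers unfamiliar with HMM-based disambiguation, here is a
minimal sketch in Python. It is not the thesis implementation,
whose details are not given in this message; it assumes a plain
bigram HMM decoded with the Viterbi algorithm. The function name,
the tag set and all probabilities are invented for illustration.

```python
import math


def viterbi(words, tags, start_p, trans_p, emit_p, floor=1e-12):
    """Return the most probable tag sequence for `words` under a bigram HMM.

    Illustrative sketch only: missing table entries fall back to `floor`,
    a crude stand-in for proper smoothing.
    """
    if not words:
        return []

    def lp(table, key):
        # Work in log space to avoid underflow on long sentences.
        return math.log(table.get(key, floor))

    # best[i][t] = (log-prob of the best path ending in tag t at word i,
    #               previous tag on that path)
    best = [{t: (lp(start_p, t) + lp(emit_p, (t, words[0])), None)
             for t in tags}]
    for w in words[1:]:
        prev = best[-1]
        col = {}
        for t in tags:
            score, back = max(
                (prev[p][0] + lp(trans_p, (p, t)), p) for p in tags
            )
            col[t] = (score + lp(emit_p, (t, w)), back)
        best.append(col)

    # Follow the back-pointers from the best final tag.
    tag = max(tags, key=lambda t: best[-1][t][0])
    path = [tag]
    for col in reversed(best[1:]):
        tag = col[tag][1]
        path.append(tag)
    return path[::-1]


# Toy model (all numbers invented): "casa" may be a noun or a verb.
tags = ["DET", "NOUN", "VERB"]
start_p = {"DET": 0.6, "NOUN": 0.3, "VERB": 0.1}
trans_p = {
    ("DET", "DET"): 0.05, ("DET", "NOUN"): 0.9, ("DET", "VERB"): 0.05,
    ("NOUN", "DET"): 0.1, ("NOUN", "NOUN"): 0.3, ("NOUN", "VERB"): 0.6,
    ("VERB", "DET"): 0.6, ("VERB", "NOUN"): 0.3, ("VERB", "VERB"): 0.1,
}
emit_p = {("DET", "la"): 0.5, ("NOUN", "casa"): 0.4, ("VERB", "casa"): 0.6}

print(viterbi(["la", "casa"], tags, start_p, trans_p, emit_p))
```

In this toy model the preceding determiner makes the noun reading
of "casa" win, even though its emission probability as a verb is
higher.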

Novel features of the system: automatic coding
of the students' spelling mistakes,
recognition of compound words, open lexicon,
semantic information in the lexicon, etc.


Center of Applied Linguistics
Ministry of Science, Technology and Environment
Santiago de Cuba, Cuba


