[Corpora-List] massive dictionary-based tagging

Carlos Rodriguez crodriguezp at gmail.com
Wed Aug 8 10:36:43 UTC 2007


Hi,
I have a quick question for the list:
When identifying biological entities such as genes or proteins, we
frequently need to look up terms in dictionaries with several million
entries and tag them in corpora with (likewise) millions of sentences.
What is the most efficient way to do this? Does anyone know of a tool
or technique that does this kind of thing quickly and accurately? I bet
that people doing NER with million-plus geographic name gazetteers have
this same problem.
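To make the task concrete, here is a minimal sketch of the kind of
tagging I mean: a greedy longest-match lookup over a token-level trie.
It is only an illustrative baseline, not a tool we use or a proposed
answer. The names build_trie and tag, the whitespace tokenization, the
lowercasing, and the toy dictionary are all assumptions made for the
example.

```python
def build_trie(entries):
    """Build a token-level trie from a dict mapping entry name -> label."""
    root = {}
    for name, label in entries.items():
        node = root
        for token in name.lower().split():
            node = node.setdefault(token, {})
        node[None] = label  # the None key marks the end of a complete entry
    return root


def tag(sentence, trie):
    """Return (start, end, label) token spans, preferring the longest match."""
    tokens = sentence.lower().split()  # assumption: whitespace tokenization
    spans = []
    i = 0
    while i < len(tokens):
        node, best = trie, None
        j = i
        # Walk the trie as far as the tokens allow, remembering the
        # longest complete entry seen so far.
        while j < len(tokens) and tokens[j] in node:
            node = node[tokens[j]]
            j += 1
            if None in node:
                best = (i, j, node[None])
        if best:
            spans.append(best)
            i = best[1]  # continue after the matched span
        else:
            i += 1
    return spans


# Toy dictionary (hypothetical entries, for illustration only).
toy_trie = build_trie({
    "p53": "GENE",
    "tumor protein p53": "GENE",
    "BRCA1": "GENE",
})
print(tag("Mutations in tumor protein p53 and BRCA1 are common", toy_trie))
```

Each sentence is scanned once from left to right, so the cost per
sentence depends on its length and on the longest entry rather than on
the number of dictionary entries. Whether a plain in-memory trie like
this holds up at several million entries is exactly what I am unsure
about.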
Thanks,

-- 
Carlos Rodríguez
Spanish National Cancer Research Center
CRodriguezP at gmail.com
