[Corpora-List] massive dictionary-based tagging
Carlos Rodriguez
crodriguezp at gmail.com
Wed Aug 8 10:36:43 UTC 2007
Hi,
I have a quick question for the list:
When identifying biological entities such as genes or proteins, we are
frequently confronted with the need to look up terms in dictionaries
with several million entries and to tag them in corpora with
(likewise) millions of sentences. What is the most efficient way to do
this? Does anyone know of a tool or technique to do this kind of thing
quickly and accurately? I bet that people doing NER with million-plus
geographic name gazetteers have this same problem.
Thanks,
--
Carlos Rodríguez
Spanish National Cancer Research Center
CRodriguezP at gmail.com
_______________________________________________
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora