Corpora: Collaborative effort

Patrick Ruch ruch at dim.hcuge.ch
Tue Jun 13 16:29:56 UTC 2000


Hi,

> I don't mean to put a damper on this idea, but we should expect that
> the agreement rate will be far from 100%.  Also, the tolerance of noise
> will depend on the amount of noise.  I did a comparison between the
> tagging of the Brown files in Semcor and the tagging done by DSO.
> I found that the agreement rate was 56%.  This is exactly the rate of
> agreement we would find by chance.  So the amount of post-processing
> could be quite a bit of work!

I am involved in a project where we are tagging medical text using a subset
of UMLS. Although words are usually less ambiguous in such a "narrow"
domain than in unrestricted text, we are also facing some agreement
problems. Could you send me some references on the topic?

Best,
Patrick Ruch
_________________________________________
Patrick Ruch
University Hospital of Geneva
Medical Informatics Division
CH-1211 Geneva 14
tel.: (+41 22) 372 61 64
fax: (+41 22) 372 48 55


