[Corpora-List] Meaning of Semcor annotations

Jose Maria Gomez Hidalgo jmgomez at dinar.esi.uem.es
Thu May 22 21:53:55 UTC 2003

Dear all,

I am running some experiments with the SemCor semantic concordance, and I 
am having difficulty interpreting the available documentation. Each tagged 
word in SemCor is labelled according to its meaning in WordNet:

<wf cmd=done pos=VB lemma=say wnsn=1 lexsn=2:32:00::>said</wf>

The word "said" has the part of speech VB (verb), its lemma is "say", and 
the corresponding meaning in WordNet can be found by searching for "say" 
and selecting the first sense (attribute wnsn). According to the 
documentation, the attribute lexsn, appended to the lemma, identifies the 
WordNet synset for that meaning.
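
To make this reading of the annotation concrete, here is a minimal Python 
sketch that parses one wf line into its attributes and joins the lemma and 
the lexsn value. The "%" separator follows the WordNet sense-key convention 
(lemma%lex_sense); the names parse_wf and sense_key are made up for this 
illustration, and the patterns assume unquoted attribute values as in the 
line above.

```python
import re

# Matches one SemCor token such as
# <wf cmd=done pos=VB lemma=say wnsn=1 lexsn=2:32:00::>said</wf>
WF_PATTERN = re.compile(r"<wf\s+([^>]*)>(.*?)</wf>")
# Attribute values are unquoted, so a value runs up to the next whitespace.
ATTR_PATTERN = re.compile(r"(\w+)=(\S+)")


def parse_wf(line):
    """Return the attributes of a wf element as a dict, or None if absent."""
    match = WF_PATTERN.search(line)
    if match is None:
        return None
    attrs = dict(ATTR_PATTERN.findall(match.group(1)))
    attrs["text"] = match.group(2)  # the surface form, e.g. "said"
    return attrs


def sense_key(attrs):
    """Join lemma and lexsn (assumed WordNet sense-key layout)."""
    return f"{attrs['lemma']}%{attrs['lexsn']}"


if __name__ == "__main__":
    sample = "<wf cmd=done pos=VB lemma=say wnsn=1 lexsn=2:32:00::>said</wf>"
    print(sense_key(parse_wf(sample)))
```

The lemma is part of the resulting key, so keys built this way differ 
between lemmata even when the lexsn values coincide.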

However, the lexsn value on its own is not unique to a synset. Many other 
words in SemCor carry the same value:

<wf cmd=done pos=VB lemma=consider wnsn=4 lexsn=2:32:00::>considering</wf>
<wf cmd=done pos=VB lemma=revise wnsn=1 lexsn=2:32:00::>revised</wf>

(all three extracted from brown1/tagfiles/br-a01)
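
The overlap can be checked mechanically. The following is a minimal Python 
sketch, assuming the three lines quoted in this message as input; 
lemmas_by_lexsn is a hypothetical helper written for this illustration, not 
part of any SemCor or WordNet tool.

```python
import re
from collections import defaultdict

# The three annotations quoted in this message (from brown1/tagfiles/br-a01).
LINES = [
    "<wf cmd=done pos=VB lemma=say wnsn=1 lexsn=2:32:00::>said</wf>",
    "<wf cmd=done pos=VB lemma=consider wnsn=4 lexsn=2:32:00::>considering</wf>",
    "<wf cmd=done pos=VB lemma=revise wnsn=1 lexsn=2:32:00::>revised</wf>",
]

# Unquoted name=value pairs; a value ends at whitespace or at the closing ">".
ATTR = re.compile(r"(\w+)=([^\s>]+)")


def lemmas_by_lexsn(lines):
    """Group the lemmata found in the given wf lines by their lexsn value."""
    groups = defaultdict(set)
    for line in lines:
        attrs = dict(ATTR.findall(line))
        if "lemma" in attrs and "lexsn" in attrs:
            groups[attrs["lexsn"]].add(attrs["lemma"])
    return dict(groups)


if __name__ == "__main__":
    for lexsn, lemmas in lemmas_by_lexsn(LINES).items():
        print(lexsn, sorted(lemmas))
```

For these three lines, the single lexsn value 2:32:00:: groups the lemmata 
"consider", "revise" and "say" together.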

Those words or lemmata do not belong to the same synset. It is important to 
know when word senses belong to the same synset, because that is how 
synonymous words __in the SemCor collection__ can be identified. The only 
way to know this, apart from consulting WordNet itself, is to have unique 
synset identifiers in SemCor. Is the information in the SemCor annotations 
enough to obtain that unique identification? If so, how can it be done?

Thank you


Jose Maria Gomez Hidalgo
Departamento de Inteligencia Artificial
Universidad Europea de Madrid
28670 - Villaviciosa de Odon - MADRID
(+34) 912115670
jmgomez at dinar.esi.uem.es

Spanish law guarantees privacy in electronic communications. This 
electronic transmission is strictly confidential and intended solely for 
the addressee. If you are not the intended addressee, you are kindly 
requested not to disclose nor to copy this transmission and to notify us as 
soon as possible.