[Corpora-List] Meaning of Semcor annotations

Rada Mihalcea rada at cs.unt.edu
Fri May 23 09:13:24 UTC 2003


Hi Jose,

> The word "said" has the part of speech VB (verb), its lemma is "say", and
> the corresponding meaning in WordNet can be got by searching for "say" and
> selecting the first sense (attribute wnsn). The attribute lexsn, according
> to the documentation, and appended to the lemma, identifies the WordNet
> synset for that meaning.
The attribute lexsn, appended to the lemma, will uniquely identify the
meaning of the word (what you obtain in this way is not, however, the
synset; instead, it points to one unique synset).

> However, the lexsn attribute value is not unique for the synset. Many other
> words in SemCor have the same value:
The lexsn by itself does not give you much useful information. As
indicated in the WordNet manuals, the fields in the lexsn indicate the
part of speech, the number of the lexicographer file, and a number within
that file (adjectives carry some additional information). So there may be
hundreds, or even thousands, of different words with an identical lexsn
(most of the words in a lexicographer file will have identical lexsn).
You may want to check the WordNet manuals for additional information:
http://www.cogsci.princeton.edu/~wn/man1.7.1/senseidx.5WN.html
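
To make the field layout concrete, here is a minimal Python sketch that
splits a lexsn into the fields described in the senseidx manual page
(ss_type:lex_filenum:lex_id:head_word:head_id). The names LexSense and
parse_lexsn are made up for this illustration, and the value "2:32:00::"
is only an example of what a verb lexsn looks like.

```python
from typing import NamedTuple, Optional

# ss_type codes as listed in the WordNet senseidx manual page.
SS_TYPES = {
    1: "noun",
    2: "verb",
    3: "adjective",
    4: "adverb",
    5: "adjective satellite",
}


class LexSense(NamedTuple):
    """Hypothetical container for the five colon-separated lexsn fields."""
    ss_type: str
    lex_filenum: int          # number of the lexicographer file
    lex_id: int               # number within that file
    head_word: Optional[str]  # only filled for adjective satellites
    head_id: Optional[int]    # only filled for adjective satellites


def parse_lexsn(lexsn: str) -> LexSense:
    """Split a lexsn string such as '2:32:00::' into its fields."""
    parts = lexsn.split(":")
    if len(parts) != 5:
        raise ValueError(f"expected 5 colon-separated fields, got: {lexsn!r}")
    ss_type, lex_filenum, lex_id, head_word, head_id = parts
    return LexSense(
        ss_type=SS_TYPES[int(ss_type)],
        lex_filenum=int(lex_filenum),
        lex_id=int(lex_id),
        head_word=head_word or None,
        head_id=int(head_id) if head_id else None,
    )
```

None of these fields names a word, which is why many different lemmas
can share the same lexsn.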

> Those words or lemmata do not belong to the same synset. It is important to
> know when word senses belong to the same synset, because this way synonym
> words __in the SemCor collection__ can be identified. The only way to know
> this, apart of consulting WordNet itself, is having unique synset
> identifiers in SemCor. Is the information in Semcor annotations enough to
> get that unique identification? How can we do it?
The easiest (and perhaps the fastest) way to find the synset of a word,
given its lemma and lexsn, is to construct the sense key of the word
(lemma%lexsn) and look it up in the index.sense file provided with the
WordNet data files. Each entry in that file lists the synset offset for
its sense key.
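
The lookup can be sketched in Python as follows. This is only an
illustration under some assumptions: it relies on the index.sense line
layout from the senseidx manual page (sense_key, synset_offset,
sense_number, tag_cnt, separated by spaces), the function names
load_sense_index and synset_id are made up here, and the path in the
usage comment is a placeholder. Because a synset offset is only unique
within one part-of-speech data file, the sketch pairs the offset with
the part of speech to form the identifier.

```python
from typing import Dict, Iterable, Optional, Tuple

# Map the ss_type digit of a sense key to the data file holding the synset.
# Adjectives (3) and adjective satellites (5) both live in the adjective file.
POS_FILE = {"1": "noun", "2": "verb", "3": "adj", "4": "adv", "5": "adj"}

SynsetId = Tuple[str, str]  # (part-of-speech file, synset offset)


def load_sense_index(lines: Iterable[str]) -> Dict[str, SynsetId]:
    """Build a sense_key -> (pos, synset_offset) table from index.sense lines."""
    index: Dict[str, SynsetId] = {}
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        sense_key, offset = fields[0], fields[1]
        # The first lexsn field (right after '%') is the ss_type digit.
        ss_type = sense_key.split("%", 1)[1].split(":", 1)[0]
        index[sense_key] = (POS_FILE[ss_type], offset)
    return index


def synset_id(lemma: str, lexsn: str,
              index: Dict[str, SynsetId]) -> Optional[SynsetId]:
    """Construct the sense key lemma%lexsn and look up its synset."""
    # Sense keys in index.sense are lowercase.
    return index.get(f"{lemma.lower()}%{lexsn}")


# Possible usage (the path is an assumption about your installation):
#     with open("dict/index.sense", encoding="utf-8") as f:
#         index = load_sense_index(f)
#     synset_id("say", "2:32:00::", index)
```

Two SemCor words are then in the same synset exactly when synset_id
returns the same pair for both of them.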

hope this helps,
-Rada


