[Corpora-List] Similarity-based Pseudowords: Release of the dataset

Taher Pilevar taher.pilevar at gmail.com
Tue Jun 4 15:50:42 UTC 2013


We are releasing the dataset of 15,935 *Similarity-based pseudowords* that
model all the ambiguous nouns in WordNet 3.0. A pseudoword is generated for
each ambiguous noun by selecting, for each of its senses, the most suitable
monosemous representative. These pseudowords can be leveraged for creating
large-scale pseudosense-annotated datasets.
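
As a rough illustration of the selection step described above, here is a
minimal sketch. It is not the procedure from the paper: the sense labels,
candidate words, similarity scores, the function name build_pseudoword and
the "*" separator are all assumptions made for this example. It only shows
the idea of keeping the highest-scoring monosemous candidate for each sense.

```python
from typing import Dict, List, Tuple

# Hypothetical toy input: for each sense of an ambiguous noun, a list of
# (candidate word, similarity score) pairs. All labels, words and scores
# are invented for illustration; they are not taken from the dataset.
Candidates = Dict[str, List[Tuple[str, float]]]


def build_pseudoword(sense_candidates: Candidates, sep: str = "*") -> str:
    """Pick the highest-scoring candidate per sense and join the picks."""
    representatives = []
    for sense, candidates in sense_candidates.items():
        if not candidates:
            raise ValueError(f"no candidate available for sense {sense!r}")
        # Keep the candidate with the largest similarity score.
        best_word, _ = max(candidates, key=lambda pair: pair[1])
        representatives.append(best_word)
    # The separator is an assumption of this sketch.
    return sep.join(representatives)


example = {
    "bank#1": [("riverside", 0.9), ("lakeside", 0.6)],
    "bank#2": [("lender", 0.4), ("depository", 0.8)],
}
pseudoword = build_pseudoword(example)
```

With the toy scores above, the sketch keeps "riverside" for the first sense
and "depository" for the second. How the released pseudowords score and
choose their representatives is described in the paper cited below.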

Further information and the download link are available on the following
web page:
http://lcl.uniroma1.it/pseudowords/

This dataset is released together with the paper:

Mohammad Taher Pilehvar and Roberto Navigli. Paving the Way to a
Large-scale Pseudosense-annotated Dataset. In *Proceedings of the 2013
Conference of the North American Chapter of the Association for
Computational Linguistics: Human Language Technologies (NAACL-HLT 2013)*,
pages 1100-1109, Atlanta, USA, June 10-12, 2013.