[Corpora-List] New paper on learning robust embeddings in high-dimensional linguistic features: corrected link

Paul Thompson Paul.Thompson at manchester.ac.uk
Thu Aug 9 17:38:30 UTC 2012



NOTE: This email was sent earlier, but the link provided pointed to the wrong paper - this has now been corrected. Apologies for any inconvenience caused.

--------------


Discovering Robust Embeddings in (Dis)Similarity Space for High-Dimensional Linguistic Features

Tingting Mu, Makoto Miwa, Jun'ichi Tsujii and Sophia Ananiadou

Computational Intelligence 2012

http://onlinelibrary.wiley.com/doi/10.1111/j.1467-8640.2012.00452.x/abstract


Abstract
========

Recent research has shown the effectiveness of rich feature representation for tasks in natural language
processing (NLP). However, an exceedingly large number of features does not always improve classification performance.
They may contain redundant information, lead to noisy feature representations, and also render the learning algorithms
intractable. In this paper, we propose a supervised embedding framework that modifies the relative positions between
instances to increase the compatibility between the input features and the output labels and meanwhile preserves the
local distribution of the original data in the embedded space. The proposed framework attempts to support a flexible
balance between the preservation of intrinsic geometry and the enhancement of class separability for both interclass
and intraclass instances. It takes into account characteristics of linguistic features by using an inner product-based
optimization template. (Dis)similarity features, also known as empirical kernel mapping, are employed to enable
computationally tractable processing of extremely high-dimensional input, and also to handle nonlinearities in
embedding generation when necessary. Evaluated on two NLP tasks with six data sets, the proposed framework
provides better classification performance than the support vector machine without using any dimensionality
reduction technique. It also generates embeddings with better class discriminability as compared to many existing
embedding algorithms.
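
For readers unfamiliar with the (dis)similarity features mentioned in the abstract, the following is a minimal sketch of the general idea of an empirical kernel map, not the authors' implementation. Each instance is re-represented by its inner-product similarities to a set of reference instances, so the new dimensionality equals the number of references rather than the number of original features. The helper names (dot, empirical_kernel_map), the sparse dict representation and the toy feature names are all assumptions made for illustration.

```python
def dot(u, v):
    """Inner product of two sparse vectors stored as {feature: value} dicts."""
    # Iterate over the shorter vector; features missing from the other count as 0.
    if len(u) > len(v):
        u, v = v, u
    return sum(val * v.get(feat, 0.0) for feat, val in u.items())


def empirical_kernel_map(instances, references):
    """Represent each instance by its similarities to every reference instance."""
    return [[dot(x, r) for r in references] for x in instances]


# Hypothetical toy data: three instances with sparse, named linguistic features.
train = [
    {"w=protein": 1.0, "pos=NN": 1.0},
    {"w=binds": 1.0, "pos=VBZ": 1.0},
    {"w=protein": 1.0, "w=binds": 1.0},
]

# Using the training instances themselves as references gives one
# similarity feature per training instance (a 3 x 3 matrix here).
mapped = empirical_kernel_map(train, train)
for row in mapped:
    print(row)
```

In this sketch the mapped representation has as many columns as there are reference instances, however many distinct features the original vectors contain. A nonlinear similarity could be substituted for dot where nonlinearities need to be handled.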


--------

Paul Thompson
Research Associate
School of Computer Science
National Centre for Text Mining
Manchester Institute of Biotechnology
University of Manchester
131 Princess Street
Manchester
M1 7DN
UK
Tel: 0161 306 3091
http://personalpages.manchester.ac.uk/staff/Paul.Thompson/






