28.5164, Diss: Computational Linguistics: Pranava Madhyastha: ''Exploiting Word Embeddings for Modelling Bilexical Relations''

The LINGUIST List linguist at listserv.linguistlist.org
Thu Dec 7 18:57:00 UTC 2017


LINGUIST List: Vol-28-5164. Thu Dec 07 2017. ISSN: 1069 - 4875.

Subject: 28.5164, Diss: Computational Linguistics: Pranava Madhyastha: ''Exploiting Word Embeddings for Modelling Bilexical Relations''

Moderators: linguist at linguistlist.org (Damir Cavar, Malgorzata E. Cavar)
Reviews: reviews at linguistlist.org (Helen Aristar-Dry, Robert Coté,
                                   Michael Czerniakowski)
Homepage: http://linguistlist.org

Please support the LL editors and operation with a donation at:
           http://funddrive.linguistlist.org/donate/

Editor for this issue: Sarah Robinson <srobinson at linguistlist.org>
================================================================


Date: Thu, 07 Dec 2017 13:56:51
From: Pranava Madhyastha [p.madhyastha+linguistlist at sheffield.ac.uk]
Subject: Exploiting Word Embeddings for Modelling Bilexical Relations

 
Institution: Universitat Politècnica de Catalunya 
Program: PhD in Artificial Intelligence 
Dissertation Status: Completed 
Degree Date: 2017 

Author: Pranava Madhyastha

Dissertation Title: Exploiting Word Embeddings for Modelling Bilexical
Relations. 

Dissertation URL:  http://staffwww.dcs.shef.ac.uk/people/P.Madhyastha/papers/thesis.pdf

Linguistic Field(s): Computational Linguistics


Dissertation Director(s):
Xavier Carreras
Ariadna Quattoni

Dissertation Abstract:

There has been an exponential surge in text data in recent years. As a
consequence, unsupervised methods that exploit this data have become steadily
more prominent in natural language processing (NLP). Word embeddings are
low-dimensional vectors obtained by applying unsupervised techniques to large
unlabeled corpora, where words from the vocabulary are mapped to vectors of
real numbers. Word embeddings aim to capture syntactic and semantic properties
of words.
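To make "mapping words to vectors of real numbers" concrete, here is a minimal sketch that assumes a hypothetical three-word vocabulary with hand-picked 3-dimensional vectors. Real embeddings are learned from corpora and usually have far more dimensions. Cosine similarity is one common way to check whether related words end up close together. The table `EMBEDDINGS` and the helper names are illustrative assumptions, not part of the thesis.

```python
import math

# Hypothetical toy embeddings: 3-dimensional vectors chosen by hand for
# illustration. Real embeddings are learned from large unlabeled corpora.
EMBEDDINGS: dict[str, list[float]] = {
    "cat": [0.9, 0.1, 0.0],
    "dog": [0.8, 0.2, 0.1],
    "car": [0.1, 0.9, 0.3],
}


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def similarity(w1: str, w2: str) -> float:
    """Look up both words in the toy table and compare their vectors."""
    return cosine(EMBEDDINGS[w1], EMBEDDINGS[w2])


if __name__ == "__main__":
    print(f"cat/dog: {similarity('cat', 'dog'):.3f}")
    print(f"cat/car: {similarity('cat', 'car'):.3f}")
```

With these made-up vectors, "cat" scores much closer to "dog" than to "car", which is the kind of regularity learned embeddings are meant to exhibit.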

In NLP, many tasks involve computing the compatibility between lexical items
under some linguistic relation. We call this type of relation a bilexical
relation. This thesis defines statistical models for bilexical relations that
make central use of word embeddings. Our principal aim is that the word
embeddings will favor generalization to words not seen during the training of
the model. The thesis is structured in four parts.

In the first part, we present a bilinear model over word embeddings that
leverages a small supervised dataset for a binary linguistic relation. Our
learning algorithm exploits low-rank bilinear forms and induces a
low-dimensional embedding tailored to the target linguistic relation. This
results in compressed, task-specific embeddings.

In the second part, we extend our bilinear model to a ternary setting and
propose a framework for resolving prepositional phrase attachment ambiguity
using word embeddings. Our models perform competitively with state-of-the-art
models. In addition, our method obtains significant improvements on
out-of-domain tests simply by using word embeddings induced from the source
and target domains.

In the third part, we further extend the bilinear models to expand the
vocabulary of statistical phrase-based machine translation. Given a word in
the source language, our model produces a probabilistic list of its possible
translations into the target language. We do this by projecting pre-trained
embeddings into a common subspace using a log-bilinear model. Empirically, we
observe a significant improvement on an out-of-domain test set.

In the final part, we propose a non-linear model that maps initial word
embeddings to task-tuned word embeddings in the context of a neural network
dependency parser. We demonstrate that it improves dependency parsing,
especially for sentences with unseen words. We also show downstream
improvements on a sentiment analysis task.
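The bilinear models summarized above score a pair of words with a bilinear form over their embeddings. As a minimal sketch, assuming a score of the form x^T W y with a low-rank parameter W = U V^T (U and V are n x k matrices), the code computes the score by first projecting each embedding down to k dimensions, which is one way to read "compressed task-specific embeddings". It also normalizes scores with a softmax over candidate words, a plausible reading of a log-bilinear model that returns a probabilistic list of translations. All parameter values, vectors, and function names below are hypothetical and hand-picked; no training procedure is shown, and this is not the thesis's implementation.

```python
import math

Vector = list[float]
Matrix = list[list[float]]  # row-major, shape n x k


def project(M: Matrix, x: Vector) -> Vector:
    """Return M^T x, mapping an n-dim embedding to a k-dim one."""
    n, k = len(M), len(M[0])
    return [sum(M[i][j] * x[i] for i in range(n)) for j in range(k)]


def bilinear_score(x: Vector, y: Vector, U: Matrix, V: Matrix) -> float:
    """Low-rank bilinear score x^T (U V^T) y, computed as (U^T x) . (V^T y).

    The full n x n matrix W = U V^T is never built; each word is instead
    projected to a k-dimensional, relation-specific embedding.
    """
    return sum(a * b for a, b in zip(project(U, x), project(V, y)))


def log_bilinear_distribution(
    x: Vector, candidates: dict[str, Vector], U: Matrix, V: Matrix
) -> dict[str, float]:
    """Softmax over bilinear scores: P(c | x) is proportional to exp(score)."""
    scores = {c: bilinear_score(x, y, U, V) for c, y in candidates.items()}
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {c: math.exp(s - top) for c, s in scores.items()}
    z = sum(exps.values())
    return {c: e / z for c, e in exps.items()}


# Hypothetical toy parameters: n = 3 embedding dimensions, rank k = 2.
U: Matrix = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
V: Matrix = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]

# Hypothetical source-word embedding and target-language candidates.
SOURCE: Vector = [0.9, 0.1, 0.0]
CANDIDATES: dict[str, Vector] = {
    "cat": [0.8, 0.2, 0.0],
    "car": [0.1, 0.9, 0.0],
}

if __name__ == "__main__":
    ranked = sorted(
        log_bilinear_distribution(SOURCE, CANDIDATES, U, V).items(),
        key=lambda item: item[1],
        reverse=True,
    )
    for word, prob in ranked:
        print(f"{word}\t{prob:.3f}")
```

Run as a script, this ranks "cat" above "car" (roughly 0.64 versus 0.36). Factoring W as U V^T stores 2nk parameters instead of n^2, and the rank k directly sets the size of the induced task-specific embeddings.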
