28.5164, Diss: Computational Linguistics: Pranava Madhyastha: ''Exploiting Word Embeddings for Modelling Bilexical Relations''
The LINGUIST List
linguist at listserv.linguistlist.org
Thu Dec 7 18:57:00 UTC 2017
LINGUIST List: Vol-28-5164. Thu Dec 07 2017. ISSN: 1069 - 4875.
Subject: 28.5164, Diss: Computational Linguistics: Pranava Madhyastha: ''Exploiting Word Embeddings for Modelling Bilexical Relations''
Moderators: linguist at linguistlist.org (Damir Cavar, Malgorzata E. Cavar)
Reviews: reviews at linguistlist.org (Helen Aristar-Dry, Robert Coté,
Michael Czerniakowski)
Homepage: http://linguistlist.org
Please support the LL editors and operation with a donation at:
http://funddrive.linguistlist.org/donate/
Editor for this issue: Sarah Robinson <srobinson at linguistlist.org>
================================================================
Date: Thu, 07 Dec 2017 13:56:51
From: Pranava Madhyastha [p.madhyastha+linguistlist at sheffield.ac.uk]
Subject: Exploiting Word Embeddings for Modelling Bilexical Relations
Institution: Universitat Politècnica de Catalunya
Program: PhD in Artificial Intelligence
Dissertation Status: Completed
Degree Date: 2017
Author: Pranava Madhyastha
Dissertation Title: Exploiting Word Embeddings for Modelling Bilexical
Relations.
Dissertation URL: http://staffwww.dcs.shef.ac.uk/people/P.Madhyastha/papers/thesis.pdf
Linguistic Field(s): Computational Linguistics
Dissertation Director(s):
Xavier Carreras
Ariadna Quattoni
Dissertation Abstract:
There has been an exponential surge of text data in recent years. As a
consequence, unsupervised methods that make use of this data have been
steadily growing in the field of natural language processing (NLP). Word
embeddings are low-dimensional vectors obtained using unsupervised techniques
on large unlabeled corpora, where words from the vocabulary are
mapped to vectors of real numbers. Word embeddings aim to capture
syntactic and semantic properties of words.
In NLP, many tasks involve computing the compatibility between lexical items
under some linguistic relation. We call this type of relation a bilexical
relation. Our thesis defines statistical models for bilexical relations that
centrally make use of word embeddings. Our principal aim is for the word
embeddings to favor generalization to words not seen during the training
of the model. The thesis is structured in four parts. In the first part of
this thesis, we present a bilinear model over word embeddings that leverages a
small supervised dataset for a binary linguistic relation. Our learning
algorithm exploits low-rank bilinear forms and induces a
low-dimensional embedding tailored for a target linguistic relation. This
results in compressed task-specific embeddings. In the second part of our
thesis, we extend our bilinear model to a ternary setting and propose a
framework for resolving prepositional phrase attachment ambiguity using
word embeddings. Our models perform competitively with state-of-the-art
models. In addition, our method obtains significant improvements on
out-of-domain tests by simply using word embeddings induced from source and
target domains. In the third part of this thesis, we further extend the
bilinear models for expanding vocabulary in the context of statistical
phrase-based machine translation. Our model obtains a probabilistic list of
possible translations of target language words, given a word in the
source language. We do this by projecting pre-trained embeddings into
a common subspace using a log-bilinear model. We empirically observe a
significant improvement on an out-of-domain test set. In the final part of our
thesis, we propose a non-linear model that maps initial word embeddings to
task-tuned word embeddings, in the context of a neural network dependency
parser. We demonstrate its use for improved dependency parsing,
especially for sentences with unseen words. We also show downstream
improvements on a sentiment analysis task.
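
As an illustration only (not taken from the thesis), the idea of a low-rank
bilinear form over word embeddings mentioned in the abstract can be sketched
as follows. The score of a word pair (x, y) is x^T W y, where W is factored
as U^T V so that each word is first projected into a k-dimensional
task-specific space. All names here (project, bilinear_score, U, V, the
dimensions d and k, and the random data) are assumptions made for the
sketch, and no training procedure is shown.

```python
import random


def project(matrix, vector):
    """Multiply a (k x d) matrix by a length-d vector, giving a length-k vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def bilinear_score(u, v, x, y):
    """Low-rank bilinear form: (U x) . (V y), which equals x^T (U^T V) y."""
    px, py = project(u, x), project(v, y)
    return sum(a * b for a, b in zip(px, py))


def full_bilinear_score(u, v, x, y):
    """Same score via the explicit d x d matrix W = U^T V (used only as a check)."""
    d, k = len(x), len(u)
    w = [[sum(u[r][i] * v[r][j] for r in range(k)) for j in range(d)]
         for i in range(d)]
    return sum(x[i] * w[i][j] * y[j] for i in range(d) for j in range(d))


random.seed(0)
d, k = 6, 2  # hypothetical embedding dimension and rank, with k < d

# Hypothetical projection matrices; in practice these would be learned
# from supervised data for the target relation.
U = [[random.gauss(0, 1) for _ in range(d)] for _ in range(k)]
V = [[random.gauss(0, 1) for _ in range(d)] for _ in range(k)]

# Hypothetical pre-trained embeddings for the two words of a candidate pair.
x = [random.gauss(0, 1) for _ in range(d)]
y = [random.gauss(0, 1) for _ in range(d)]

compressed_x = project(U, x)  # k-dimensional task-specific embedding of x
score = bilinear_score(U, V, x, y)
```

In this sketch the factored form needs only 2*k*d parameters instead of d*d,
and the projected vectors play the role of the compressed task-specific
embeddings described above.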