[Corpora-List] phd position on semantic spaces from text and images, clic (university of trento)
Marco Baroni
marco.baroni at unitn.it
Mon Mar 29 14:26:13 UTC 2010
(Apologies for multiple postings)
PHD POSITION ON MULTIMODAL SEMANTIC SPACES AVAILABLE
One PhD position/studentship to study integrated text-vision semantic
spaces is available in the Language, Interaction and Computation track
of the 3-year PhD program offered by the Center for Mind/Brain Sciences
at the University of Trento (Italy):
http://www.cimec.unitn.it/
The PhD program (start date: November 2010) is taught in English by an
international faculty. The Language, Interaction & Computation track is
organized by CLIC, an interdisciplinary group of researchers studying
verbal and non-verbal communication using both computational and
cognitive methods:
http://clic.cimec.unitn.it/
CLIC is part of the larger network of research labs focusing on Natural
Language Processing and related domains in the Trento region, which is
quickly becoming one of the areas with the highest concentration of NLP
researchers in Europe.
The studentship is sponsored by a Google Research Award, and the PhD
project will be carried out as a collaboration between CLIC members and
the Zurich Google Research team.
* Project Outline *
The automated measurement of semantic similarity (similarity in meaning)
between words/concepts through unsupervised statistical semantic space
models such as Latent Semantic Analysis or Topic Models has been a
success story in text mining (see Turney and Pantel, 2010, for a recent
survey).
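For readers less familiar with these models, here is a rough, purely
illustrative sketch in Python (a toy corpus and an arbitrary number of
latent dimensions invented for the example; it is not code from the
project or from the cited papers):

# Minimal LSA-style semantic space: rows of a word-by-document count
# matrix are reduced with a truncated SVD, and similarity in meaning is
# approximated by the cosine between the resulting word vectors.
import numpy as np

docs = [
    "the cat chased the mouse",
    "the dog chased the cat",
    "stocks fell on the market",
    "the market rallied as stocks rose",
]
vocab = sorted({w for d in docs for w in d.split()})
word_index = {w: i for i, w in enumerate(vocab)}

# Word-by-document count matrix.
counts = np.zeros((len(vocab), len(docs)))
for j, d in enumerate(docs):
    for w in d.split():
        counts[word_index[w], j] += 1

# Truncated SVD: keep k latent dimensions (k = 2, purely illustrative).
U, s, Vt = np.linalg.svd(counts, full_matrices=False)
k = 2
word_vectors = U[:, :k] * s[:k]

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12))

# Words sharing contexts ("cat", "dog") should come out more similar
# than words from unrelated contexts ("cat", "stocks").
print(cosine(word_vectors[word_index["cat"]],
             word_vectors[word_index["dog"]]))
print(cosine(word_vectors[word_index["cat"]],
             word_vectors[word_index["stocks"]]))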
Today, through the Web, we have access to huge amounts of documents that
contain both text and images. While the use of text to improve image
labeling and retrieval is an active and growing area of research (e.g.,
Feng and Lapata, 2008, Moringen, 2008, Mathe et al., 2008, Hare et al.,
2008, Olivares et al., 2008, Wang et al., 2009), in this project we want
to go the other way around, and develop novel techniques to extract
multimodal semantic spaces from texts and images, in order to improve
the measurement of semantic similarity among words. On the one hand, it
has been shown (Baroni and Lenci, 2009) that text-extracted conceptual
descriptions are lacking exactly in those aspects (such as color, shape
and parts of objects) that are likely to be most salient in visual
depictions of the same objects. On the other, a recent trend in computer
vision is to represent images as vectors that record the occurrence, in
the analyzed image, of a discrete vocabulary of "visual words" (Yang et
al., 2007, and references therein). This development paves the way for the
integration of visual word co-occurrence features into the classic
text-based vectors of current semantic space models.
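As a rough illustration of this kind of integration (a simple
concatenation of normalized text and visual blocks, shown only for
concreteness; the project will explore which fusion strategies actually
work best, and all feature values, names and the weighting parameter
below are toy assumptions):

# Illustrative sketch only: fuse a word's text-based co-occurrence
# vector with a bag-of-visual-words histogram aggregated over images
# associated with that word, by normalizing each block and
# concatenating them; similarity is then measured on the joint vector.
import numpy as np

def l2_normalize(v):
    n = np.linalg.norm(v)
    return v / n if n > 0 else v

# Toy text-based co-occurrence features (counts with a fixed set of
# context words) and toy visual-word histograms; values are invented.
text_features = {
    "moon": np.array([12.0, 0.0, 3.0, 7.0]),
    "sun":  np.array([10.0, 1.0, 2.0, 9.0]),
    "bank": np.array([0.0, 15.0, 8.0, 1.0]),
}
visual_features = {
    "moon": np.array([30.0, 2.0, 18.0]),
    "sun":  np.array([25.0, 5.0, 20.0]),
    "bank": np.array([1.0, 40.0, 3.0]),
}

def multimodal_vector(word, alpha=0.5):
    # alpha weights the two modalities; 0.5 gives them equal influence.
    t = l2_normalize(text_features[word])
    v = l2_normalize(visual_features[word])
    return np.concatenate([alpha * t, (1 - alpha) * v])

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12))

# "moon" and "sun" share both textual and visual features and should
# come out more similar than "moon" and "bank".
print(cosine(multimodal_vector("moon"), multimodal_vector("sun")))
print(cosine(multimodal_vector("moon"), multimodal_vector("bank")))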
The topic is expected to have a strong impact both on the applied front,
as a breakthrough in the acquisition of large semantic repositories (we
will in particular explore applications to information retrieval), and
on the theoretical front, leading to "embodied" models of computational
learning that are more directly comparable to what human learners do
(Barsalou, 2008, Glenberg and Mehta, 2008).
* Application Information *
The successful candidate will have a strong computational background,
including familiarity with machine learning and/or statistical methods,
and should be familiar with the basics of either natural language
processing or (preferably) computer vision. An interest in exploring the
connections between artificial and natural intelligence and cognition is
also desirable.
The official call of the Doctoral School in Cognitive and Brain Sciences
will be announced shortly, and application details will be available
at the page:
http://portale.unitn.it/drcimec/portalpage.do?channelId=-35529
We strongly encourage a preliminary expression of interest in the
project. Please contact Marco Baroni (marco.baroni at unitn.it), attaching
a CV in pdf or txt format or including a link to an online CV.
_______________________________________________
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora