[Corpora-List] PhD Studentship, University of Edinburgh

Mirella Lapata mlap at inf.ed.ac.uk
Mon Mar 21 16:30:06 UTC 2005


PhD STUDENTSHIP
School of Informatics, University of Edinburgh

The Institute of Communicating and Collaborative Systems (ICCS) within
the Division of Informatics and the Human Communication Research
Centre (HCRC) invites applications for a three-year EPSRC studentship
award to commence in September 2005. The successful applicant will
work on a project aiming to devise unsupervised models for word sense
disambiguation. A brief summary of the aims of this project is given
below.

-----------------------------------------------------------------------
Graphical Models for Word Sense Disambiguation 

The most accurate techniques for word sense disambiguation (WSD) to
date are those trained on text in which each word has been manually
annotated with its intended sense. A major shortcoming of these
methods, though, is that their accuracy is strongly correlated with
the quantity of training data available, and such data are in short
supply because producing them is very labour-intensive. For many
words, the distribution of senses is highly skewed, and WSD systems
work best when they take the most frequent sense into account.
However, the most frequent sense of a word is often not known,
particularly in domains (subject areas) in which no text has ever been
manually annotated.

This project is concerned with developing novel algorithms for
alleviating the data requirements of large-scale WSD. More
specifically, the project will involve:

o Exploring the use of probabilistic graphical models for word sense
  disambiguation. Graphical models are a powerful modeling framework,
  well suited to characterizing and studying the interactions among
  varied information sources, and thus allow many aspects of the WSD
  problem to be represented concurrently.
 
o Devising sense ranking models for structured (e.g., WordNet) and
  unstructured (e.g., dictionary definitions) sense inventories.

o Demonstrating the benefit of unsupervised WSD in application to
  Question Answering.

------------------------------------------------------------------------
The EPSRC baseline rate of maintenance is currently approximately
£12,000, and the studentship will also pay the three years' tuition
fees at home/EU rates. Applicants should have a good honours degree or
equivalent in Computer Science or Computational Linguistics.
Programming skills, preferably in Perl, Java, C or C++, are essential.
Familiarity with statistical NLP, machine learning methods and corpus
processing is an advantage.

------------------------------------------------------------------------
The project will be conducted in collaboration with the Natural
Language and Computational Linguistics (NLCL) group at the University
of Sussex (see http://www.informatics.susx.ac.uk/research/nlp/). The
student will also benefit from the close research links that ICCS and
HCRC have with a number of other academic institutions (e.g., Saarland
University, DFKI, Stanford University) and companies.

------------------------------------------------------------------------
For further information about the project, please e-mail Dr. Mirella
Lapata (mlap at inf.ed.ac.uk). Application forms and details of how to
apply are on-line at
http://www.informatics.ed.ac.uk/prospectus/graduate/research.html.
PLEASE MARK "Graphical Models for Word Sense Disambiguation" ON THE
APPLICATION. 

------------------------------------------------------------------------
Application deadline: Monday, 2 May 2005.
Applications received after this deadline may be considered, but this
cannot be guaranteed.


