[Corpora-List] 2 PhD positions in computational linguistics at the University of Groningen

Joerg Tiedemann j.tiedemann at rug.nl
Thu Apr 24 14:06:55 UTC 2008


The University of Groningen, Faculty of Arts, Center for Language and
Cognition, announces the following two PhD positions:


PhD position in Parse and Corpus based Machine Translation
(PaCo-MT) (1,0 fte)

PhD position in Dutch Language Investigation of Summarization Technology
(DAISY) (1,0 fte)


The Center for Language and Cognition Groningen (CLCG) is a research 
institute within the Faculty of Arts of the University of Groningen. It 
embraces all the linguistic research in the faculty. A considerable 
number of the researchers participate in the Graduate School for 
Behavioral and Cognitive Neurosciences (BCN), and in the Landelijke 
Onderzoekschool Taalwetenschap (LOT). Within the CLCG there are six 
research groups: Syntax/Semantics, Discourse and Communication, Language 
Variation and Change, Computational Linguistics, Neurolinguistics, and 
Language and Literacy Development over the Life Span.



Position 1 (Vacancy 208128)

The STEVIN PaCo-MT project aims at developing an open domain hybrid MT 
system integrating proper linguistic analysis and syntactic transfer 
into a data-driven approach to be used by professional translators. 
Translation will be based on transfer (lexical and syntactic) from a 
parsed source language sentence into a corresponding target language 
structure. From this the final output is generated using information 
from a large target language Treebank that will ensure grammaticality 
and fluency. The MT application will be developed for the language pairs 
Dutch<>English and Dutch<>French. A post-editing interface will be 
provided to adapt the output to user needs. The Flemish-Dutch consortium 
consists of two academic partners (Leuven and Groningen University) and 
one industrial partner (Oneliner Language and Business Solutions).

The PhD project within PaCo-MT in Groningen will be focused on building 
bilingual resources necessary for our translation approach. We will 
emphasize the use of syntactic annotation (in both source and target 
language) in the automatic extraction of bilingual lexical data, 
(probabilistic) transfer rules and statistical translation models. 
Programming skills are required, and knowledge of SMT and alignment 
techniques is definitely a plus.



Position 2 (Vacancy 208127)

The aim of STEVIN DAISY is to develop and evaluate essential technology 
for automatic summarization of Dutch informative texts. Innovative 
algorithms for topic salience detection, topic discrimination, 
rhetorical classification of content, sentence compression and text 
generation will be implemented.

An important part of the DAISY project concerns sentence generation. The 
task of the sentence generation module is to produce actual grammatical 
sentences on the basis of abstract representations, using the 
declarative grammar of Alpino as its key knowledge source. Alpino is a 
wide-coverage grammar for Dutch, defined as a unification-based grammar, 
in which many insights from HPSG have been implemented (examples are the 
inheritance hierarchy of lexical types and grammatical rules). There has 
been a lot of work on text generation for unification grammars. More 
recent work on which we will base our approach includes Carroll and 
Oepen (2005).

Although the Alpino grammar can be used to ensure that well-formed 
sentences are produced, a further fluency module will be developed to 
ensure that the sentences that are produced are natural and appropriate. 
Just as parsing needs a (statistical) disambiguation component to select 
the appropriate parse from potentially large sets of possible parses, we 
need a fluency component to select the most appropriate sentence from 
the set of possible sentences given by the grammar.  For the fluency 
component, we propose to develop a machine-learning method similar in 
approach to the disambiguation component of the Alpino parser. The 
disambiguation component of Alpino contains a discriminative 
maximum-entropy model, trained on the Alpino treebank. For statistical 
ranking of competing surface realizations of the same content, we 
propose to implement a similar discriminative maximum-entropy model.



Requirements

an MA degree in Computational Linguistics, Computer Science or a related field
knowledge of Dutch, or willingness to learn Dutch
ability to work together in a project with members from different institutes



Conditions of Employment

Employment basis: Temporary for specified period.
Duration of the contract: Four years, starting September 1, 2008.
The position requires residence in Groningen, 36 hrs/week research, and 
must result in a PhD dissertation. After the first year there will be an 
assessment of the candidate's results and the progress of the project. 
Based on this, it will be decided whether the employment will be 
continued. The University of Groningen offers a salary ranging from 
EUR 2000 gross per month in the first year to EUR 2558 gross per month 
in the fourth year.

Additional information about position 1 (PaCo-MT) can be obtained from
Dr. Jörg Tiedemann, project supervisor
Tel: +31 50 3635935
Email: J.Tiedemann at rug.nl

Additional information about position 2 (DAISY) can be obtained from
Dr. Gertjan van Noord, project supervisor
Tel: +31 50 3637811
Email: G.J.M.van.Noord at rug.nl

For both positions, you can also contact:
Mrs. Wyke van der Meer, CLCG secretariat
Tel: +31 50 3635806
Email: w.a.van.der.meer at rug.nl

Additional information about the research institute can be obtained
through the following link:
http://www.rug.nl/let/onderzoek/onderzoekinstituten/clcg/index



Application procedure

You can apply for these vacancies before June 1, 2008 by sending your
application to

University of Groningen
Personnel and Organization Department
P.O. Box 72
9700 AB Groningen
The Netherlands
E-mail address: vmp at bureau.rug.nl

Please include:

your curriculum vitae
a copy of your diploma together with a list of grades
a list of publications (if any)
a recent publication or your Master's thesis
letters from two referees

Electronic applications are preferred. Please state the vacancy number.



