[Corpora-List] jobs in Cambridge: NLP for eScience

Ann Copestake Ann.Copestake at cl.cam.ac.uk
Wed May 25 13:02:24 UTC 2005


RESEARCH ASSOCIATE (TWO POSTS)

Ref No: NR115 & 116 
Salary: £19,460 - £29,127 pa 
Limit of tenure: Up to forty-eight months

Applications are invited for two Research Associates to develop natural 
language processing technology for eScience. The project aims to develop a 
natural-language oriented markup language which enables the tight integration 
of partial information from a wide variety of language processing tools. This 
language will be compatible with GRID and Web protocols and will have a sound 
logical basis consistent with Semantic Web standards. The language will be used for 
robust and extensible extraction of information from scientific texts and to 
model scientific argumentation and citation purpose in order to support novel 
modes of information access. We will demonstrate the applicability of this 
infrastructure on Chemistry texts.

The project is a collaboration between the NLIP group in the Computer 
Laboratory (A. Copestake, S. Teufel: http://www.cl.cam.ac.uk/Research/NL/), the 
Unilever Centre for Molecular Informatics in the Department of Chemistry (P. 
Murray-Rust: http://www-ucc.ch.cam.ac.uk/) and the Cambridge eScience Centre 
(A. Parker: http://www.escience.cam.ac.uk/).

This project will build on existing technology for the analysis of natural 
language text and for representation of data in Chemistry texts. Proposed 
start date: 1 October 2005 or as soon as possible thereafter.

Post 1 (Computer Laboratory):
Research in combining deep and shallow processing techniques, parsing of 
Chemistry texts, discourse analysis, word sense disambiguation and anaphora 
resolution with respect to ontologies. Coordination with Chemistry on 
application of technology to Chemistry texts and with the eScience Centre on 
development of high throughput techniques.

A PhD or equivalent experience in computational linguistics/natural language 
engineering is required. Relevant topics include computational semantics, 
anaphora resolution, word sense disambiguation, shallow/deep parsing, 
information extraction and ontology extraction. In addition, broad interests 
and a demonstrated ability to apply theoretical research to large corpora will 
be an advantage.

As the research will build on an extensive existing code base, mostly 
implemented in C or Common Lisp, strong programming skills in these languages 
in a Unix/Linux environment are essential (Perl and C++ are also relevant). 
Knowledge of internet technologies would be an advantage.

Post 2 (Chemistry):
Research in chemical ontology based on XML, RDF and Semantic Web technology, 
recognition and parsing of chemical terms, user interface design, coordination 
with the Computer Laboratory, and interaction with chemical publishers.

A PhD or equivalent experience in chemical informatics, computational 
chemistry or an equivalent field is required. Experience in some of the 
following in chemistry: searching, GUIs, data mining, high-throughput 
computation and the GRID. Programming experience in a modern language (Java, 
C++) is essential.

For further information/job description, please e-mail 
Simone.Teufel at cl.cam.ac.uk, or visit http://www.cl.cam.ac.uk/DeptInfo/Jobs/

Applicants should send a cover letter, a completed PD18 form 
(http://www.admin.cam.ac.uk/offices/personnel/forms/pd18/), a full CV, and the 
names and addresses of three academic/professional referees to Simone Teufel, 
Computer Laboratory, JJ Thomson Avenue, Cambridge, CB3 0FD. Closing date: 20 
June 2005. Interview date: week of 4-8 July 2005.


