[Corpora-List] Job : 1 year post-doc Paris : LABEX EFL - SEDYL-CNRS - text mining for language contact corpora

Pascal Vaillant vaillant at univ-paris13.fr
Fri Nov 4 09:29:06 UTC 2011


------------------------

Please circulate -

Dear colleagues,

We are offering a 12-month postdoc position in text data mining within 
the 10-year LABEX project "Empirical foundations of linguistics", which 
started in 2011. The position is based in Paris, at the UMR SEDYL 
(CNRS-INALCO-IRD). It is linked to the strand « Typology and dynamics of 
linguistic systems » of this project, and more specifically to the 
research programme supervised by Isabelle Léglise: Multifactorial 
Analysis of language contact & language changes (LC1).

__________________________


  Postdoctoral research fellow: Text data mining applied to
  heterogeneous and multilingual corpora

  Keywords:
computational linguistics, data mining, high-dimension data analysis

Application deadline:
2011/11/10

Competences
The candidate should hold a PhD in computer science and be an expert in 
data mining, preferably applied to a linguistic domain (text mining, 
natural language processing) involving high-dimensional data/texts. The 
candidate should have experience with the XML format. Knowledge of the 
TEI standards is a plus. She must be able to program in C, C++ or Java. 
She will use the relational database model and the SQL language; 
knowledge of MySQL is an advantage. An interest in linguistic diversity 
is welcome.


      Description

The task consists of developing search and data mining functions for 
language contact corpora, that is, transcriptions of non-homogeneous 
and mixed verbal productions collected in multilingual areas (38 
languages from all continents are involved). This scenario has 
traditionally received little attention from the algorithms of 
computational linguistics (grammatical inference or lexical labeling). 
We expect to find correlations between certain categories, or certain 
syntactic positions, and language contact or language change phenomena.
Given the large number of variables to be analyzed relative to the size 
of the corpus (number of samples), we will need to explore approaches 
to data dimensionality reduction such as "manifold learning".

Duration:
12 months, starting 1 December 2011 or January 2012.
This is a full-time position.
http://www.labex-efl.org/?q=en/hiring/lc1

Salary:
24 000 EUR/year

If you are interested, please send a CV (including a publication 
list), a letter of application and the names of two referees to:

Isabelle Léglise (leglise at vjf.cnrs.fr) & Pascal Vaillant 
(vaillant at vjf.cnrs.fr)

