[Corpora-List] Job : 1 year post-doc Paris : LABEX EFL - SEDYL-CNRS - text mining for language contact corpora
Pascal Vaillant
vaillant at univ-paris13.fr
Fri Nov 4 09:29:06 UTC 2011
------------------------
Please circulate -
Dear colleagues,
We offer a 12-month postdoc position in text data mining within the
10-year LABEX project "Empirical foundations of linguistics", which
started in 2011. The position is based in Paris, at the UMR SEDYL
(CNRS-INALCO-IRD). It is linked to the strand « Typology and dynamics of
linguistic systems » of this project, and more specifically to the
research programme supervised by Isabelle Léglise: Multifactorial
Analysis of language contact & language changes (LC1)
__________________________
Postdoctoral research fellow: Text data mining applied to
heterogeneous and multilingual corpora
Keywords:
computational linguistics, data mining, high-dimensional data analysis
Application deadline:
2011/11/10
Competences
The candidate should have a PhD in computer science and should be an
expert in data mining, preferably in a linguistic field of application
(text mining, natural language processing) involving high-dimensional
data/texts. The candidate should have experience with the XML format.
Knowledge of the TEI standards will be a plus. She must know how to
program in C, C++ or Java. She will use the relational database model
and the SQL language; knowledge of MySQL is an advantage. An interest
in linguistic diversity is a plus.
Description
The task consists in developing search and data mining functions for
language contact corpora, that is, transcriptions of non-homogeneous
and mixed verbal productions collected in multilingual areas (38
languages from all continents are involved). This scenario has
traditionally received little attention from the algorithms of
computational linguistics (grammatical inference or lexical labeling).
We expect to find correlations between certain categories, or certain
syntactic positions, and language contact or language change phenomena.
Given the large number of variables to be analyzed relative to the
size of the corpus (number of samples), we will need to explore
dimensionality reduction approaches such as "manifold learning".
Duration:
12 months, starting on 1 December 2011 or in January 2012.
It is a full-time position.
http://www.labex-efl.org/?q=en/hiring/lc1
Salary:
24 000 EUR/year
If you are interested, please send a CV (including a publication
list), a letter of application and the names of two referees to:
Isabelle Léglise (leglise at vjf.cnrs.fr) & Pascal Vaillant
(vaillant at vjf.cnrs.fr)