[Lingtyp] 3-year PhD position on quantitative typology, Paris, France

Sylvain Loiseau sylvain.loiseau at univ-paris13.fr
Fri Feb 10 13:13:48 UTC 2023


[Apologies for cross-posting]

The Autogramm project (https://autogramm.github.io/en) invites applications for a 3-year PhD position starting between now and May 2023. The position is funded by the ANR (Agence Nationale de la Recherche) and located in Paris, France.

The goal of the thesis is to contribute to the development of quantitative typology by participating in the construction of a database covering a large number of typologically diverse languages and by focusing on the quantitative analysis of this dataset (Gerdes et al. 2021, Levshina 2022). Annotated corpora are available for a growing number of languages, in particular corpora in interlinear glossed text (IGT; see for example the Pangloss collection, https://pangloss.cnrs.fr) and corpora annotated with the Universal Dependencies scheme (UD, https://universaldependencies.org, and its SUD variant, https://surfacesyntacticud.github.io/). These resources make possible corpus-based typological studies, which have several advantages:
- the results are based directly on primary data (corpora);
- the results are reproducible as long as the data are freely accessible;
- they allow quantitative analysis: rather than being characterized as OV or VO, a language can be described as having a given percentage of OV constructions, associated with conditioning factors (Levshina 2019, Gerdes et al. 2019, Futrell et al. 2015; see also https://typometrics.elizia.net/#/).
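For illustration only, here is what "a given percentage of OV constructions" can mean in practice. This is a minimal Python sketch, not a tool of the Autogramm project; the function names are hypothetical, and it assumes that an object is a token with the UD relation obj (or a subtype such as obj:lvc) attached to a head tagged VERB, compared by linear position. This ignores many subtleties (pronominal objects, clitics, non-verbal predicates).

```python
# Hypothetical sketch (not the Autogramm method): share of OV orders in a
# UD treebank given as CoNLL-U text.
# Assumption: an OV/VO pair is a token whose DEPREL is "obj" (or a subtype
# such as "obj:lvc") whose head has UPOS "VERB".

def _count_pairs(tokens):
    """Count (OV, VO) pairs in one sentence; tokens are 10-column CoNLL-U rows."""
    upos = {int(t[0]): t[3] for t in tokens}
    ov = vo = 0
    for t in tokens:
        idx, head, deprel = int(t[0]), int(t[6]), t[7]
        if deprel.split(":")[0] == "obj" and upos.get(head) == "VERB":
            if idx < head:
                ov += 1  # object precedes its verb
            else:
                vo += 1  # object follows its verb
    return ov, vo


def ov_percentage(conllu_text):
    """Percentage of verb-object pairs in which the object precedes the verb."""
    ov = vo = 0
    sentence = []
    # The trailing "" guarantees that the last sentence is flushed.
    for line in conllu_text.splitlines() + [""]:
        line = line.strip()
        if not line:  # blank line = sentence boundary
            s_ov, s_vo = _count_pairs(sentence)
            ov, vo = ov + s_ov, vo + s_vo
            sentence = []
        elif not line.startswith("#"):  # skip comment lines
            cols = line.split("\t")
            # Keep basic word lines only: skip multiword ranges ("1-2")
            # and empty nodes ("1.1").
            if cols[0].isdigit():
                sentence.append(cols)
    total = ov + vo
    return 100.0 * ov / total if total else 0.0
```

Conditioning factors of the kind mentioned above could be studied by splitting these counts further, for example by the UPOS of the object (pronoun vs. noun).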

The candidate will work on one of the following topics:

- How can cross-linguistic regularities be identified from a set of corpora (see for example Gerdes et al. 2021)?
- What quantitative information useful for typological characterization can be extracted from a corpus? Which features require prior annotation of the data, and what kind of annotation is needed?
- How can methods for dealing with issues of representativeness in typological databases be improved (Guzmán Naranjo & Becker 2022)?
- How can methods for dealing with the commensurability of the categories used to describe different languages be improved? How can the consistency of the data be checked? How can aberrations be detected in some treebanks (categorization choices that do not conform to the universal scheme, e.g. the assignment of the subject relation in ergative languages, or the use of the ADJ category in languages without genuine adjectives)?
- How can quantitative typological data be visualized (faceted, multidimensional, etc.)?

The work will be conducted in the context of the ANR Autogramm project (https://autogramm.github.io/), which brings together researchers in field linguistics, typology, formal linguistics and natural language processing. The project is developing a growing ecosystem of tools for quantitative typological generalization, in particular with UD treebanks (see grew.match.fr, https://typometrics.elizia.net/#/, https://surfacesyntacticud.github.io …).

Applications and questions can be sent to Sylvain Kahane <sylvain at kahane.fr>

Applications should include:
- Cover letter outlining interest in the position
- Names of two referees
- Curriculum Vitae (CV) with publications
- Copy of MA degree
- University transcripts for at least the last two years

Please forward this to any potentially interested parties!
Best regards,
Sylvain Loiseau (on behalf of the Autogramm project)



Futrell, R., Hickey, T., Lee, A., Lim, E., Luchkina, E., and Gibson, E. (2015). Cross-linguistic gestures reflect typological universals: A subject-initial, verb-final bias in speakers of diverse languages. Cognition, 136, 215–221.

Gerdes, K., Kahane, S., and Chen, X. (2019). Rediscovering Greenberg's word order universals in UD. Universal Dependencies Workshop (UDW 2019), SyntaxFest 2019.

Gerdes, K., Kahane, S., and Chen, X. (2021). Typometrics: From implicational to quantitative universals in word order typology. Glossa: a journal of general linguistics, 6(1), 17.

Guzmán Naranjo, M. and Becker, L. (2022). Statistical bias control in typology. Linguistic Typology, 26(3), 605–670.

Levshina, N. (2019). Token-based typology and word order entropy: A study based on Universal Dependencies. Linguistic Typology, 23(3), 533–572.

Levshina, N. (2022). Corpus-based typology: Applications, challenges and some solutions. Linguistic Typology, 26(1), 129–160.

-----
Sylvain Loiseau
sylvain.loiseau at univ-paris13.fr

Université Sorbonne Paris Nord
99 avenue Jean-Baptiste Clément
F-93430 Villetaneuse

Laboratoire « Langues et civilisations à tradition orale » (UMR 7107 CNRS)
Campus CNRS
7, rue Guy Môquet (bât. D)
F-94801 Villejuif Cedex
http://lacito.cnrs.fr  



