24.915, Calls: Typology, Text/Corpus Ling, Computational Ling/Germany

Wed Feb 20 16:58:32 UTC 2013

LINGUIST List: Vol-24-915. Wed Feb 20 2013. ISSN: 1069 - 4875.

Moderators: Anthony Aristar, Eastern Michigan U <aristar at linguistlist.org>
            Helen Aristar-Dry, Eastern Michigan U <hdry at linguistlist.org>

Reviews: Veronika Drake, U of Wisconsin Madison
         Monica Macaulay, U of Wisconsin Madison
         Rajiv Rao, U of Wisconsin Madison
         Joseph Salmons, U of Wisconsin Madison
         Anja Wanner, U of Wisconsin Madison
         <reviews at linguistlist.org>

Homepage: http://linguistlist.org

Editor for this issue: Alison Zaharee <alison at linguistlist.org>
================================================================  

Visit LL's Multitree project for over 1000 trees dynamically generated
from scholarly hypotheses about language relationships:
          http://multitree.linguistlist.org/

Date: Wed, 20 Feb 2013 11:58:07
From: Thomas Mayer [thomas.mayer at uni-marburg.de]
Subject: Workshop on Corpus-based Quantitative Typology

Full Title: Workshop on Corpus-based Quantitative Typology 
Short Title: CoQuaT 2013 

Date: 14-Aug-2013 - 14-Aug-2013
Location: Leipzig, Germany 
Contact Person: Thomas Mayer
Meeting Email: coquat2013 at gmail.com
Web Site: http://paralleltext.info/coquat2013/ 

Linguistic Field(s): Computational Linguistics; Text/Corpus Linguistics; Typology 

Call Deadline: 31-Mar-2013 

Meeting Description:

Convenors:

Michael Cysouw (Philipps University of Marburg)
Dirk Goldhahn (University of Leipzig)
Thomas Mayer (Philipps University of Marburg)
Uwe Quasthoff (University of Leipzig)

Invited Speakers:

Östen Dahl (Stockholm University)
Kevin Scannell (Saint Louis University)

Workshop Description:

The number of available (textual) corpora of the world's languages is currently growing at a remarkable rate. The aim of this workshop is to bring together researchers dealing with corpus-based quantitative language comparison and to encourage typological studies that rely on corpus data.

A growing body of research uses corpora to investigate the structure of individual languages. There is also a large amount of research on worldwide linguistic diversity, though it is mostly based on information manually extracted from published sources. In contrast, the combination of the two is still rare: there are only a few quantitative typological investigations with a worldwide scope that use corpora to infer cross-linguistic generalizations and insights. Some previous work compiled quantitative data through manual corpus annotation (e.g. Greenberg 1960; Wälchli 2005) or automatically with the help of computer programs (e.g. Mayer and Cysouw 2012). In addition, there is some relevant work using corpora to compare a smaller number of (genealogically related) languages (e.g. Bickel 2003; van der Auwera et al. 2005).

Cross-linguistic corpora, in particular (massively) parallel corpora (cf. Cysouw and Wälchli 2007) or comparable corpora compiled through web crawling (e.g. Scannell 2007; Goldhahn et al. 2012), provide an enormous amount of information about the world's languages. Although such data are often not ideal from a linguistic point of view (involving problems of translationese, or being restricted to special textual genres), it would be a waste not to at least try to use them for comparative linguistic purposes.

One of the reasons for the shortage of quantitative cross-linguistic work is the lack of adequate resources for a representative sample of languages. Consequently, on top of the laborious manual analysis, typologically interested researchers are faced with the time-consuming task of building their own corpora from scratch. One goal of this workshop is therefore to collect (online) resources (especially for lesser-studied languages) and to exchange experiences with crawling texts from the web. Furthermore, we intend to discuss the formats in which cross-linguistic corpora should be made publicly available so that typologists can best benefit from them without violating copyright laws.

2nd Call for Papers: 

For this workshop, we welcome any type of cross-linguistic quantitative corpus-based work. We are interested both in the collection and preparation of (massively) cross-linguistic corpora and in investigations that rely on such a resource for language comparison. 

A) Possible topics concerning the collection and preparation of text data for a larger number of languages: 

- Presentations about projects collecting and organizing (massively) parallel or comparable corpora 
- Presentations about projects crawling web data to build a cross-linguistic corpus 
- Approaches to (semi-)automatic annotation of corpora for typological research 
- Proposals of corpus formats that are useful for typological research and can easily be imported into standard formats 

B) Specific examples of corpus-based language comparison, focusing on a particular linguistic topic of choice, using approaches like: 

- (Massively) parallel text analysis 
- Corpus-based multivariate quantitative comparison of languages 
- Unsupervised or semi-supervised language analysis for language comparison 
- Evaluation of cross-linguistic corpus-based studies 

Submission Procedure: 

Please send an abstract of approximately 500 words (excluding references) to coquat2013 at gmail.com. Abstracts should contain the author’s name, affiliation and contact email. The deadline for the submission of proposals is March 31, 2013. Notification of acceptance will be given on May 1, 2013.

References: 

Bickel, B. 2003. Referential density in discourse and syntactic typology. Language 79. 708-739. 

Cysouw, M. and B. Wälchli (eds.). 2007. Parallel Texts. Using Translational Equivalents in Linguistic Typology. Theme issue in Sprachtypologie & Universalienforschung STUF 60.2.

Goldhahn, D., T. Eckart and U. Quasthoff. 2012. Building large monolingual dictionaries at the Leipzig Corpora collection: From 100 to 200 languages. In Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC’12), 23-25. 

Greenberg, J. H. 1960. A quantitative approach to the morphological typology of language. International Journal of American Linguistics 26. 178-194. 

Mayer, T. and M. Cysouw. 2012. Language comparison through sparse multilingual word alignment. In Proceedings of the EACL 2012 Joint Workshop of LINGVIS & UNCLH. 54-62. 

Scannell, K. P. 2007. The Crúbadán Project: Corpus building for under-resourced languages. In C. Fairon, H. Naets, A. Kilgarriff, and G-M. de Schryver (eds.), Building and exploring web corpora: proceedings of the 3rd Web as Corpus Workshop, Cahiers du Central: 4, 5-15. Louvain: Presses Universitaires de Louvain. 

van der Auwera, J., E. Schalley and J. Nuyts. 2005. Epistemic possibility in a Slavonic parallel corpus - a pilot study. In B. Hansen and P. Karlik (eds.), Modality in Slavonic Languages, New Perspectives, 201-217. München: Sagner.

Wälchli, B. 2005. Co-compounds and Natural Coordination. Oxford: Oxford University Press.
