17.2660, FYI: Call for Collaboration: Latin Treebank

linguist at LINGUISTLIST.ORG
Mon Sep 18 22:16:18 UTC 2006


LINGUIST List: Vol-17-2660. Mon Sep 18 2006. ISSN: 1068 - 4875.

Subject: 17.2660, FYI: Call for Collaboration: Latin Treebank

Moderators: Anthony Aristar, Eastern Michigan U <aristar at linguistlist.org>
            Helen Aristar-Dry, Eastern Michigan U <hdry at linguistlist.org>
 
Reviews: Laura Welcher, Rosetta Project / Long Now Foundation  
         <reviews at linguistlist.org> 

Homepage: http://linguistlist.org/

The LINGUIST List is funded by Eastern Michigan University, Wayne
State University, and donations from subscribers and publishers.

Editor for this issue: Hunter Lockwood <hunter at linguistlist.org>
================================================================  

To post to LINGUIST, use our convenient web form at
http://linguistlist.org/LL/posttolinguist.html.


===========================Directory==============================  

1)
Date: 18-Sep-2006
From: David Bamman < David.Bamman at tufts.edu >
Subject: Call for Collaboration: Latin Treebank 

	
-------------------------Message 1 ---------------------------------- 
Date: Mon, 18 Sep 2006 18:15:08
From: David Bamman < David.Bamman at tufts.edu >
Subject: Call for Collaboration: Latin Treebank 
 


Call for Collaboration: Latin Treebank

The Perseus Project has recently received a planning grant from the NSF to
investigate the costs and labor involved in constructing a
multimillion-word Latin treebank (a large collection of syntactically
parsed sentences), along with its potential value for the linguistics and
Classics community. While our initial efforts under this grant will focus
on syntactically annotating excerpts from Golden Age authors (Caesar,
Cicero, Vergil) and the Vulgate, a future multimillion-word corpus would
comprise writings from the pre-Classical period through the Early Modern
era. To date we've annotated a total of 12,000 words in a style
that's predominantly informed by two sources: the dependency grammar used
by the Prague Dependency Treebank (itself based on Mel'cuk 1988), and the
Latin grammar of Pinkster 1990.

While treebanks provide valuable training data for computational tasks such
as grammar induction and automatic syntactic parsing, they can also serve
traditional research areas. Large collections of syntactically parsed
sentences have the potential to revolutionize lexicography and
philology, as they provide the immediate
context for a word's use along with its typical syntactic arguments (this
lets us chart, for example, how the meaning of a verb changes as its
predominant arguments change). Treebanks enable large-scale research into
structurally based rhetorical devices of particular interest to
Classicists (such as hyperbaton), and they provide the raw data for research
in historical linguistics (such as the move in Latin from Classical SOV
word order to Romance SVO).
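
To make the word-order example concrete, here is a minimal sketch of how a
dependency-annotated sentence might be queried for the linear order of
subject, object and verb. Everything in it is an assumption made for
illustration: the Token fields, the label names (PRED, SBJ, OBJ), the
constituent_order function and the two invented textbook sentences are
hypothetical and do not reflect the annotation format or tag set of the
Perseus treebank.

```python
from dataclasses import dataclass


@dataclass
class Token:
    idx: int    # 1-based position of the word in the sentence
    form: str   # surface form of the word
    head: int   # position of the governing word; 0 marks the root
    rel: str    # dependency label (hypothetical tag set: PRED, SBJ, OBJ)


def constituent_order(tokens):
    """Return the linear order of subject, object and main verb, e.g. 'SOV'.

    Only direct dependents of the root verb are considered; a missing
    subject or object is simply left out of the result.
    """
    root = next(t for t in tokens if t.head == 0)
    positions = {"V": root.idx}
    for t in tokens:
        if t.head == root.idx and t.rel == "SBJ":
            positions["S"] = t.idx
        elif t.head == root.idx and t.rel == "OBJ":
            positions["O"] = t.idx
    # Sort the letters S, O, V by where each constituent occurs.
    return "".join(sorted(positions, key=positions.get))


# Invented example: "puella rosam amat" (the girl loves the rose).
sov = [
    Token(1, "puella", 3, "SBJ"),
    Token(2, "rosam", 3, "OBJ"),
    Token(3, "amat", 0, "PRED"),
]

# The same words rearranged: "puella amat rosam".
svo = [
    Token(1, "puella", 2, "SBJ"),
    Token(2, "amat", 0, "PRED"),
    Token(3, "rosam", 2, "OBJ"),
]

print(constituent_order(sov))  # SOV
print(constituent_order(svo))  # SVO
```

Run over every sentence of a dated corpus, a tally of these order strings
per period would be one simple way to chart a shift of the kind described
above. A real study would also need to handle coordination, subordinate
clauses and omitted subjects, which this sketch ignores.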

The eventual Latin treebank will be openly available to the public; we
should, therefore, come to a consensus on how it should be built. To that
end we encourage input from the linguistics and Classics community on the
treebank design (including the syntactic representation of Latin) and
welcome contributions by annotators (for which limited funding is
available). Interested collaborators should contact David Bamman
(David.Bamman at tufts.edu) at the Perseus Project. 



Linguistic Field(s): Historical Linguistics
                     Syntax
                     Text/Corpus Linguistics





 



