[Corpora-List] Syntactically annotated corpus of a Spanish Dialect such as Buenos Aires or Los Angeles

Mark Davies Mark_Davies at byu.edu
Sun Oct 19 13:07:06 UTC 2008


You might try the Corpus del Espanol (www.corpusdelespanol.org).

For preverbal doubling (a ellos les dijeron), you'd enter something like:

a [p*] [p*] [v*]

For post-verbal (decirles a ellos), try something like:

[vr*+] a [p*]

In both cases, it will find all of the matching tokens (several thousand) in 3-4 seconds.
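To make the logic of the two queries concrete, here is a minimal sketch of how such a pattern can be read: a bare word matches the word form itself, and a bracketed item matches a part-of-speech tag with a wildcard. This is only an illustration and not how the Corpus del Espanol is implemented. The `find` and `slot_matches` helpers and the tags in the sample data ("e", "pp", "pc", "vi", "vr+") are assumptions invented for the example, not the corpus's actual tagset.

```python
from fnmatch import fnmatchcase

def slot_matches(slot, token):
    """Check one query slot against one (word, tag) pair."""
    word, tag = token
    if slot.startswith("[") and slot.endswith("]"):
        # Bracketed slot: compare against the tag, "*" is a wildcard.
        return fnmatchcase(tag, slot[1:-1])
    # Bare slot: compare against the word form, ignoring case.
    return word.lower() == slot.lower()

def find(query, tokens):
    """Return every run of adjacent tokens that matches the query."""
    slots = query.split()
    n = len(slots)
    hits = []
    for i in range(len(tokens) - n + 1):
        window = tokens[i:i + n]
        if all(slot_matches(s, t) for s, t in zip(slots, window)):
            hits.append(" ".join(w for w, _ in window))
    return hits

# Hypothetical tagged text; the tag labels are made up for this sketch.
preverbal = [("a", "e"), ("ellos", "pp"), ("les", "pc"),
             ("dijeron", "vi"), ("la", "art"), ("verdad", "n")]
postverbal = [("decirles", "vr+"), ("a", "e"), ("ellos", "pp")]

preverbal_hits = find("a [p*] [p*] [v*]", preverbal)
postverbal_hits = find("[vr*+] a [p*]", postverbal)
```

With this sample data, `preverbal_hits` holds "a ellos les dijeron" and `postverbal_hits` holds "decirles a ellos". On the real corpus you would simply type the queries into the web interface.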

FYI, the Corpus del Espanol contains 100 million words, including 20 million from the 1900s. The 1900s portion is evenly balanced among spoken, fiction, newspaper, and academic texts, so you can do cross-genre comparisons. Because it also has texts from earlier centuries (e.g. 20 million words from the 1800s), you can look at the historical development of the construction. Finally, because the spoken portion includes the entire Habla Culta corpus, you can compare across different dialects.

Best,

Mark Davies

============================================
Mark Davies
Professor of (Corpus) Linguistics
Brigham Young University
(phone) 801-422-9168 / (fax) 801-422-0906
Web: davies-linguistics.byu.edu

** Corpus design and use // Linguistic databases **
** Historical linguistics // Language variation **
** English, Spanish, and Portuguese **
============================================
________________________________________
From: corpora-bounces at uib.no [corpora-bounces at uib.no] On Behalf Of Carlos A. Gomez Gallo [cgomez at cs.rochester.edu]
Sent: Saturday, October 18, 2008 10:37 PM
To: corpora at uib.no
Subject: [Corpora-List] Syntactically annotated corpus of a Spanish Dialect such as Buenos Aires or Los Angeles

Good Morning,
I am starting a project on double clitic omission in Spanish. Does
anybody know of a syntactically annotated Spanish corpus of a Latin
American dialect that allows both the double clitic and its omission?
The dialects most studied in the literature are those of Buenos Aires
and Los Angeles, but any other will do.
Suggestions where I can find these or anything related would be
appreciated. If you prefer, you can write to me individually and I
will post a summary back to the list afterwards.

Many thanks,
Carlos

-- Carlos A. Gomez Gallo
Computer Science and Linguistics Ph.D. candidate
Email: cgomez at cs.rochester.edu
Webpage: www.cs.rochester.edu/~cgomez

Snail Mail:
Department of Computer Science
734 Computer Studies Building
University of Rochester
Rochester, NY 14627

_______________________________________________
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora
