[Corpora-List] Syntactically annotated corpus of a Spanish Dialect such as Buenos Aires or Los Angeles

"José M. García-Miguel" gallego at uvigo.es
Mon Oct 20 21:36:10 UTC 2008


You might also try the search form of ADESSE 
(http://adesse.uvigo.es/data/avanzado.php), with the following options:

Género textual = "Oral"
Procedencia = "Hispanoamérica"

And in "Parámetros del argumento 1":

Función Sintáctica = "Objeto"   (or "cualquiera" / "directo" / "indirecto")
Categoría Sintáctica = "Cualquiera (no vacío)", or any specific category 
in the list ("FN" / "ProPers" / ...)
Concordancia/clítico = "Clítico objeto"   [for doubling, e.g. "va a 
buscarlo al soldadito"], or
Concordancia/clítico = "Ninguno (nulo)"   [for no doubling, e.g. "se 
queda a buscar a Pedro Páramo"]

This gives you the examples with and without doubling from the Buenos 
Aires part of the Habla Culta corpus.
The form offers several options to refine your search but, for the 
moment, not every query that the original database could support is 
available (I hope they will be in the future).
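
The two searches described above differ in a single field only. As a 
minimal sketch (Python, for illustration), the settings can be written 
down as plain data and compared. The dictionary keys just mirror the 
labels shown on the form; they are an assumption for readability and are 
NOT the form's actual HTTP parameter names, which are not given here.

```python
# Labels and values are copied from the options listed above.
# Assumption: keys are human-readable form labels, not real request parameters.
BASE_OPTIONS = {
    "Género textual": "Oral",
    "Procedencia": "Hispanoamérica",
    "Función Sintáctica": "Objeto",
    "Categoría Sintáctica": "Cualquiera (no vacío)",
}


def search_options(doubling: bool) -> dict:
    """Return the form settings for the doubling or the no-doubling search."""
    options = dict(BASE_OPTIONS)
    options["Concordancia/clítico"] = (
        "Clítico objeto" if doubling else "Ninguno (nulo)"
    )
    return options


def differing_fields(a: dict, b: dict) -> list:
    """List the labels whose values differ between two sets of settings."""
    return sorted(k for k in a.keys() | b.keys() if a.get(k) != b.get(k))


if __name__ == "__main__":
    # Print the settings to enter by hand for the doubling search.
    for label, value in search_options(doubling=True).items():
        print(f"{label} = {value}")
```

Running both searches with everything else held constant is what makes 
the two result sets directly comparable.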

ADESSE is a semantically extended version of BDS, a syntactic database 
of 160,000 Spanish clauses built on the Arthus corpus (1.5 million 
words). The oral part of this corpus comprises the Madrid, Sevilla, and 
Buenos Aires texts from the Norma Culta corpus. This is why, in this 
case, a search with "Género textual = Oral" and "Procedencia = 
Hispanoamérica" is equivalent to a search over Buenos Aires.
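
The equivalence can be checked with a small sketch (Python, for 
illustration). The three oral subcorpora are those named above; the 
"España" label for Madrid and Sevilla is an assumption made for this 
example and may not match the value used on the form.

```python
# Oral subcorpora named above, each tagged with a region of origin.
# Assumption: "España" is an illustrative label, not necessarily the form's value.
ORAL_SUBCORPORA = [
    {"city": "Madrid", "procedencia": "España"},
    {"city": "Sevilla", "procedencia": "España"},
    {"city": "Buenos Aires", "procedencia": "Hispanoamérica"},
]


def matching_cities(procedencia: str) -> list:
    """Return the oral subcorpora whose origin matches the given region."""
    return [s["city"] for s in ORAL_SUBCORPORA if s["procedencia"] == procedencia]


if __name__ == "__main__":
    print(matching_cities("Hispanoamérica"))
```

Only one oral subcorpus carries the "Hispanoamérica" origin, so the 
filter selects Buenos Aires and nothing else.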

Best,

Jose M. Garcia-Miguel
University of Vigo
E-mail: gallego at uvigo.es
Web:  webs.uvigo.es/weba575/jmgm/

>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Sun, 19 Oct 2008 07:07:06 -0600
> From: Mark Davies <Mark_Davies at byu.edu>
> Subject: Re: [Corpora-List] Syntactically annotated corpus of a
> 	Spanish Dialect such as Buenos Aires or Los Angeles
> To: "Carlos A. Gomez Gallo" <cgomez at cs.rochester.edu>,
> 	"corpora at uib.no"	<corpora at uib.no>
>
> You might try the Corpus del Espanol (www.corpusdelespanol.org).
>
> For preverbal doubling (a ellos les dijeron), you'd enter something like:
>
> a [p*] [p*] [v*]
>
> For post-verbal (decirles a ellos), try something like:
>
> [vr*+] a [p*]
>
> In both cases, it will find all the several thousand tokens in 3-4 seconds.
>
> FYI, the Corpus del Espanol is 100 million words in size, including 20 million from the 1900s. For the 1900s, it is equally balanced between spoken, fiction, newspaper, and academic, which means that you can do nice cross-genre comparisons. Since it has texts from earlier centuries as well (e.g. 20 million words from the 1800s), you can look at the historical development of the construction as well. Finally, because the spoken has the entire Habla Culta corpus, you can do nice comparisons across different dialects.
>
> Best,
>
> Mark Davies
>
> ============================================
> Mark Davies
> Professor of (Corpus) Linguistics
> Brigham Young University
> (phone) 801-422-9168 / (fax) 801-422-0906
> Web: davies-linguistics.byu.edu
>
> ** Corpus design and use // Linguistic databases **
> ** Historical linguistics // Language variation **
> ** English, Spanish, and Portuguese **
> ============================================
> ________________________________________
> From: corpora-bounces at uib.no [corpora-bounces at uib.no] On Behalf Of Carlos A. Gomez Gallo [cgomez at cs.rochester.edu]
> Sent: Saturday, October 18, 2008 10:37 PM
> To: corpora at uib.no
> Subject: [Corpora-List] Syntactically annotated corpus of a Spanish Dialect such as Buenos Aires or Los Angeles
>
> Good Morning,
> I am starting on a project on double clitic omission in Spanish. Does
> anybody know of a syntactically annotated Spanish corpus of a Latin
> American dialect that allows double clitic and its omission? The dialects
> most studied in the literature are from Buenos Aires and Los Angeles, but
> any other will do.
> Suggestions where I can find these or anything related would be
> appreciated. If you prefer, you can write to me individually and I
> will post a summary back to the list afterwards.
>
> Many thanks,
> Carlos
>
> -- Carlos A. Gomez Gallo
> Computer Science and Linguistics Ph.D. candidate
> Email: cgomez at cs.rochester.edu
> Webpage: www.cs.rochester.edu/~cgomez
>
> Snail Mail:
> Department of Computer Science
> 734 Computer Studies Building
> University of Rochester
> Rochester, NY 14627
>
> _______________________________________________
> Corpora mailing list
> Corpora at uib.no
> http://mailman.uib.no/listinfo/corpora
>
>
>
>   
