31.2879, FYI: The ESLORA Corpus for the Study of Spoken Spanish – Release 2.0

The LINGUIST List linguist at listserv.linguistlist.org
Wed Sep 23 20:20:19 UTC 2020


LINGUIST List: Vol-31-2879. Wed Sep 23 2020. ISSN: 1069 - 4875.

Subject: 31.2879, FYI: The ESLORA Corpus for the Study of Spoken Spanish – Release 2.0

Moderator: Malgorzata E. Cavar (linguist at linguistlist.org)
Student Moderator: Jeremy Coburn
Managing Editor: Becca Morris
Team: Helen Aristar-Dry, Everett Green, Sarah Robinson, Lauren Perkins, Nils Hjortnaes, Yiwen Zhang, Joshua Sims
Jobs: jobs at linguistlist.org | Conferences: callconf at linguistlist.org | Pubs: pubs at linguistlist.org

Homepage: http://linguistlist.org

Please support the LL editors and operation with a donation at:
           https://funddrive.linguistlist.org/donate/

Editor for this issue: Everett Green <everett at linguistlist.org>
================================================================


Date: Wed, 23 Sep 2020 16:19:11
From: Victoria Vázquez [victoria.vazquez at usc.es]
Subject: The ESLORA Corpus for the Study of Spoken Spanish – Release 2.0

 
The Spanish Grammar Research Group at the University of Santiago de Compostela
is pleased to announce the release of version 2.0 of the ESLORA Corpus for the
Study of Spoken Spanish.

Website: http://eslora.usc.es 

ESLORA 2.0 (September 2020) comprises 83 documents including 768,005
orthographic words (or 898,914 grammatical tokens). The corpus consists of
spontaneous conversations and semi-structured interviews recorded in Galicia
between 2007 and 2015, which were orthographically transcribed and manually
aligned to the audio files. The transcripts have been morphologically tagged
and lemmatized with the statistical POS tagger XIADA:
https://github.com/crpih/xiada.  

The search facility allows queries using orthographic, lexical and
morphosyntactic information in combination with other phenomena included in
the transcription files (word lengthening, fragmentation of words, laughs,
quotes, etc.), as well as with social variables (age, gender, education, role
of the speaker).

The new version 2.0 includes more conversation transcripts and extends the
query engine options for retrieval, exploitation and downloading of the data.

Main new features:
- Combination of Boolean operators with existing options (orthographic words,
lemmata, morphosyntactic tags).
- Direct retrieval of lists and frequencies of grammatical elements or lemmata
that meet the criteria you have set.
- A lexical and grammatical frequency dictionary that can be built dynamically.

The multiple functions of the search engine are fully described in the User
Guide: http://eslora.usc.es/guide_description

The ESLORA corpus has been compiled by the Spanish Grammar Research Group at
the University of Santiago de Compostela. Version 2.0 has been developed as
part of the research project ESLORA+ (Ref. FFI2017-86379-P. AEI/FEDER, UE).
 



Linguistic Field(s): Text/Corpus Linguistics

Subject Language(s): Spanish (spa)

Language Family(ies): Latin Subgroup





 


