31.2636, FYI: PELIC - New Publicly Available Learner Corpus

The LINGUIST List linguist at listserv.linguistlist.org
Sat Aug 22 03:55:13 UTC 2020


LINGUIST List: Vol-31-2636. Fri Aug 21 2020. ISSN: 1069 - 4875.

Subject: 31.2636, FYI: PELIC - New Publicly Available Learner Corpus

Moderator: Malgorzata E. Cavar (linguist at linguistlist.org)
Student Moderator: Jeremy Coburn
Managing Editor: Becca Morris
Team: Helen Aristar-Dry, Everett Green, Sarah Robinson, Lauren Perkins, Nils Hjortnaes, Yiwen Zhang, Joshua Sims
Jobs: jobs at linguistlist.org | Conferences: callconf at linguistlist.org | Pubs: pubs at linguistlist.org

Homepage: http://linguistlist.org

Please support the LL editors and operation with a donation at:
           https://funddrive.linguistlist.org/donate/

Editor for this issue: Everett Green <everett at linguistlist.org>
================================================================


Date: Fri, 21 Aug 2020 23:54:35
From: Ben Naismith [bnaismith at pitt.edu]
Subject: PELIC - New Publicly Available Learner Corpus

 
The ELI Data Mining Group at the University of Pittsburgh is pleased to
announce the release of the University of Pittsburgh English Language
Institute Corpus (PELIC). 

PELIC is a publicly available 4.2-million-word learner corpus of written
texts. Collected over seven years in the University of Pittsburgh’s Intensive
English Program, these texts were produced by over 1100 students with a wide
range of linguistic backgrounds and proficiency levels. PELIC is longitudinal,
offering opportunities for tracking development in a natural classroom
setting.

Further information about PELIC and research based on these data can be found
at the PELIC homepage: https://eli-data-mining-group.github.io/Pitt-ELI-Corpus

The entire dataset is available for download as CSV files from the PELIC
GitHub repository: https://github.com/ELI-Data-Mining-Group/PELIC-dataset

In addition to the data, the PELIC repository contains tools for lexical
analysis (concordancing, lexical sophistication, etc.) and tutorials on how to
access and analyze the data.
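
As a rough illustration of what concordancing over CSV-stored learner texts
can look like, here is a minimal Python sketch. It does not use the PELIC
tools themselves. The column names "answer_id" and "text", the helper names,
and the sample rows are all assumptions made up for this example; check the
repository tutorials for the actual file layout.

```python
import csv
import io
import re

# Invented sample rows standing in for a downloaded CSV file. The column
# names "answer_id" and "text" are assumptions, not confirmed PELIC headers,
# and the sentences are made up for illustration (they are not corpus data).
SAMPLE_CSV = """answer_id,text
1,I am agree with this idea because the idea is good.
2,My teacher give us a good idea for the essay.
"""


def load_rows(handle):
    """Read CSV rows into a list of dicts keyed by the header line."""
    return list(csv.DictReader(handle))


def concordance(rows, keyword, width=3):
    """Return (id, left context, hit, right context) for each keyword match.

    Matching is case-insensitive on whole word tokens; `width` is the number
    of tokens kept on each side of the hit.
    """
    hits = []
    for row in rows:
        tokens = re.findall(r"\w+", row["text"])
        for i, tok in enumerate(tokens):
            if tok.lower() == keyword.lower():
                left = " ".join(tokens[max(0, i - width):i])
                right = " ".join(tokens[i + 1:i + 1 + width])
                hits.append((row["answer_id"], left, tok, right))
    return hits


rows = load_rows(io.StringIO(SAMPLE_CSV))
for answer_id, left, hit, right in concordance(rows, "idea"):
    # Right-align the left context so the keyword lines up in one column.
    print(f"{answer_id:>3}  {left:>20}  [{hit}]  {right}")
```

To try this on a downloaded file, replace io.StringIO(SAMPLE_CSV) with an
open file handle, for example open(path, newline="", encoding="utf-8"),
and adjust the column names to match the file.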

Linguistic Field(s): Corpus Linguistics; Learner Corpora; Longitudinal
Corpora; Second Language Acquisition

Subject Language(s): English (eng)
 



Linguistic Field(s): Text/Corpus Linguistics
