31.2636, FYI: PELIC - New Publicly Available Learner Corpus
The LINGUIST List
linguist at listserv.linguistlist.org
Sat Aug 22 03:55:13 UTC 2020
LINGUIST List: Vol-31-2636. Fri Aug 21 2020. ISSN: 1069 - 4875.
Moderator: Malgorzata E. Cavar (linguist at linguistlist.org)
Student Moderator: Jeremy Coburn
Managing Editor: Becca Morris
Team: Helen Aristar-Dry, Everett Green, Sarah Robinson, Lauren Perkins, Nils Hjortnaes, Yiwen Zhang, Joshua Sims
Jobs: jobs at linguistlist.org | Conferences: callconf at linguistlist.org | Pubs: pubs at linguistlist.org
Homepage: http://linguistlist.org
Editor for this issue: Everett Green <everett at linguistlist.org>
================================================================
Date: Fri, 21 Aug 2020 23:54:35
From: Ben Naismith [bnaismith at pitt.edu]
Subject: PELIC - New Publicly Available Learner Corpus
The ELI Data Mining Group at the University of Pittsburgh is pleased to
announce the release of the University of Pittsburgh English Language
Institute Corpus (PELIC).
PELIC is a publicly available, 4.2-million-word learner corpus of written
texts. Collected over seven years in the University of Pittsburgh’s Intensive
English Program, these texts were produced by more than 1,100 students with a
wide range of linguistic backgrounds and proficiency levels. PELIC is
longitudinal, offering opportunities to track learner development in a
natural classroom setting.
Further information about PELIC and research based on these data can be found
at the PELIC homepage: https://eli-data-mining-group.github.io/Pitt-ELI-Corpus
The entire dataset is available for download as CSV files from the PELIC
GitHub repository: https://github.com/ELI-Data-Mining-Group/PELIC-dataset
In addition to the data, the PELIC repository contains tools for lexical
analysis (concordancing, lexical sophistication, etc.) and tutorials on how to
access and analyze the data.
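As a rough illustration of the kind of concordancing mentioned above, the
following minimal Python sketch runs a keyword-in-context search over CSV
data. It is not the repository's own tooling: the sample rows, the column
names (answer_id, learner_id, text), and the helper functions tokenize and
concordance are all assumptions made for this example, and the actual PELIC
files may be organized differently.

```python
import csv
import io
import re

# Hypothetical sample standing in for a corpus CSV file.
# The column names and the two rows are invented for this sketch;
# they are NOT the actual PELIC schema or data.
SAMPLE_CSV = """answer_id,learner_id,text
1,a1,"I want to study English because I want to find a good job."
2,b2,"My teacher said that I should study every day."
"""


def tokenize(text):
    """Split text into lowercase word tokens (letters and apostrophes only)."""
    return re.findall(r"[a-z']+", text.lower())


def concordance(rows, keyword, width=3):
    """Return (answer_id, left context, keyword, right context) for each hit."""
    hits = []
    for row in rows:
        tokens = tokenize(row["text"])
        for i, tok in enumerate(tokens):
            if tok == keyword:
                left = " ".join(tokens[max(0, i - width):i])
                right = " ".join(tokens[i + 1:i + 1 + width])
                hits.append((row["answer_id"], left, tok, right))
    return hits


# Read the in-memory sample the same way a downloaded CSV file would be read.
rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))

# Print one keyword-in-context line per occurrence of "study".
for answer_id, left, kw, right in concordance(rows, "study"):
    print(f"{answer_id:>3} {left:>25} [{kw}] {right}")
```

To try this on a downloaded file, io.StringIO(SAMPLE_CSV) would be replaced
by an opened file handle, and the column names adjusted to match the file.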
Linguistic Field(s): Corpus Linguistics; Learner Corpora; Longitudinal
Corpora; Second Language Acquisition
Subject Language(s): English (eng)