33.2653, Software: Large corpus of German YouTube language now available
The LINGUIST List
linguist at listserv.linguistlist.org
Wed Aug 31 22:14:50 UTC 2022
LINGUIST List: Vol-33-2653. Wed Aug 31 2022. ISSN: 1069 - 4875.
Subject: 33.2653, Software: Large corpus of German YouTube language now available
Moderator: Malgorzata E. Cavar (linguist at linguistlist.org)
Student Moderator: Billy Dickson
Managing Editor: Lauren Perkins
Team: Helen Aristar-Dry, Everett Green, Sarah Goldfinch, Nils Hjortnaes,
Joshua Sims, Billy Dickson, Amalia Robinson, Matthew Fort
Jobs: jobs at linguistlist.org | Conferences: callconf at linguistlist.org | Pubs: pubs at linguistlist.org
Homepage: http://linguistlist.org
Hosted by Indiana University
Please support the LL editors and operation with a donation at:
https://funddrive.linguistlist.org/donate/
Editor for this issue: Everett Green <everett at linguistlist.org>
================================================================
Date: Wed, 31 Aug 2022 22:12:48
From: Louis Cotgrove [cotgrove at ids-mannheim.de]
Subject: Large corpus of German YouTube language now available
Dear Linguist-Listers,
The Nottingham Corpus of German YouTube Language (Nottinghamer Korpus
Deutscher YouTube-Sprache or NottDeuYTSch) is now available for analysis in a
variety of formats, including tsv, R object, JSON, SketchEngine and
CorpusExplorer.
https://lindat.mff.cuni.cz/repository/xmlui/handle/11372/LRT-4806
The NottDeuYTSch corpus contains over 33 million words taken from
approximately 3 million YouTube comments from videos published between 2008
and 2018 and targeted at a young, German-speaking demographic. It represents
an authentic snapshot of the language of young German speakers. The corpus was
sampled proportionally by video category and year from a database of 112
popular German-speaking YouTube channels in the DACH region (Germany, Austria
and Switzerland) for optimal representativeness and balance. Each comment
comes with a considerable amount of associated metadata, which enables further
longitudinal and cross-sectional analyses.
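For readers unfamiliar with proportional sampling by category and year, the
following is a minimal Python sketch of the general idea, not the procedure
actually used to build the corpus. The function name proportional_sample, the
field names "category" and "year", and the toy data are all hypothetical and
chosen only for illustration.

```python
import random
from collections import defaultdict


def proportional_sample(records, key, total, seed=0):
    """Draw roughly `total` records so that each stratum keeps its share.

    `key` maps a record to its stratum label, here (category, year).
    Illustrative sketch only; not the corpus authors' implementation.
    """
    records = list(records)
    rng = random.Random(seed)

    # Group the records by stratum.
    strata = defaultdict(list)
    for rec in records:
        strata[key(rec)].append(rec)

    sample = []
    for label in sorted(strata):
        members = strata[label]
        # Quota is proportional to the stratum's share of all records.
        quota = round(total * len(members) / len(records))
        sample.extend(rng.sample(members, min(quota, len(members))))
    return sample


# Hypothetical toy data: 1,000 comments in three (category, year) strata.
comments = (
    [{"id": i, "category": "Gaming", "year": 2015} for i in range(600)]
    + [{"id": 600 + i, "category": "Music", "year": 2016} for i in range(300)]
    + [{"id": 900 + i, "category": "Comedy", "year": 2016} for i in range(100)]
)

sample = proportional_sample(
    comments, key=lambda c: (c["category"], c["year"]), total=100
)
```

Because each quota is rounded independently, the sample size can differ
slightly from the requested total when the shares do not divide evenly.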
If you have any questions about the corpus, please feel free to
email me at cotgrove at ids-mannheim.de.
Kind regards,
Louis Cotgrove
Abteilung: Lexik
Leibniz-Institut für Deutsche Sprache
R5, 6-13
D-68161 Mannheim
Linguistic Field(s): Text/Corpus Linguistics
Subject Language(s): German (deu)