33.2653, Software: Large corpus of German YouTube language now available

The LINGUIST List linguist at listserv.linguistlist.org
Wed Aug 31 22:14:50 UTC 2022


LINGUIST List: Vol-33-2653. Wed Aug 31 2022. ISSN: 1069 - 4875.

Subject: 33.2653, Software: Large corpus of German YouTube language now available

Moderator: Malgorzata E. Cavar (linguist at linguistlist.org)
Student Moderator: Billy Dickson
Managing Editor: Lauren Perkins
Team: Helen Aristar-Dry, Everett Green, Sarah Goldfinch, Nils Hjortnaes,
        Joshua Sims, Billy Dickson, Amalia Robinson, Matthew Fort
Jobs: jobs at linguistlist.org | Conferences: callconf at linguistlist.org | Pubs: pubs at linguistlist.org

Homepage: http://linguistlist.org

Hosted by Indiana University

Editor for this issue: Everett Green <everett at linguistlist.org>
================================================================


Date: Wed, 31 Aug 2022 22:12:48
From: Louis Cotgrove [cotgrove at ids-mannheim.de]
Subject: Large corpus of German YouTube language now available

 
Dear Linguist-Listers,

The Nottingham Corpus of German YouTube Language (Nottinghamer Korpus
Deutscher YouTube-Sprache or NottDeuYTSch) is now available for analysis in a
variety of formats, including tsv, R object, JSON, SketchEngine and
CorpusExplorer.

https://lindat.mff.cuni.cz/repository/xmlui/handle/11372/LRT-4806

The NottDeuYTSch corpus contains over 33 million words taken from
approximately 3 million YouTube comments on videos published between 2008 and
2018 that were targeted at a young, German-speaking demographic. It represents
an authentic snapshot of the language of young German speakers. The corpus was
proportionally sampled by video category and year from a database of 112
popular German-speaking YouTube channels in the DACH region for optimal
representativeness and balance. Each comment carries a considerable amount of
associated metadata, which enables further longitudinal and cross-sectional
analyses.
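
For readers unfamiliar with proportional sampling by category and year, the
idea can be sketched roughly as follows. This is only an illustration on toy
data, not the procedure actually used to build the corpus; the function name
and the field names "category" and "year" are assumptions made for the
example and are not taken from the corpus files.

```python
import random
from collections import defaultdict


def proportional_sample(comments, total, seed=0):
    """Draw roughly `total` comments, giving each (category, year) stratum
    a share of the sample proportional to its share of the pool."""
    rng = random.Random(seed)

    # Group the pool into strata keyed by (category, year).
    strata = defaultdict(list)
    for comment in comments:
        strata[(comment["category"], comment["year"])].append(comment)

    pool_size = len(comments)
    sample = []
    for key in sorted(strata):
        group = strata[key]
        # Allocate by proportion; never ask for more than the stratum holds.
        k = min(len(group), round(total * len(group) / pool_size))
        sample.extend(rng.sample(group, k))
    return sample


# Toy pool (hypothetical data): 60 + 30 + 10 comments in three strata.
pool = (
    [{"category": "gaming", "year": 2016, "text": f"g{i}"} for i in range(60)]
    + [{"category": "music", "year": 2016, "text": f"m{i}"} for i in range(30)]
    + [{"category": "music", "year": 2017, "text": f"n{i}"} for i in range(10)]
)
subset = proportional_sample(pool, total=20)
```

Because each stratum's allocation is rounded separately, the size of the
result can differ slightly from the requested total when the proportions do
not divide evenly.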

If you have any questions about the corpus, please feel free to email me at
cotgrove at ids-mannheim.de.

Kind Regards

Louis Cotgrove
Abteilung: Lexik

Leibniz-Institut für Deutsche Sprache
R5, 6-13
D-68161 Mannheim


Linguistic Field(s): Text/Corpus Linguistics

Subject Language(s): German (deu)


