31.1626, FYI: Coronavirus Corpus

The LINGUIST List linguist at listserv.linguistlist.org
Fri May 15 15:04:52 UTC 2020


LINGUIST List: Vol-31-1626. Fri May 15 2020. ISSN: 1069 - 4875.

Subject: 31.1626, FYI: Coronavirus Corpus

Moderator: Malgorzata E. Cavar (linguist at linguistlist.org)
Student Moderator: Jeremy Coburn
Managing Editor: Becca Morris
Team: Helen Aristar-Dry, Everett Green, Sarah Robinson, Lauren Perkins, Nils Hjortnaes, Yiwen Zhang, Joshua Sims
Jobs: jobs at linguistlist.org | Conferences: callconf at linguistlist.org | Pubs: pubs at linguistlist.org

Homepage: http://linguistlist.org

Please support the LL editors and operation with a donation at:
           https://funddrive.linguistlist.org/donate/

Editor for this issue: Everett Green <everett at linguistlist.org>
================================================================


Date: Fri, 15 May 2020 11:04:29
From: Mark Davies [mark_davies at byu.edu]
Subject: Coronavirus Corpus

 
We are pleased to announce the release of the Coronavirus Corpus:

https://www.english-corpora.org/corona/

The Coronavirus Corpus is designed to be the definitive record of the social,
cultural, and economic impact of the coronavirus (COVID-19) in 2020 and
beyond, and it is part of the English-Corpora.org suite of corpora, which
offer unparalleled insight into genre-based, historical, and dialectal
variation in English.

The corpus is currently about 270 million words in size, and it continues to
grow by 3-4 million words each day. (For example, there are already 4 million
words of text for yesterday, May 14.) At this rate, the corpus may be 500-600
million words in size by August 2020.
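
As a rough check on that projection, the arithmetic can be sketched in a few
lines of Python. This is only an illustration: it assumes strictly linear
growth, takes the date of this announcement (May 15) as the starting point,
and picks August 1 as the target date, none of which is stated more precisely
above. The function name project_size is hypothetical.

```python
from datetime import date


def project_size(start_m, start_day, target_day, low_rate_m, high_rate_m):
    """Return (low, high) projected corpus size in millions of words.

    Assumes constant daily growth between the two dates (an assumption,
    not something the corpus guarantees).
    """
    days = (target_day - start_day).days
    return (start_m + days * low_rate_m, start_m + days * high_rate_m)


# 270 million words on May 15, growing by 3-4 million words per day.
low, high = project_size(270, date(2020, 5, 15), date(2020, 8, 1), 3, 4)
print(f"Projected size on Aug 1: {low}-{high} million words")
```

With those assumptions the range falls inside the 500-600 million words
mentioned above.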

The Coronavirus Corpus allows you to see the frequency of words and phrases in
10-day increments (and even day by day, if desired) since Jan 2020, such as
social distancing, flatten the curve, WORK * home, Zoom, Wuhan, hoard*, toilet
paper, curbside, pandemic, reopen, defy.
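
For readers who want to try the same idea on their own texts, the sketch below
counts a phrase in 10-day bins. It is not how the corpus interface is
implemented; the bin origin (Jan 1, 2020), the function names, and the three
toy documents are all assumptions made for illustration.

```python
from collections import Counter
from datetime import date

START = date(2020, 1, 1)  # assumed origin for the 10-day bins


def bin_index(day, width=10):
    """Index of the width-day bin that a date falls into, counted from START."""
    return (day - START).days // width


def phrase_counts_by_bin(docs, phrase, width=10):
    """Count case-insensitive occurrences of phrase per bin.

    docs is an iterable of (date, text) pairs.
    """
    counts = Counter()
    needle = phrase.lower()
    for day, text in docs:
        counts[bin_index(day, width)] += text.lower().count(needle)
    return dict(counts)


# Invented toy documents, not corpus data.
docs = [
    (date(2020, 1, 5), "Officials in Wuhan reported new cases."),
    (date(2020, 3, 20), "Social distancing is urged; social distancing works."),
    (date(2020, 3, 25), "Stores limit toilet paper as social distancing continues."),
]
print(phrase_counts_by_bin(docs, "social distancing"))
```

Raw counts like these would normally be divided by the number of words in
each bin before comparing periods, since the amount of text per day varies.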

You can also look at collocates, to see what is being said about a certain
topic, such as (verbs near) virus, or any word near ban (v), stockpile,
disinfect*, or remotely. And you can even see and compare the collocates of a
word in 10-day periods since Jan 2020.
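
The general idea behind collocates can be illustrated with a simple window
count: every word within a few positions of the node word is tallied. This is
a minimal sketch, not the corpus's own method; the tokenizer, the span, and
the sample sentence are assumptions, and it ignores part of speech and
association measures entirely.

```python
import re
from collections import Counter


def collocates(text, node, span=4):
    """Count words occurring within span tokens to either side of node."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == node:
            # Left context, then right context, excluding the node itself.
            window = tokens[max(0, i - span):i] + tokens[i + 1:i + 1 + span]
            counts.update(window)
    return counts


# Invented sample text, not corpus data.
sample = "The virus spreads quickly. Masks slow the virus, and the virus mutates."
result = collocates(sample, "virus", span=2)
print(result.most_common(3))
```

On real data, function words such as "the" dominate raw counts, which is
why collocate lists are usually filtered by part of speech or ranked by an
association score.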

As is common with most online corpora, the Coronavirus Corpus allows you to
see re-sortable, PoS-colored Keyword in Context (KWIC) / concordance views,
for any word or phrase.
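
A bare-bones KWIC view can be sketched as follows, mainly to show what
"re-sortable" means in practice: each hit is stored as (left context, node,
right context), and the rows can then be sorted on either side. The function
names, the context width, and the sample text are assumptions; PoS coloring
is not attempted here.

```python
import re


def kwic(text, node, width=3):
    """Return (left, node, right) rows for each occurrence of node."""
    tokens = re.findall(r"\w+", text)
    rows = []
    for i, tok in enumerate(tokens):
        if tok.lower() == node.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            rows.append((left, tok, right))
    return rows


def sort_by_right(rows):
    """Re-sort concordance rows alphabetically by right-hand context."""
    return sorted(rows, key=lambda row: row[2].lower())


# Invented sample text, not corpus data.
sample = "Shops offer curbside pickup today. Many prefer curbside delivery over queues."
for left, node, right in sort_by_right(kwic(sample, "curbside", width=2)):
    print(f"{left:>15}  [{node}]  {right}")
```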

You can also compare between different time periods, to see how our view of
things has changed over time. A few examples might be: phrases with social *
or economic * that were more common in Jan-Feb than in Apr-May, words near BAN
or OBEY that were more common in Apr-May than in Jan-Feb, or all nouns that
were much more common in late April 2020 than in March 2020.
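
One common way to make such comparisons is to normalize counts to a rate per
million words and take the ratio between the two periods. The sketch below
shows that calculation under stated assumptions: the function names and the
optional smoothing term are illustrative choices, the counts and period sizes
are invented, and nothing here is claimed to match how the corpus interface
ranks its results.

```python
def per_million(count, total_words):
    """Frequency of an item per million words of text."""
    return count * 1_000_000 / total_words


def period_ratio(count_a, size_a, count_b, size_b, smoothing=0.5):
    """Ratio of per-million frequency in period A to that in period B.

    The smoothing term avoids division by zero when an item is absent
    from period B; 0.5 is an arbitrary illustrative value.
    """
    rate_a = per_million(count_a + smoothing, size_a)
    rate_b = per_million(count_b + smoothing, size_b)
    return rate_a / rate_b


# Hypothetical counts: 300 hits in 100M words vs. 30 hits in 50M words.
ratio = period_ratio(300, 100_000_000, 30, 50_000_000, smoothing=0)
print(f"Period A uses the item about {ratio:.1f} times as often as period B")
```

Normalizing first matters because the periods differ in size: the corpus
holds far more text for April and May than for January and February.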

The corpus allows you to compare across the 20 countries in the corpus (US,
UK, Australia, India, etc.), to see what is being said about the coronavirus in
each of these countries. You can also quickly and easily create "Virtual
Corpora" for particular topics, based on keywords in the text, country, date,
publication source, and more.

Finally, full-text data from the corpus will soon be available on a
"subscription" basis, where you can download nearly all of the new data
every day, week, or month -- just as with the other corpora from
English-Corpora.org (see https://www.corpusdata.org).

We hope that the corpus will be of use to you in your research and teaching.

Mark Davies
English-Corpora.org
 



Linguistic Field(s): Computational Linguistics
                     Lexicography
                     Text/Corpus Linguistics

Subject Language(s): English (eng)





 


