[Corpora-List] "Tajweed" in English dictionaries and corpora

Mark Davies Mark_Davies at byu.edu
Fri Mar 1 23:10:17 UTC 2013


Eric Atwell wrote:

>> Michael, you said "Thanks for 'tajweed', which corpus data suggests we should include" - what corpus data?  Presumably not the BNC.

The upcoming two-billion-word corpus of GLObal Web-Based English (GloWbE -- available in May 2013) has about 300 tokens for "tajweed":

http://corpus2.byu.edu/web/?c=web&q=21412274

Not surprisingly, it is the most common (normalized frequency) in Pakistan, then Bangladesh, then Tanzania, and then the UK. Both the UK and US components of GloWbE have about 400 million words, but the UK has 101 tokens of "tajweed", while the US only has 1 token.
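For anyone who wants to check the UK/US contrast, here is a minimal sketch of the per-million normalization, assuming both components are exactly 400 million words (the figure above is only approximate). The function name per_million is just illustrative and is not part of any corpus interface:

```python
# Sketch only: normalize raw token counts to a per-million-words rate.
# ASSUMPTION: each component is treated as exactly 400 million words;
# the real sizes are only "about" that figure.

def per_million(tokens: int, corpus_words: int) -> float:
    """Return the frequency of an item per one million corpus words."""
    if corpus_words <= 0:
        raise ValueError("corpus_words must be positive")
    return tokens / corpus_words * 1_000_000

COMPONENT_WORDS = 400_000_000  # assumed size of the UK and US components

uk_rate = per_million(101, COMPONENT_WORDS)  # 101 tokens of "tajweed" in the UK
us_rate = per_million(1, COMPONENT_WORDS)    # 1 token in the US

print(f"UK: {uk_rate:.4f} per million words")
print(f"US: {us_rate:.4f} per million words")
print(f"UK/US ratio: {uk_rate / us_rate:.0f}")
```

Under that assumption the UK rate comes out at roughly 0.25 per million words against roughly 0.0025 for the US. Because the two components are taken to be the same size, the ratio of the normalized rates is the same as the ratio of the raw counts.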

One of the advantages of GloWbE (when it is available -- soon) will be the ability to easily compare frequency across twenty different countries, as we see in the "tajweed" example.

Mark D.

============================================
Mark Davies
Professor of Linguistics / Brigham Young University
http://davies-linguistics.byu.edu/

** Corpus design and use // Linguistic databases **
** Historical linguistics // Language variation **
** English, Spanish, and Portuguese **
============================================