[Corpora-List] BNC n-grams

Mark Davies Mark_Davies at byu.edu
Tue Nov 10 13:18:26 UTC 2009


Serge,

>> I did this quite some time ago, but I never thought of this as an achievement, since it's trivial to produce.

And I wasn't saying that it would be difficult to produce. I could generate the full 2-gram or 3-gram list from the BYU-BNC databases in about one minute. I just wanted to know whether it had already been done, and whether people would find the data useful. Based on the lack of responses, it looks like it wouldn't be all that useful.

> In case you need them, http://corpus.leeds.ac.uk/frqc/bnc-bi.gz (it's based on lemmas, but I didn't use POS tags).

16 Paz .
16 pay Yeah
16 pay we
16 pay twelve
16 , payroll

This is nice, but I think that it really does need lemma, word form, and PoS for each bigram. With this bigram list, a PoS search like "being VVD" or "NN* NN*" would be impossible, and so would even a word-form search like "being *" (being considered, being asked), since the list contains only lemmas.
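To make the point concrete, here is a rough sketch (in Python, not how the BYU-BNC databases actually do it) of a bigram list that keeps word form, lemma, and PoS for each token, so that queries like "being VV*" or "NN* NN*" become possible. The sample sentences, the `Token` fields, and the `bigram_counts` and `search` helpers are all made up for illustration; the tags just imitate CLAWS-style labels.

```python
from collections import Counter, namedtuple
from fnmatch import fnmatchcase

# Hypothetical token record: word form, lemma, and PoS tag.
Token = namedtuple("Token", ["word", "lemma", "pos"])

# Toy sample invented for this sketch (not real BNC data).
SENTENCES = [
    [Token("the", "the", "AT0"), Token("plan", "plan", "NN1"),
     Token("was", "be", "VBD"), Token("being", "be", "VBG"),
     Token("considered", "consider", "VVN")],
    [Token("she", "she", "PNP"), Token("was", "be", "VBD"),
     Token("being", "be", "VBG"), Token("asked", "ask", "VVN")],
    [Token("the", "the", "AT0"), Token("bus", "bus", "NN1"),
     Token("stop", "stop", "NN1"), Token("moved", "move", "VVD")],
]


def bigram_counts(sentences):
    """Count adjacent token pairs, never crossing a sentence boundary."""
    counts = Counter()
    for sent in sentences:
        for first, second in zip(sent, sent[1:]):
            counts[(first, second)] += 1
    return counts


def search(counts, query):
    """Return (bigram, frequency) pairs matching a two-slot query.

    Each slot is a (field, pattern) pair, where field is "word",
    "lemma", or "pos" and pattern may use shell-style wildcards.
    """
    hits = []
    for pair, freq in counts.items():
        if all(fnmatchcase(getattr(tok, field), pattern)
               for tok, (field, pattern) in zip(pair, query)):
            hits.append((" ".join(tok.word for tok in pair), freq))
    # Most frequent first, then alphabetical for a stable order.
    return sorted(hits, key=lambda hit: (-hit[1], hit[0]))


counts = bigram_counts(SENTENCES)
print(search(counts, [("word", "being"), ("pos", "VV*")]))
print(search(counts, [("pos", "NN*"), ("pos", "NN*")]))
```

With only the lemma field available, a query on "be" in the first slot would return "was being" along with "being considered" and "being asked", which is exactly the ambiguity that a lemma-only list cannot resolve.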

Anyway, it looks like the question is answered -- thanks.

Mark D.

============================================
Mark Davies
Professor of (Corpus) Linguistics
Brigham Young University
(phone) 801-422-9168 / (fax) 801-422-0906

http://davies-linguistics.byu.edu

** Corpus design and use // Linguistic databases **
** Historical linguistics // Language variation **
** English, Spanish, and Portuguese **
============================================ 





