[Corpora-List] Do we still need language corpora?

Mark Davies Mark_Davies at byu.edu
Fri Feb 4 14:06:35 UTC 2011


Martin,

I would imagine that one motivation for the question is the availability of "corpora" like Google/Web and Google Books. Of course, one needs to distinguish between two senses of "corpus":

corpus = textual corpus (i.e. words and sentences + metadata)
and
corpus = textual corpus + architecture and interface for accessing the information

Many wonderful textual corpora are "trapped" inside an architecture and interface that don't allow users to do much with them. As everyone dealing with "Web as Corpus" knows, effectively and efficiently using Web/Google/Books data -- especially via the native Google interface -- is a real challenge.

Two pages that might be relevant:

http://corpus.byu.edu/coha/compare-googleBooks.asp

http://corpus.byu.edu/coca/compare-google.asp

Best,

Mark D.

============================================
Mark Davies
Professor of (Corpus) Linguistics
Brigham Young University
(phone) 801-422-9168 / (fax) 801-422-0906
Web: http://davies-linguistics.byu.edu

** Corpus design and use // Linguistic databases **
** Historical linguistics // Language variation **
** English, Spanish, and Portuguese **
============================================