[Corpora-List] Do we still need language corpora?
Mark Davies
Mark_Davies at byu.edu
Fri Feb 4 14:06:35 UTC 2011
Martin,
I would imagine that one motivation for the question is the availability of "corpora" like Google/Web and Google Books. Of course, one needs to distinguish between:
corpus = textual corpus (i.e. words and sentences + metadata)
and
corpus = textual corpus + architecture and interface for accessing the information
Many wonderful textual corpora are "trapped" inside an architecture and interface that don't allow users to do much with them. As everyone dealing with "Web as Corpus" knows, effectively and efficiently using Web/Google/Books data -- especially via the native Google interface -- is a real challenge.
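To make the distinction concrete, here is a minimal Python sketch. It is not taken from any of the corpora or interfaces mentioned in this message; the names Document, Corpus, and kwic, and the two sample sentences, are hypothetical and chosen only for illustration. The first part is a corpus in the narrow sense (text + metadata); the second wraps the same data in a small interface that allows a metadata-restricted keyword-in-context search.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    """One text plus its metadata (hypothetical structure)."""
    text: str
    metadata: dict = field(default_factory=dict)


# Sense 1: corpus = textual corpus (words and sentences + metadata)
documents = [
    Document("The corpus is large.", {"year": 1990, "genre": "academic"}),
    Document("A corpus needs an interface.", {"year": 2010, "genre": "blog"}),
]


# Sense 2: the same data + an architecture and interface for querying it
class Corpus:
    def __init__(self, documents):
        self.documents = documents

    def kwic(self, word, width=2, **filters):
        """Return keyword-in-context hits for `word`, limited to documents
        whose metadata matches every key=value pair in `filters`."""
        hits = []
        for doc in self.documents:
            # Skip documents that fail any metadata filter.
            if any(doc.metadata.get(k) != v for k, v in filters.items()):
                continue
            # Naive tokenization: lowercase, drop periods, split on spaces.
            tokens = doc.text.lower().replace(".", "").split()
            for i, tok in enumerate(tokens):
                if tok == word:
                    left = " ".join(tokens[max(0, i - width):i])
                    right = " ".join(tokens[i + 1:i + 1 + width])
                    hits.append((left, tok, right, doc.metadata))
        return hits


corpus = Corpus(documents)
all_hits = corpus.kwic("corpus")
blog_hits = corpus.kwic("corpus", genre="blog")
```

The list `documents` alone already counts as a corpus in the first sense, but only the `Corpus` wrapper lets a user ask a question such as "show this word in context, in blogs only". An interface that lacks that kind of operation is what "traps" the underlying text.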
Two pages that might be relevant:
http://corpus.byu.edu/coha/compare-googleBooks.asp
http://corpus.byu.edu/coca/compare-google.asp
Best,
Mark D.
============================================
Mark Davies
Professor of (Corpus) Linguistics
Brigham Young University
(phone) 801-422-9168 / (fax) 801-422-0906
Web: http://davies-linguistics.byu.edu
** Corpus design and use // Linguistic databases **
** Historical linguistics // Language variation **
** English, Spanish, and Portuguese **
============================================
_______________________________________________
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora