I love the second leg of Adam's definition, which has the lovely property of defining a word by delineating (some of) the circumstances under which we use it. Wittgenstein would approve.<br><br>It also crosses my mind that it is important that a corpus be a public object. Ideally, we would want the corpus itself to be accessible to all. But when that is not possible, we want the designers of the corpus to provide and publish a precise, publicly accessible statement of the defining characteristics of the corpus. Since I have no idea what is on someone else's bookshelf, that collection doesn't serve my purposes, and I choose to punish it slightly by declining to call it a corpus.<div>
<br></div><div>But I do like the Brown Corpus, which is defined to be representative of 15 broad categories of writing, all first published in 1961 and all by native speakers of American English. And I also quite like the idea of the SuperBrown Corpus, which is like the Brown Corpus, except that it contains ALL the material published in 1961 by native speakers of American English that falls into one of the categories. I know I can't actually have the SuperBrown Corpus: I don't know how to give a precise operational definition of the boundaries of the 15 broad categories, I am not quite sure what "first published" or "native speaker of American English" would mean in practice, and I can't get hold of it all anyway, because some of it has been irretrievably lost. However, in this case, it really is the thought that counts. By articulating the principles that guided the creation of the corpus, Kucera and Francis opened the way to the creation of comparable corpora for other languages and other years. That is quite something...<br>
<br><div class="gmail_quote">On Wed, Oct 3, 2012 at 12:53 PM, Jernej Vicic <span dir="ltr"><<a href="mailto:jernej.vicic@upr.si" target="_blank">jernej.vicic@upr.si</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
Dear Adam!<br>
<br>
Does it have to be an object of linguistics or literary research for a collection of text to be called a corpus? I would broaden the scope to any kind of research.<br>
<br>
Adam Kilgarriff wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="im">
Yuri,<br>
<br>
a corpus is a collection of texts/speech. We call it a corpus when we view it as an object of linguistics or literary research. The answers to your questions are yes and yes.<br>
<br>
Adam<br>
<br></div>
On 2 October 2012 13:21, Yuri Tambovtsev <<a href="mailto:yutamb@mail.ru" target="_blank">yutamb@mail.ru</a>> wrote:<br>
<br>
__<div class="im"><br>
Dear corpora members, I do not understand, what corpora is and what<br>
corpora is not. Is the set the text of books by Charles Dickens is a<br>
Dickens corpora? What about the books of Ernst Hemingway and other<br>
writers? Looking forward to hearing your opinion to <a href="mailto:yutamb@mail.ru" target="_blank">yutamb@mail.ru</a><br></div>
Yours sincerely Yuri Tambovtsev,<div class="im"><br>
Novosibirsk, Russia<br>
<br>
______________________________<u></u>_________________<br>
UNSUBSCRIBE from this page: <a href="http://mailman.uib.no/options/corpora" target="_blank">http://mailman.uib.no/options/<u></u>corpora</a><br>
Corpora mailing list<br></div>
<a href="mailto:Corpora@uib.no" target="_blank">Corpora@uib.no</a> <mailto:<a href="mailto:Corpora@uib.no" target="_blank">Corpora@uib.no</a>><br>
<a href="http://mailman.uib.no/listinfo/corpora" target="_blank">http://mailman.uib.no/<u></u>listinfo/corpora</a><br>
<br>
<br>
<br><span class="HOEnZb"><font color="#888888">
<br>
-- <br>
==============================<u></u>==========<br>
Adam Kilgarriff <<a href="http://www.kilgarriff.co.uk/" target="_blank">http://www.kilgarriff.co.uk/</a>> <a href="mailto:adam@lexmasterclass.com" target="_blank">adam@lexmasterclass.com</a> <mailto:<a href="mailto:adam@lexmasterclass.com" target="_blank">adam@lexmasterclass.<u></u>com</a>> Director Lexical Computing Ltd <<a href="http://www.sketchengine.co.uk/" target="_blank">http://www.sketchengine.co.<u></u>uk/</a>> Visiting Research Fellow University of Leeds <<a href="http://leeds.ac.uk" target="_blank">http://leeds.ac.uk</a>> /Corpora for all/ with the Sketch Engine <<a href="http://www.sketchengine.co.uk" target="_blank">http://www.sketchengine.co.uk</a><u></u>> /DANTE: a lexical database for English <<a href="http://www.webdante.com" target="_blank">http://www.webdante.com</a>> /<br>
==============================<u></u>==========<br>
<br>
<br>
------------------------------<u></u>------------------------------<u></u>------------</font></span><div class="im"><br>
<br>
</div></blockquote><div class="HOEnZb"><div class="h5">
<br>
</div></div></blockquote></div><br><br clear="all"><div><br></div>-- <br>Chris Brew, Educational Testing Service<br>
</div>