<i> [Chris Brew]: "I do like the Brown Corpus, which is defined to be representative of 15 broad categories of writing, all first published in 1961 and all by native speakers of American English...I don't know how to give a precise operational definition of what the boundaries of the 15 broad categories are, and I am not quite sure what "first published" or "native speaker of American English" would mean in practice... However, in this case, it really is the thought that counts. By articulating the principles that guided the creation of the corpus, Kucera and Francis opened the way to the creation of comparable corpora for other languages and other years." - I'm afraid you undermined your own case by referring to the categorial problems. </i><div>
<br></div><div>I don't think so. If anything, I undermined the case for a position that I am not at all interested in defending. My answer to Ramesh's big question about representativeness below is a clear "no". In fact, I think that the very idea of "writing as a whole" is unhelpful, and would much rather talk about specific cases of how and why people write.</div>
<div><br></div><div>The Brown Corpus categories have obvious (and probably also non-obvious) deficiencies, and certainly cannot be adopted wholesale forever. My rather limited point is that it was helpful that Kucera and Francis were explicit about their principles. As a matter of fact, many corpora were created with the explicit goal of being comparable to the Brown Corpus. "Comparable" here just means that the authors of the new corpus hope that it will be scientifically useful to make comparisons, and have tried to set up their corpus to facilitate this. Ramesh is completely entitled to question whether this effort does or can succeed. That's part of the normal process of scientific investigation. And he is also right to suggest that the influence of the Brown Corpus design could be double-edged, especially when the design is adopted wholesale without much thought.</div>
<div><br></div><div><br></div><div><i>An even bigger question is: to what extent are these 15 categories truly "representative" of writing as a whole?</i><br><br><br clear="all"><div><br></div>-- <br>Chris Brew, Educational Testing Service<br>
</div>