[Corpora-List] Testing how representative a particular corpus is

Angus Grieve-Smith grvsmth at panix.com
Mon Jan 27 02:22:37 UTC 2014


     Jim, it's perfectly reasonable to make simplifying assumptions 
based on our impressions.  The scientific enterprise is about going 
beyond those assumptions, one by one.

     With your specific example, I used to feel similarly until I took a 
bus on the South Side of Chicago and encountered a number of English 
conversations that I was actually unable to understand. Science is full 
of similar surprises.

     This is a problem with any assumption like that: we don't know what 
we don't know.  We can't actually know whether your assumption is right 
unless we test it against a representative sample.

http://grieve-smith.com/blog/2014/01/estimating-universals-averages-and-percentages/

On 1/26/2014 9:06 PM, Jim Fidelholtz wrote:
> Angus has made an interesting suggestion. Still, this discussion seems 
> to me to be tantamount to suggesting that, if we leave our nuclear 
> family (or at least talking with those of our own generation!), we are 
> doomed to fail at understanding what we will encounter. I'm an 
> American from the Midwest (Toledo, Ohio, early teens in Nashville, TN, 
> college in Boston, then on to Washington, D. C. and the world at 
> large) and I find I can mostly understand almost all English speakers, 
> discoursing on almost all subjects. I can even understand much of that 
> subtle, understated humor Brits are so justly famous for! Or maybe I 
> just unknowingly miss a lot of what is said to me (that would be my 
> wife's interpretation).
>
> Anyway, I do believe that we can pretty well categorize genres 
> with or without prolonged studies (and I don't mean to belittle such 
> studies at all, but I do mean to underline the very real abilities of 
> the layman to separate or categorize, at least to a large degree, that 
> vast literature which we gather into corpora). I guess what I am 
> saying is that we *can* understand one another, and if laymen can 
> understand one another, I think that there is great hope that corpus 
> linguistics will also be able, sooner or later, to muddle through all 
> these problems we dream up for ourselves (and of course for the 
> practical uses we hope our theoretical musings will lead to in the 
> medium term).
>

-- 
				-Angus B. Grieve-Smith
				grvsmth at panix.com

