[Corpora-List] Testing how representative a particular corpus is
Angus Grieve-Smith
grvsmth at panix.com
Mon Jan 27 02:22:37 UTC 2014
Jim, it's perfectly reasonable to make simplifying assumptions
based on our impressions. The scientific enterprise is about going
beyond those assumptions, one by one.
With your specific example, I used to feel similarly until I took a
bus on the South Side of Chicago and encountered a number of English
conversations that I was actually unable to understand. Science is full
of similar surprises.
This is a problem with any assumption like that: we don't know what
we don't know. We can't actually know whether your assumption is right
unless we test it against a representative sample.
http://grieve-smith.com/blog/2014/01/estimating-universals-averages-and-percentages/
On 1/26/2014 9:06 PM, Jim Fidelholtz wrote:
> Angus has made an interesting suggestion. Still, this discussion seems
> to me to be tantamount to suggesting that, if we leave our nuclear
> family (or at least talking with those of our own generation!), we are
> doomed to fail at understanding what we will encounter. I'm an
> American from the Midwest (Toledo, Ohio, early teens in Nashville, TN,
> college in Boston, then on to Washington, D. C. and the world at
> large) and I find I can mostly understand almost all English speakers,
> discoursing on almost all subjects. I can even understand much of that
> subtle, understated humor Brits are so justly famous for! Or maybe I
> just unknowingly miss a lot of what is said to me (that would be my
> wife's interpretation).
>
> Anyway, I do believe that we can pretty well categorize genres
> with or without prolonged studies (and I don't mean to belittle such
> studies at all, but I do mean to underline the very real abilities of
> the layman to separate or categorize, at least to a large degree, that
> vast literature which we gather into corpora). I guess what I am
> saying is that we *can* understand one another, and if laymen can
> understand one another, I think that there is great hope that corpus
> linguistics will also be able, sooner or later, to muddle through all
> these problems we dream up for ourselves (and of course for the
> practical uses we hope our theoretical musings will lead to in the
> medium term).
>
--
-Angus B. Grieve-Smith
grvsmth at panix.com