[Corpora-List] Testing how representative a particular corpus is

Stephen Wattam stephenwattam at gmail.com
Wed Jan 29 11:41:32 UTC 2014


>     Right.  Here's what I don't get: Why hasn't anyone followed even a
> single speaker around, let alone a representative sample, to see what
> proportion of registers and genres they're exposed to on a daily basis?  Or
> has this been done?

I did exactly this (to myself) for two weeks. Slides from CL'13 are up at:
http://stephenwattam.com/misc/?p=/pc

The approach we used was to gather a census, so (methodological
errors aside) there should be no scope for errors relating to
representativeness.

The data reinforces others' points from this thread.  The concept of
representativeness is only useful with respect to a given research
question.

This style of sample constitutes a single (very rich) data point in a
conventional corpus, and thus cannot tell us much about the
representativeness of something such as the BNC.  At most it is a
heuristic.

It would be possible to extract data from the BNC matching my
demographic details, and compare my corpus to that.  If the two are
similar, then the larger corpus is (somewhat) representative for at
least that portion of society, with representativeness becoming less
assured the less similar a speaker is to me.  Larger corpora cover so
many external variables that detailed 'verification samples' like this
would only be statistically valuable with a colossal number of
participants, at which point one may as well just use their data for
the main corpus.
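
To make the comparison above concrete, here is a rough sketch of one
way it could be done.  This is not the method used in the study: it
assumes each text in both collections already carries a genre label,
and it uses Jensen-Shannon divergence over genre proportions as the
similarity measure, which is my choice for illustration only.  The
labels and counts below are invented.

```python
from collections import Counter
from math import log2


def proportions(labels):
    """Turn a list of genre labels into a dict of relative frequencies."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {genre: n / total for genre, n in counts.items()}


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two distributions.

    0.0 means identical distributions; 1.0 means no genre in common.
    """
    genres = set(p) | set(q)
    # Mixture distribution: the average of p and q over all genres.
    m = {g: 0.5 * (p.get(g, 0.0) + q.get(g, 0.0)) for g in genres}

    def kl_to_mixture(a):
        # Kullback-Leibler divergence of a from m, skipping zero terms.
        return sum(a[g] * log2(a[g] / m[g]) for g in a if a[g] > 0)

    return 0.5 * kl_to_mixture(p) + 0.5 * kl_to_mixture(q)


# Hypothetical genre labels, invented purely to exercise the functions.
personal = ["email"] * 5 + ["conversation"] * 3 + ["news"] * 2
bnc_subset = ["email"] * 2 + ["conversation"] * 3 + ["news"] * 5

score = js_divergence(proportions(personal), proportions(bnc_subset))
print(f"JS divergence between the two genre profiles: {score:.3f}")
```

A low score would only say that the genre mix is similar for one
person; it says nothing about the rest of the population, which is the
limitation described above.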

Further, it's not even possible to take that sample as representative
of myself for many uses, because the two-week recording period fails
to cover many events (even obvious periodic ones like Christmas).
Technology is helping to overcome this limitation to some degree,
though, by making sampling less intrusive.

Regards,
-- 
Steve Wattam

Contact details and availability:
http://ɯɐʇʇɐʍuǝɥdǝʇs.com
