[Corpora-List] Testing how representative a particular corpus is

Linda Bawcom linda.bawcom at sbcglobal.net
Wed Jan 29 16:28:37 UTC 2014


Dear Stephen and Angus,

Please forgive me for replying to you in particular, but I'd already deleted the original request when I remembered that a friend at Warwick U. is an expert in Conversation Analysis. I contacted him, and he provided me with the links below, saying that the site is the best one he knows of. So I'm piggybacking off you and hoping whoever made the original request will see it.

The link below is to a website dedicated to conversation analysis, with numerous links to research. Information on how the problem has been addressed may be found within those links or on their reference pages.

http://www.paultenhave.nl/resource.htm

The home page is: http://www.paultenhave.nl/EMCA.htm

Links to a listserv for them and other such resources: http://www.paultenhave.nl/lists.htm

Kindest regards,
Linda Bawcom

>________________________________
> From: Stephen Wattam <stephenwattam at gmail.com>
>To: Angus Grieve-Smith <grvsmth at panix.com> 
>Cc: corpora at uib.no 
>Sent: Wednesday, January 29, 2014 5:41 AM
>Subject: Re: [Corpora-List] Testing how representative a particular corpus is
> 
>
>>     Right.  Here's what I don't get: Why hasn't anyone followed even a
>> single speaker around, let alone a representative sample, to see what
>> proportion of registers and genres they're exposed to on a daily basis?  Or
>> has this been done?
>
>I did exactly this (to myself) for two weeks---slides from CL'13 are up at:
>http://stephenwattam.com/misc/?p=/pc
>
>The approach we used was to gather a census, so (aside from
>methodological errors), there should be no scope for errors relating
>to representativeness.
>
>The data reinforces others' points from this thread.  The concept of
>representativeness is only useful with respect to a given research
>question.
>
>This style of sample constitutes a single (very rich) data point in a
>conventional corpus, and thus cannot tell us much about the
>representativeness of something such as the BNC.  At most it is a
>heuristic.
>
>It would be possible to extract data from the BNC matching my
>demographic details, and compare my corpus to that.  If that is
>similar, then the larger corpus is (somewhat) representative for at
>least that portion of society, with representativeness becoming less
>assured the less similar one is to myself.  There are so many external
>variables covered by larger corpora that doing detailed 'verification
>samples' like this would only be statistically valuable with a
>colossal number of participants, at which point one may as well just
>use their data for the main corpus.
>
>Further, it's not even possible to take that sample as representative
>of myself for many uses, because the two-week recording period fails
>to cover many events (even obvious periodic ones like Christmas).
>Technology is helping to defeat this limitation to some degree though
>by making sampling less intrusive.
>
>Regards,
>-- 
>Steve Wattam
>
>Contact details and availability:
>http://ɯɐʇʇɐʍuǝɥdǝʇs.com
>
>_______________________________________________
>UNSUBSCRIBE from this page: http://mailman.uib.no/options/corpora
>Corpora mailing list
>Corpora at uib.no
>http://mailman.uib.no/listinfo/corpora
>
>
>

