[Corpora-List] Testing how representative a particular corpus is

Mon Jan 27 02:06:51 UTC 2014

Angus has made an interesting suggestion. Still, this discussion seems to
me to be tantamount to suggesting that, if we leave our nuclear family (or
at least stop talking only with those of our own generation!), we are doomed to fail
at understanding what we will encounter. I'm an American from the Midwest
(Toledo, Ohio, early teens in Nashville, TN, college in Boston, then on to
Washington, D. C. and the world at large) and I find I can mostly
understand almost all English speakers, discoursing on almost all subjects.
I can even understand much of that subtle, understated humor Brits are so
justly famous for! Or maybe I just unknowingly miss a lot of what is said
to me (that would be my wife's interpretation).

Anyway, I do believe that we can pretty well categorize genres with or
without prolonged studies (and I don't mean to belittle such studies at
all, but I do mean to underline the very real abilities of the layman to
separate or categorize, at least to a large degree, that vast literature
which we gather into corpora). I guess what I am saying is that we *can*
understand one another, and if laymen can understand one another, I think
that there is great hope that corpus linguistics will also be able, sooner
or later, to muddle through all these problems we dream up for ourselves
(and of course for the practical uses we hope our theoretical musings will
lead to in the medium term).

Jim

James L. Fidelholtz
Posgrado en Ciencias del Lenguaje
Instituto de Ciencias Sociales y Humanidades
Benemérita Universidad Autónoma de Puebla, MÉXICO

On Sun, Jan 26, 2014 at 7:33 PM, Angus Grieve-Smith <grvsmth at panix.com> wrote:

>  On 1/26/2014 4:51 PM, Matías Guzmán Naranjo wrote:
>
>
> Another thing we can do is to put off the problem of finding a
>> representative sample of Language X and focus on a particular genre or
>> register, where there will be less variability.
>
>
>  The problem is that we want to be able to generalize. It is of little
> insight to say that construction X is more frequent than construction Y in
> <<semi-guided interviews conducted by professional linguists, where the test
> subjects know they are being recorded>> for the 100 people you picked. We
> would like to be able to say that those results are representative of, say,
> spoken language in a particular city, or at least a formal spoken register.
> Not being able to generalize would mean that things like collocational or
> collostructional studies are meaningless for spoken corpora because they
> would only apply to that particular set of texts.
>
>
>     Right.  Here's what I don't get: Why hasn't anyone followed even a
> single speaker around, let alone a representative sample, to see what
> proportion of registers and genres they're exposed to on a daily basis?  Or
> has this been done?
>
>
> --
> 				-Angus B. Grieve-Smith
> 				grvsmth at panix.com
>
>
> _______________________________________________
> UNSUBSCRIBE from this page: http://mailman.uib.no/options/corpora
> Corpora mailing list
> Corpora at uib.no
> http://mailman.uib.no/listinfo/corpora
>
>