[Corpora-List] Testing how representative a particular corpus is

Marc Brysbaert Marc.Brysbaert at UGent.be
Wed Jan 22 21:19:15 UTC 2014


We use lexical decision times (is this letter string a word or not?) to
validate word frequency measures from different types of corpora. Spoken
corpora usually do not do particularly well, although this could be
due to their small size. Subtitles seem to come closest to spoken
language. Here are two pointers:

http://crr.ugent.be/papers/Brysbaert%20&%20New%20BRM%202009%20Subtlexus.pdf
http://crr.ugent.be/archives/1423

or on our website:

http://crr.ugent.be
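
To make the validation idea above concrete, here is a rough sketch in
Python. It assumes you have, for the same set of words, mean lexical
decision times and raw counts from each corpus you want to compare. All
words, counts and times below are made up for illustration, and the helper
names (pearson_r, variance_explained) are hypothetical. It is a
simplification of the idea, not the procedure used in the papers linked
above: it only asks how much of the variance in decision times is accounted
for by log frequency from each corpus.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between two equally long lists of numbers."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 items")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("one of the variables is constant")
    return sxy / math.sqrt(sxx * syy)


def variance_explained(freq_counts, decision_times):
    """R^2 between log10(count + 1) and lexical decision time.

    freq_counts:    dict word -> raw count in one corpus (missing words count as 0)
    decision_times: dict word -> mean lexical decision time in ms
    """
    words = sorted(decision_times)
    log_freq = [math.log10(freq_counts.get(w, 0) + 1) for w in words]
    times = [decision_times[w] for w in words]
    return pearson_r(log_freq, times) ** 2


# Made-up illustration data: NOT real measurements or real corpus counts.
decision_times = {"the": 520, "house": 560, "lexicon": 640, "quorum": 710}
corpus_a = {"the": 50000, "house": 900, "lexicon": 12, "quorum": 3}
corpus_b = {"the": 400, "house": 2, "lexicon": 30}

r2_a = variance_explained(corpus_a, decision_times)
r2_b = variance_explained(corpus_b, decision_times)
print(f"corpus A: R^2 = {r2_a:.2f}")
print(f"corpus B: R^2 = {r2_b:.2f}")
```

On this reading, the corpus whose frequencies explain more of the variance
in decision times is the better frequency measure. With real data you would
want many thousands of words, and corpus size should be kept in mind when
comparing corpora.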


Best, mb

Quoting Matías Guzmán Naranjo <mortem.dei at gmail.com>:

> Dear all,
>
> A colleague (not involved in corpus linguistics) expressed his concerns to
> me about corpus linguistics, mainly that he thinks oral corpora are not
> really representative of spoken language, and that results of
> investigations that use oral corpora are therefore not reliable reflections
> of the wider picture of how people speak and use language. My question is
> whether there have been studies of how representative, say, phone
> recordings or semi-guided interviews are of actual spoken language.
>
> I use oral corpora for my work but just assume that semi-guided interviews
> are somewhat representative of spoken language outside semi-guided
> interviews, and that the results do generalize to some degree to other
> situations, but I had never really thought about testing this assumption.
>
> Best,
>
> Matías Guzmán Naranjo



