[Corpora-List] Linguistics, corpus linguistics, and diglossia

Mike Maxwell maxwell at umiacs.umd.edu
Fri Dec 17 04:51:29 UTC 2010


On 12/16/2010 1:06 AM, Adam Kilgarriff wrote:
> It's all about getting the right corpus.  It's almost always harder to
> get informal than formal text types.
> ...
> A delight of the web is that it has lots of informal language in it,
> specially in blogs and similar, so, with a little application, we can
> gather text of informal types.

This is true for most European languages, and for some Asian languages
(Chinese, obviously, and many national languages, particularly those of
higher-income countries).  It is often not true of minority languages,
nor of most African languages, and it's obviously untrue of unwritten
languages (which are probably the majority of languages today).

> saying that corpus linguistics was exactly the wrong way to build a
> dictionary
>
> That's just a counsel of failure.  What does she propose doing instead?
> Guess (sorry, introspect - mustn't be rude)?

Failure would be not to do anything.  So yes, introspection, combined
with listening to people actually talking, and taking notes on that.
She did mention that in a diglossic situation, introspection by
educated speakers can be biased: they have been taught how they "ought"
to talk, so it's hard for them to realize that this is not the way they
actually talk.
-- 
	Mike Maxwell
	maxwell at umiacs.umd.edu
         "A library is the best possible imitation, by human beings,
         of a divine mind, where the whole universe is viewed and
         understood at the same time... we have invented libraries
         because we know that we do not have divine powers, but we
         try to do our best to imitate them." --Umberto Eco

_______________________________________________
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora


