[Corpora-List] Linguistics, corpus linguistics, and diglossia
Mike Maxwell
maxwell at umiacs.umd.edu
Fri Dec 17 04:51:29 UTC 2010
On 12/16/2010 1:06 AM, Adam Kilgarriff wrote:
> It's all about getting the right corpus. It's almost always harder to
> get informal than formal text types.
> ...
> A delight of the web is that it has lots of informal language in it,
> specially in blogs and similar, so, with a little application, we can
> gather text of informal types.
This is true for most European languages and for some Asian languages
(Chinese, obviously, and many national languages, particularly those of
higher-income countries). It is often not true of minority languages or
of most African languages, and it's obviously untrue of unwritten
languages (which are probably the majority of languages today).
> saying that corpus linguistics was exactly the wrong way to build a
> dictionary
>
> That's just a counsel of failure. What does she propose doing instead?
> Guess (sorry, introspect - mustn't be rude)?
Failure would be not to do anything. So yes, introspection, combined
with listening to people actually talking, and taking notes on that.
She did mention that in a diglossic situation, introspection by educated
speakers can be biased: they have been taught how they "ought" to talk,
so it's hard for them to realize that that's not the way they actually
talk.
--
Mike Maxwell
maxwell at umiacs.umd.edu
"A library is the best possible imitation, by human beings,
of a divine mind, where the whole universe is viewed and
understood at the same time... we have invented libraries
because we know that we do not have divine powers, but we
try to do our best to imitate them." --Umberto Eco