[Corpora-List] Chomsky and computational linguistics

Ramesh Krishnamurthy r.krishnamurthy at aston.ac.uk
Wed Jul 11 14:54:04 UTC 2007


>>>...there's an emerging community trying to use the
>>>tools of computational linguistics with corpora to enable linguistic
>>>theoreticians to be more empirical in their approach without
>>>abandoning their paradigm.

Which paradigm?

>>>...
>>>Work of this kind offers value to theoretical linguists because the
>>>standard paradigm of inventing and judging examples can easily miss
>>>relevant facts, and lead to false generalizations.

This paradigm? Why continue to invent examples when large corpora provide
many examples of most events/features? Is judging not subject to
idiolectal bias?

>There is a danger in corpus linguistics of conflating dialects,
>using texts produced by non-native speakers, etc.

Why is 'conflating dialects' necessarily a danger? Perhaps we want to
find the common patterns across dialects?

Isn't there a danger of prioritizing idiolects in "the standard
paradigm of inventing and judging examples"?

Is it impossible for a corpus compiler/user to de-select
non-native-speaker texts if these were regarded as undesirable? Such
texts are very useful for other types of language research.

>And crucial constructions may simply be so rare that you just can't
>find them in the available corpus.

How can extremely rare events be evaluated as 'crucial'?
Crucial for what or to whom?

At a recent presentation of mine, I remember my assertion that 'frequency
indicates/suggests systemic importance' being countered with the notion of
'salience'. I don't think I answered very well at the time, but on
reflection I think I might now say: 'Fine, tell me how to measure salience,
and I will use it to modify my frequency lists'. Is that fair?

Best
Ramesh


At 03:39 11/07/2007, Mike Maxwell wrote:
>Philip Resnik wrote:
>>...there's an emerging community trying to use the
>>tools of computational linguistics with corpora to enable linguistic
>>theoreticians to be more empirical in their approach without
>>abandoning their paradigm.
>>...
>>Work of this kind offers value to theoretical linguists because the
>>standard paradigm of inventing and judging examples can easily miss
>>relevant facts, and lead to false generalizations.
>
>If I were doing theoretical syntax, I think this is exactly where
>I'd find myself: using corpora to keep me from missing relevant
>facts, that is, examples from people whose judgements might differ
>from my own (like my spell checker differs from me about how
>'judgement' should be spelled, but that's another question...), or
>making me consider constructions that I might otherwise have overlooked.
>
>There is a danger in corpus linguistics of conflating dialects,
>using texts produced by non-native speakers, etc.  And crucial
>constructions may simply be so rare that you just can't find them in
>the available corpus.  (Crucial examples of certain kinds of
>reduplication in "exotic" languages are an example of hard-to-find
>data in a corpus, and that's not even syntax.)  So there's still
>room, it seems to me, for introspection (or asking the person in the
>next office, or elicitation from an informant).  But as you say, the
>corpora add value, too.
>
>(Other kinds of linguistics, like lexicography, have been about
>corpus collection for centuries, of course.)
>--
>         Mike Maxwell
>         maxwell at umiacs.umd.edu
>         "Theorists...have merely to lock themselves in a room
>         with a blackboard and coffee maker to conduct their business."
>         --Bruce A. Schumm, Deep Down Things

Ramesh Krishnamurthy
Lecturer in English Studies, School of Languages and Social Sciences,
Aston University, Birmingham B4 7ET, UK
Tel: +44 (0)121-204-3812 ; Fax: +44 (0)121-204-3766 [Room NX08, 10th
Floor, North Wing of Main Building]
http://www.aston.ac.uk/lss/staff/krishnamurthyr.jsp
Project Leader, ACORN (Aston Corpus Network): http://corpus.aston.ac.uk/


