[Corpora-List] Chomsky and computational linguistics
Ramesh Krishnamurthy
r.krishnamurthy at aston.ac.uk
Wed Jul 11 14:54:04 UTC 2007
>>>...there's an emerging community trying to use the
>>>tools of computational linguistics with corpora to enable linguistic
>>>theoreticians to be more empirical in their approach without
>>>abandoning their paradigm.
Which paradigm?
>>>...
>>>Work of this kind offers value to theoretical linguists because the
>>>standard paradigm of inventing and judging examples can easily miss
>>>relevant facts, and lead to false generalizations.
This paradigm? Why continue to invent examples when large corpora provide
many examples of most events/features? Is judging not subject to
idiolectal bias?
>There is a danger in corpus linguistics of conflating dialects,
>using texts produced by non-native speakers, etc
Why is 'conflating dialects' necessarily a danger? Perhaps we want to
find the common patterns across dialects?
Isn't there a danger of prioritizing idiolects in "the standard
paradigm of inventing and judging examples"?
Is it impossible for a corpus compiler/user to de-select
non-native-speaker texts if these are regarded as undesirable? Such
texts are very useful for other types of language research.
>And crucial constructions may simply be so rare that you just can't
>find them in the available corpus.
How can extremely rare events be evaluated as 'crucial'?
Crucial for what or to whom?
At a recent presentation of mine, I remember my assertion that 'frequency
indicates/suggests systemic importance' being countered with the notion of
'salience'. I don't think I answered very well at the time, but on
reflection I think I might now say: 'Fine, tell me how to measure
salience, and I will use it to modify my frequency lists'. Is that fair?
Best
Ramesh
At 03:39 11/07/2007, Mike Maxwell wrote:
>Philip Resnik wrote:
>>...there's an emerging community trying to use the
>>tools of computational linguistics with corpora to enable linguistic
>>theoreticians to be more empirical in their approach without
>>abandoning their paradigm.
>>...
>>Work of this kind offers value to theoretical linguists because the
>>standard paradigm of inventing and judging examples can easily miss
>>relevant facts, and lead to false generalizations.
>
>If I were doing theoretical syntax, I think this is exactly where
>I'd find myself: using corpora to keep me from missing relevant
>facts, that is, examples from people whose judgements might differ
>from my own (like my spell checker differs from me about how
>'judgement' should be spelled, but that's another question...), or
>making me consider constructions that I might otherwise have overlooked.
>
>There is a danger in corpus linguistics of conflating dialects,
>using texts produced by non-native speakers, etc. And crucial
>constructions may simply be so rare that you just can't find them in
>the available corpus. (Crucial examples of certain kinds of
>reduplication in "exotic" languages are an example of hard-to-find
>data in a corpus, and that's not even syntax.) So there's still
>room, it seems to me, for introspection (or asking the person in the
>next office, or elicitation from an informant). But as you say, the
>corpora add value, too.
>
>(Other kinds of linguistics, like lexicography, have been about
>corpus collection for centuries, of course.)
>--
> Mike Maxwell
> maxwell at umiacs.umd.edu
> "Theorists...have merely to lock themselves in a room
> with a blackboard and coffee maker to conduct their business."
> --Bruce A. Schumm, Deep Down Things
Ramesh Krishnamurthy
Lecturer in English Studies, School of Languages and Social Sciences,
Aston University, Birmingham B4 7ET, UK
Tel: +44 (0)121-204-3812 ; Fax: +44 (0)121-204-3766 [Room NX08, 10th
Floor, North Wing of Main Building]
http://www.aston.ac.uk/lss/staff/krishnamurthyr.jsp
Project Leader, ACORN (Aston Corpus Network): http://corpus.aston.ac.uk/