[Corpora-List] Bootcamp: 'Quantitative Corpus Linguistics with R'--re Louw's endorsement
Yorick Wilks
Yorick at dcs.shef.ac.uk
Fri Aug 29 10:59:28 UTC 2008
But hang on, Wolfgang--that's a very mixed bag of targets. Verbmobil
was a success, and Eurotra was a disaster that should never have been
funded; that difference has taught us all a lot. MT is, by and
large, a success now, as any translator of web pages for free knows--
and it is not usually Systran doing the work behind the screen. Speech
recognition is not a matter of telling yes from no: it gets 98% of what you say to your laptop
(try NaturallySpeaking for yourself). I think the difference
between us may be not that I undervalue the other sources and methods
that you cite but that I believe we also learn a lot about language
from the successes above, and you do not: I believe you think of them
as just practical engineering and no more.
Yorick
On 29 Aug 2008, at 11:14, Wolfgang Teubert wrote:
> I am grateful to Yorick for letting us know about his philosophy of
> language. Indeed I scanned the second, third and fourth chapters of
> the book he recommended (these are the ones, I was told, which are
> most certainly written by himself), and I learned a lot, as always
> when I listen to Yorick. He is such a great, big and impressive man,
> and I admire him very much for his outspokenness and his sense of
> abrasive humour.
>
> But we mustn't forget that we all carry our agendas around. Yorick's
> is firmly transatlantic, and I believe he has been involved in a
> number of projects funded by the NSF. I would be surprised, however,
> to hear that he has also been involved in projects funded by the NEH.
>
> Over a hundred years ago, Ferdinand de Saussure elevated linguistics
> from the mediocrity of les sciences humaines to the status of a
> 'hard science'. Sciences are about facts, and about systematic
> behaviour. So there has to be a language system, consisting (for
> Chomsky but not for Saussure) of eternal, universal and hard-wired
> laws, like the second law of thermodynamics: laws, that
> is, that no human ingenuity can violate. Chomsky used
> to call them rules. So far, the only one of these laws that makes any
> sense to me is that a spoken utterance must have a beginning and an
> end - the law of linearity. So far, though, no language mechanism
> has ever been discovered, even though countless models have been
> invented. There also seems to be, particularly when it comes to
> meaning, a certain scarcity of 'brute facts'. No one has ever been
> able to give us a credible representation of a mental concept.
>
> Cognitive linguistics, and particularly its illegitimate offspring,
> NLP, however, seemed ideal to uphold the claim to scientificity.
> Indeed the starting point of the cognitive sciences was the
> computational theory of the mind. There was the blueprint for the
> mind as a hard-wired but programmable mechanism. It only needed to
> be filled, and there was the heritage of structuralism, the sememes
> and the semes Greimas and Pottier had invented, and a lot of
> structuralist work done on the old continent. Not long
> thereafter, the cognitivists sold their blueprint back to the
> computer scientists who had now become interested in NLP, and in MT
> and AI. Sememes became reified and were called concepts, while semes
> morphed into semantic primes. Suddenly they were everywhere, in the
> cognitive literature, in the NLP literature, in transatlantic
> introductions to the philosophy of language. Here they were merged
> with the realists' obsession with truth, metaphysical or otherwise,
> the heritage of the Vienna circle and the staple fare of analytic
> philosophy. But did it work? One only has to read Alan Melby's
> devastating account of the early history of MT and why it had to
> fail. That didn't keep interested parties from funding Eurotra, this
> perfect translation platform for EU documents, or the ultimate
> solution to all AI problems, CYC, the first conceptual ontology
> endowed with common sense reasoning, or Verbmobil, the German
> attaché case translating spoken German into spoken Japanese and vice
> versa. All great success stories.
>
> Now the solution is statistics. Statistical models have the
> advantage that one can always fine-tune them a bit more, and keep
> the cashflow from drying up. I am not saying it's senseless.
> Progress is being made. Speech recognition now can distinguish 'yes'
> from 'no'. Systran is getting better all the time by attacking
> errors one by one. Google is working nicely. Summarisation is a
> great success. This is Yorick's world, and thanks to people like him
> who have a sense of history and look beyond their limited tasks,
> things are getting better. It is people like him who are needed
> there. For he knows that in the end language cannot be reduced to a
> mechanism, a computer model that would always come up with nice
> results regardless of how people actually use language and whether
> they agree with these results. 'Their fault!' shout the language
> engineers. They have to change the way they speak. Here is our
> controlled language. Use it and MT works just perfectly.
>
> The goal of NLP, of MT and of AI is to make computers process
> language and thus to develop applications to make our lives easier.
> That's fine, and that has to go on. After the demise of the original
> model, the people working in these fields have found that they need
> real language data, and a lot of them, to develop better models and
> turn them into marketable applications. Thus there is a huge demand
> for speech corpora, for parallel corpora and for many other kinds of
> domain-specific corpora. But this does not turn these people into
> corpus linguists. Corpus linguistics, as I see it, is a branch of
> linguistics, not of the NLP sciences.
>
> Linguistics is the study of real, human language, not the
> development of useful gadgets simulating the use of language. For
> me, it therefore belongs to a large extent to the human sciences.
> The key difference is, for me, that the human sciences are not
> concerned with 'brute facts' but with interpreting the discourse,
> everything that has been said and is continually being said. This
> holds for art history, for anthropology, for theology, for certain
> branches of social and cultural studies, and, of course, for
> linguistics as far as it is concerned with meaning. Meaning is not a
> computational phenomenon. It may be a mental phenomenon, but then we
> don't have access to the mind. It certainly is a social phenomenon.
>
> Linguists, though, while they may be experts in many things such as
> phonology, syntax rules, language history and dialectology, are not
> specialists in meaning. The interpretation of a text, a text
> segment, a phrase or a word in its context is a task for the whole
> 'interpretive community' (Stanley Fish), and the linguist is welcome
> to participate in it. But they do not have any privileged knowledge
> concerning meaning. The meaning of signs is always the (provisional,
> never final) result of a never-ending negotiation between the sign
> users.
>
> The transatlantic philosophy of language is not as limited as the
> NLP-focused accounts of it make us believe. There are the
> contributions of the pragmatists, Peirce, Dewey, James, of the
> behaviourists including Mead and Blumer. There is also social
> constructionism, there is social epistemology, there is Feyerabend
> and Watzlawick, there is Rorty and the 'internal realism' of Hilary
> Putnam. Their texts are rarely referenced in NSF applications.
>
> Philosophy of language also takes place outside the Anglophone
> world. There is Russian formalism (Bakhtin/Volosinov), there is
> hermeneutics (Gadamer and Ricoeur), there is post-structuralism, to
> name but a few. Of course, there is no "tutorial" that tells us all
> there is to know about these deviations from orthodoxy. We have to
> deal with them as autodidacts, reading their texts without being
> told what they mean.
>
> I won't be around for the next week or so.
>
> Wolfgang
_______________________________________________
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora