[Corpora-List] Bootcamp: 'Quantitative Corpus Linguistics with R'--re Louw's endorsement

Yorick Wilks Yorick at dcs.shef.ac.uk
Fri Aug 29 10:59:28 UTC 2008


But hang on, Wolfgang--that's a very mixed bag of targets. Verbmobil
was a success, and Eurotra was a disaster, and should never have been
funded---and that difference has taught us all a lot. MT is, by and  
large, a success now, as any translator of web pages for free knows-- 
and it is not usually Systran doing the work behind the screen. Speech  
recognition is not a matter of yes vs. no but of 98% of what you say to
your laptop (try "NaturallySpeaking" for yourself). I think the difference
between us may be not that I undervalue the other sources and methods
that you cite but that I believe we also learn a lot about language  
from the successes above--and you do not--I believe you think them  
just practical engineering and no more.
Yorick


On 29 Aug 2008, at 11:14, Wolfgang Teubert wrote:

> I am grateful to Yorick for letting us know about his philosophy of  
> language. Indeed I scanned the second, third and fourth chapter of  
> the book he recommended (these are the ones, I was told, which are  
> most certainly written by himself), and I learned a lot, as always  
> when I listen to Yorick. He is such a great, big and impressive man,  
> and I admire him very much for his outspokenness and his sense of  
> abrasive humour.
>
> But we mustn't forget that we all carry our agenda around. Yorick's  
> is firmly transatlantic, and I believe he has been involved in a  
> number of projects funded by the NSF. I would be surprised, however,  
> to hear that he has also been involved in projects funded by the NEH.
>
> Over a hundred years ago, Ferdinand de Saussure elevated linguistics  
> from the mediocrity of les sciences humaines to the status of a  
> 'hard science'. Sciences are about facts, and about systematic  
> behaviour. So there has to be a language system, consisting (for  
> Chomsky but not for Saussure) of eternal, universal and hard-wired
> laws, like the second law of thermodynamics, laws, that is, that no
> human ingenuity can simply violate. Chomsky used to call them rules.
> So far, the only one of these laws that makes any sense to me is
> that a spoken utterance must have a beginning and an end - the law
> of linearity. So far, though, no language mechanism
> has ever been discovered, even though uncountable models have been  
> invented. There also seems to be, particularly when it comes to  
> meaning, a certain scarcity of 'brute facts'. No one has ever been  
> able to give us a credible representation of a mental concept.
>
> Cognitive linguistics, and particularly its illegitimate offspring,  
> NLP, however, seemed ideal to uphold the claim to scientificity.
> Indeed the starting point of the cognitive sciences was the  
> computational theory of the mind. There was the blueprint for the  
> mind as a hard-wired but programmable mechanism. It only needed to  
> be filled, and there was the heritage of structuralism, the sememes  
> and the semes Greimas and Pottier had invented and a lot of
> structuralist work done on the old continent. Not long
> thereafter, the cognitivists sold their blueprint back to the  
> computer scientists who had now become interested in NLP, and in MT  
> and AI. Sememes became reified and were called concepts, while semes  
> morphed into semantic primes. Suddenly they were everywhere, in the  
> cognitive literature, in the NLP literature, in transatlantic  
> introductions to the philosophy of language. Here they were merged  
> with the realists' obsession for truth, metaphysical or otherwise,  
> the heritage of the Vienna circle and the staple fare of analytic  
> philosophy. But did it work? One only has to read Alan Melby's  
> devastating account of the early history of MT and why it had to  
> fail. That didn't keep interested parties from funding Eurotra, this  
> perfect translation platform for EU documents, or the ultimate  
> solution to all AI problems, CYC, the first conceptual ontology  
> endowed with common sense reasoning, or Verbmobil, the German  
> attaché case translating spoken German into spoken Japanese and vice  
> versa. All great success stories.
>
> Now the solution is statistics. Statistical models have the  
> advantage that one can always fine-tune them a bit more, and keep  
> the cashflow from drying up. I am not saying it's senseless.  
> Progress is being made. Speech recognition now can distinguish 'yes'  
> from 'no'. Systran is getting better all the time by attacking  
> errors one by one. Google is working nicely. Summarisation is a  
> great success. This is Yorick's world, and thanks to people like him  
> who have a sense of history and look beyond their limited tasks,  
> things are getting better. It is people like him who are needed  
> there. For he knows that in the end language cannot be reduced to a  
> mechanism, a computer model that would always come up with nice  
> results regardless of how people actually use language and whether  
> they agree with these results. Their fault!, shout the language  
> engineers. They have to change the way they speak. Here is our  
> controlled language. Use it and MT works just perfectly.
>
> The goal of NLP, of MT and of AI is to make computers process  
> language and thus to develop applications to make our lives easier.  
> That's fine, and that has to go on. After the demise of the original  
> model, the people working in these fields have found that they need  
> real language data, and a lot of them, to develop better models and  
> turn them into marketable applications. Thus there is a huge demand  
> for speech corpora, for parallel corpora and for many other kinds of  
> domain-specific corpora. But this does not turn these people into
> corpus linguists. Corpus linguistics, as I see it, is a branch of  
> linguistics, not of the NLP sciences.
>
> Linguistics is the study of real, human language, not the  
> development of useful gadgets simulating the use of language. For
> me, it therefore belongs to a large extent to the human sciences.  
> The key difference is, for me, that the human sciences are not  
> concerned with 'brute facts' but with interpreting the discourse,  
> everything that has been said and is continually being said. This  
> holds for art history, for anthropology, for theology, for certain  
> branches of social and cultural studies, and, of course, for  
> linguistics as far as it is concerned with meaning. Meaning is not a  
> computational phenomenon. It may be a mental phenomenon, but then we  
> don't have access to the mind. It certainly is a social phenomenon.
>
> Linguists, though, while they may be experts in many things such as
> phonology, syntax rules, language history, dialectology, are not
> specialists in meaning. The interpretation of a text, a text
> segment, a phrase or a word in its context is a task for the whole  
> 'interpretive community' (Stanley Fish), and the linguist is welcome  
> to participate in it. But they do not have any privileged knowledge  
> concerning meaning. The meaning of signs is always the (provisional,  
> never final) result of a never-ending negotiation between the sign  
> users.
>
> The transatlantic philosophy of language is not as limited as the  
> NLP-focused accounts of it make us believe. There are the  
> contributions of the pragmatists, Peirce, Dewey, James, of the  
> behaviourists including Mead and Blumer. There is also social  
> constructionism, there is social epistemology, there is Feyerabend  
> and Watzlawick, there is Rorty and the 'internal realism' of Hilary  
> Putnam. Their texts are rarely referenced in NSF applications.
>
> Philosophy of language also takes place outside the Anglophone  
> world. There is Russian formalism (Bakhtin/Volosinov), there is  
> hermeneutics (Gadamer and Ricoeur), there is post-structuralism, to  
> name but a few. Of course, there is no "tutorial" that tells us all  
> there is to know about these deviations from orthodoxy. We have to  
> deal with them as autodidacts, reading their texts without being  
> told what they mean.
>
> I won't be around for the next week or so.
>
> Wolfgang
>


_______________________________________________
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora


