[Corpora-List] Boot Camp (Continued...)

Yorick Wilks yorick at dcs.shef.ac.uk
Tue Aug 19 09:17:44 UTC 2008


I have tried to stay out of this public debate and let John Sowa do
the heavy lifting for those concerned not only with corpora and
computation but also with meaning and representation, a position that
does not need to be called mentalist or cognitive, terms which have
only muddled this debate further. Wolfgang's and Geoffrey's most recent
pieces have broken my resolve and made me feel that there are still
misunderstandings not only of, but by, "corpus linguists" to be
cleared up. Dan Melamed is a talented man and can believe what he
likes about meaning or its absence, but many in computational
linguistics/NLP are very much interested in meaning and always have
been; I would claim at least 45 years of interest in their
intersection. When Geoffrey writes today of the failure of "mentalist  
parsers" I have no idea what he means: corpus-based parsers now work  
pretty well even for English, always one of the hardest languages to  
parse effectively (Charniak's methods have been the best so far  
overall). The problem with parsers is not now whether they work, but  
what they are FOR, if any kind of meaning extraction/representation is  
one's goal.

Some of my friends are corpus linguists, one of my ex-students  
(Patrick Hanks) was mentioned yesterday as an excellent corpus  
linguist and lexicographer, but what I have never been able to see is
what corpus linguistics (in the sense in which the phrase is owned by  
the main contributors to this debate) is FOR, except the production of  
better dictionaries, and that is a fine and worthy end, except,  
paradoxically of course for the corpus linguist, dictionaries are  
never more than compendia of intuitions! The low-level anti- 
representational empiricism of corpling must, it has always seemed to  
me, inhibit it from really carrying out any other task, in the sense  
in which CL/NLP does carry out many tasks with corpora on a vast scale  
(hence the web-as-corpus movement, a resource now available to any  
competent graduate student). The debate so far has not convinced me  
that many corplingers know much about CL/NLP: I think they are a bit  
frightened of it as real computation and prefer to tinker with small  
"toolboxes" to no clear end, while muttering the incantation "trust  
the text".

Whatever that phrase meant to John Sinclair, or to corplingers now, it  
can only have sense with respect to small "viewable" texts, since it  
plainly comes from the tradition of literary criticism, and owes a  
lot, I still believe,  to the assumptions of "Scrutiny", and Empsonian  
Thirties beliefs that linked only to the superficial aspects of what  
Wittgenstein was saying at the time. What can "trust the text" mean of  
a corpus of 1.8 billion words (our standard reference corpus for  
English experiments) let alone the whole accessible English web?! I  
was a great admirer of John, and argued with him over many years and  
worked with him on the EAGLES project, but I have to say I do not
believe that mantra has any precise sense at all beyond mere
commonsense--except perhaps "beware of theory"! It is certainly not
substantial enough to support a movement; its constant repetition in
this debate certainly has cult overtones as some contributors have  
noticed.

I think corplingers should get out more and find out what CL/NLP is
actually doing (Hanks certainly has done that), even though CL/NLP is
in rather a sterile state at the moment: at the end of the
pure-statistics paradigm that started in 1989, and now waiting for
the tide to turn. Another and quite different place to look is what is
happening in the Semantic Web movement, which is now starting to pay  
off in real practical terms, and which seeks to link high-level  
representations of science (and everything else) to their grounding in
corpora via the annotation movement (that came from the humanities, of  
course, and is the foundation of all current CL/NLP and HTML, XML et  
al.)  as well as the automatic induction of ontologies from corpora.  
My take on this is (unsurprisingly) more language-centred than the  
Semantic Web's founders would like, since they are not much interested  
in language as such. The first item in the first URL below appeared
recently in IEEE Intelligent Systems as "The Semantic Web: the
apotheosis of annotation, but what are its semantics?". The second
URL is a feeble attempt by a former philosopher-turned-computational- 
linguist to link today's vast corpora back to Wittgenstein, who keeps  
cropping up in the debate for reasons I am still not quite clear about.
Yorick Wilks

http://www.dcs.shef.ac.uk/~yorick/papers.html
http://www.aisb.org.uk/convention/aisb08/proc/proceedings/12%20Computing%20and%20Philosophy/01.pdf




On 19 Aug 2008, at 09:02, Wolfgang Teubert wrote:

>
> Dear All,
>
> It seems to me our discussion is trawling off in various directions  
> which all are enlightening in their own ways. Some contributions  
> show what has been dubbed a language engineering angle. Their  
> primary interest, as I see it, lies in the development of language  
> technology to come up with certain useful applications like corpus- 
> based machine translation, knowledge extraction, automatic  
> abstracting  and expert systems, for instance. As long as they can  
> show that the performance of their systems goes up, they will get  
> more funding and are happy. They are not interested in a discussion  
> of meaning. I once was told by Dan Melamed that as far as he is  
> concerned, meaning is an illusion.
>
> Others focus on the joys and horrors of corpus compilation, their  
> availability and the ensuing copyright problems. These are certainly  
> important issues, and with the ongoing privatisation of the  
> electronic academic discourse and the electronic data we use, it is  
> only a matter of time before the only way to gain access will be to
> find sponsorship from these discourse providers and carry out the  
> kind of research they are interested in. Google, Microsoft and  
> NewsCorp spring to mind. They will soon play a role in academic  
> research comparable to that of Monsanto or GlaxoSmithKline in their  
> respective fields.
>
> Perhaps, though, it might be worth our while to resume a specifically
> linguistic discussion. I am, for instance, interested in meaning.  
> And it seems to me  that for each school of linguistic thought,  
> meaning means something different. Some contributors apparently  
> think that it is a matter of tolerance not to discriminate against any
> of them. I agree. But linguistics, as I see it, is not only a belief  
> system allowing everyone to be happy in their own way. Because it  
> is, for me at least, to quite a considerable extent part of the  
> human, the interpretive sciences, it thrives on clashes in  
> argumentation and interpretation. To my taste, not nearly enough of  
> that is taking place. There are already many sectarian tendencies in  
> linguistics, obvious from the lists of references at the end of  
> academic papers. One tends to quote only people within one's own  
> camp. Inside these camps, there is the same kind of stated  
> homogeneity as one finds among the frontbenchers of our  
> parliamentary parties. But the more monovocal a discourse is the  
> more static and the less open to innovation will it be. Only a  
> plurivocal, democratic discourse can come up with new ideas.
>
> Being a corpus linguist, I am not affiliated with any of the many  
> cognitive camps. Some of them, it seems to me, have moved away from  
> their former vicinity to the philosophy of mind. They are quite  
> content to provide models that allow the representation of meaning  
> in some formalistic and abstract way but in no way assume their  
> model to be isomorphic or even just functionally equivalent to the  
> working of the mind. (Indeed I recently read a textbook for  
> cognitive linguistics in which the word 'mind' did not occur at  
> all.) I see this as a return to the happy days of the 1960s and 70s,  
> when people like Greimas and Pottier developed their structuralist  
> theory of semantic features.
>
> As similar as this theory is to some of what has been said about  
> mental concepts, sememes were never more than abstract notions and  
> were never thought to have ontological status. Originally, at least,  
> in cognitive linguistics, what were called mental/cognitive concepts/
> representations were said to be isomorphic models of what we could
> actually find in the mind if we only had access to it. It is only  
> this aspect in which I find cognitive linguistics flawed, namely  
> those varieties of it that are concerned with the way thought is  
> turned into an utterance and vice versa. These varieties are, from  
> my outside perspective, connected with names like Langacker, Lakoff,  
> Jackendoff, Levinson, Sperber/Wilson, Wierzbicka, Fodor, Pinker,  
> Chomsky. For all of them, even though some of them reject the
> cognitive label, the meaning of an utterance is its mental  
> representation, a representation in some kind of mentalese.
>
> There are, of course, differences concerning the nature of these
> representations. For Lakoff, they are non-symbolic, embodied,  
> entities of experience. For Pinker they are symbolic. Either way,  
> this brings in a complication. What is it that makes a non-symbolic  
> entity symbolic? What needs to be added, and from where does this  
> addition come? If mental concepts are symbolic, they need to be  
> interpreted, but by whom? By Searle's homunculus or by Dennett's  
> central meaner? The other problem is that the language of thought is  
> thought to be language-independent. But is it really possible for  
> one cognitive linguist to convince their colleague what the content  
> of a mental concept is if all they can come up with is a translation  
> into some natural language? Levinson's inconclusive work on Tzeltal  
> springs to mind. A third problem is that we, the language users, are  
> obviously unaware of our mental representations. Does that mean we  
> are also unaware of our thoughts? What about intentionality? Does  
> the mental processing of utterances mean that mental concepts are  
> processed as uninterpreted symbols, just as a computer would  
> summarise a text without knowing what it means? Is meaning, as  
> Melamed would probably like to have it, no more than a supervenient  
> feature, a figment of our imagination?
>
> I know that for many what I say here is no more than a crude and  
> mistaken caricature of the status of mental concepts in various  
> cognitive camps. Again I announce my willingness to be converted to  
> the camp that can show me the 'true' mental representation of the  
> word 'globalisation'. Could it ever be more than what has been said  
> in the discourse about globalisation? Once it has been translated  
> into a language of thought, does it not have to be translated for  
> someone like me again into a natural language? Is this more than a  
> triplication of the same content?
>
> More recently I find that many cognitive linguists like to pass  
> their mental concepts on to the neural sciences. They then appear to  
> become synaptically connected clusters of neurons firing. But once I  
> have identified the neurons in question, do I then know what  
> 'globalisation' means?  Or are we told that it really does not  
> matter a bit what it means as long as we behave in the prescribed  
> manner?
>
> For me, the meaning of 'globalisation' is all that has been said  
> about globalisation. Meaning is only in the discourse, and nowhere  
> else. Our task as language users is (I do not believe that linguists  
> have a privileged access to meaning) to collaborate in interpreting  
> this discourse evidence. There is no valid interpretation as such.  
> As long as they are based on evidence accepted as such by the  
> interpretive community (Stanley Fish), they will have an impact on  
> the discourse and add something to the meaning of 'globalisation'.  
> Meanings and their interpretations are always provisional, as long  
> as the discourse goes on. The clash of different interpretations is  
> what makes innovation, or progress,  possible.
>
> What language engineers are doing (and often doing very  
> successfully) will never tell us anything about meaning. An  
> automatic summarisation of a paper is never an interpretation of it.  
> For me, however, the sole raison d'etre for linguistics is to try  
> and find ways to make sense of what is said.
>
> To appeal to a sense of tolerance and let everyone be happy in their  
> own way will not promote the new ideas we need. We have to show  
> where we differ.
>
> Cheers
>
> Wolfgang
>
>
> _______________________________________________
> Corpora mailing list
> Corpora at uib.no
> http://mailman.uib.no/listinfo/corpora
>
