[Corpora-List] Boot Camp (Continued...)

Yorick Wilks yorick at dcs.shef.ac.uk
Tue Aug 19 14:30:35 UTC 2008


Lou
I hope it is obvious that nothing I wrote earlier was meant with any
intended disrespect to the study of corpora in language teaching and  
translation; both are activities of which I think highly, of course.  
Christopher Tribble's contribution was the clearest reference to the  
practice of using corpora in these activities, and I would maintain it  
is, however much improved and refined, a traditional activity known  
for centuries; it requires corpora to be humanly "viewable"--to use  
the same rather unsatisfactory word as I did earlier. That seems to me  
a long way from computing over very large corpora that are not humanly  
viewable or assimilable or even readable: it is that activity that  
both computational linguistics and corpus linguistics claim to be  
involved in, and it is there that the source of the negativity seems  
to be. This long discussion has been full of little remarks by corpus  
linguists pulling their skirts tight about them and saying how they  
don't want to read or know about, let alone do, what computational
linguists do with corpora. It is that reaction that I still find  
puzzling; why does anyone care what consenting adults do with corpora  
in the privacy of their computers? Such activity either produces  
demonstrable results, or useful artifacts, or it does not--what else
is there to say?

I confessed earlier that CL/NLP is having a dull patch as a whole, but  
that is not true of machine translation, which is having a
mini-renaissance, with methods sometimes called statistical (following
Mercer and Jelinek) and sometimes example-based (following Nagao). In  
fact there is no real difference between them and both rest entirely  
on corpus data provided by human translators, whose skill they attempt  
to learn, and with increasing success, as any user of free internet
translators knows. There is no clear dividing line here at all between
the parts of this large field, only, it seems, bad feelings.
Yorick

On 19 Aug 2008, at 10:52, Lou Burnard wrote:

> Yorick says "what I have never been able to see is what corpus
> linguistics (in the sense in which the phrase is owned by the main  
> contributors to this debate) is FOR, except the production of better  
> dictionaries"
>
> As far as I am aware one of the largest communities interested in  
> consuming the fruits of corpus linguistics (whether you're talking  
> about corpora or the methods attached to them) is that of people  
> engaged in the humdrum but utterly mysterious business of language  
> teaching and translation.
>
> Sadly, none of that community seems to have seen fit (yet) to  
> contribute to the present discussion. But I think if they did they  
> might suggest that corpus linguistics is very definitely "for" those  
> wanting to ground their pedagogic practice in language as  
> experienced, rather than language as theorized (which is of course  
> experience too, but not quite the same order).
>
> Lou
>


_______________________________________________
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora