[Corpora-List] Boot Camp (Continued...)
Yorick Wilks
yorick at dcs.shef.ac.uk
Tue Aug 19 14:30:35 UTC 2008
Lou
I hope it is obvious that nothing I wrote earlier was with any
intended disrespect to the study of corpora in language teaching and
translation; both are activities of which I think highly, of course.
Christopher Tribble's contribution was the clearest reference to the
practice of using corpora in these activities, and I would maintain it
is, however much improved and refined, a traditional activity known
for centuries; it requires corpora to be humanly "viewable"--to use
the same rather unsatisfactory word as I did earlier. That seems to me
a long way from computing over very large corpora that are not humanly
viewable or assimilable or even readable: it is that activity that
both computational linguistics and corpus linguistics claim to be
involved in, and it is there that the source of the negativity seems
to be. This long discussion has been full of little remarks by corpus
linguists pulling their skirts tight about them and saying how they
don't want to read or know about, let alone do, what computational
linguists do with corpora. It is that reaction that I still find
puzzling; why does anyone care what consenting adults do with corpora
in the privacy of their computers? Such activity either produces
demonstrable results, or useful artifacts, or it does not--what else
is there to say?
I confessed earlier that CL/NLP is having a dull patch as a whole, but
that is not true of machine translation, which is having a mini-renaissance,
with methods sometimes called statistical (following
Mercer and Jelinek) and sometimes example-based (following Nagao). In
fact there is no real difference between them and both rest entirely
on corpus data provided by human translators, whose skill they attempt
to learn, and with increasing success, as any user of free internet
translators knows. There is no clear dividing line here at all between
the parts of this large field, only, it seems, bad feelings.
Yorick
On 19 Aug 2008, at 10:52, Lou Burnard wrote:
> Yorick says "what I have never been able to see is what corpus
> linguistics (in the sense in which the phrase is owned by the main
> contributors to this debate) is FOR, except the production of better
> dictionaries"
>
> As far as I am aware one of the largest communities interested in
> consuming the fruits of corpus linguistics (whether you're talking
> about corpora or the methods attached to them) is that of people
> engaged in the humdrum but utterly mysterious business of language
> teaching and translation.
>
> Sadly, none of that community seems to have seen fit (yet) to
> contribute to the present discussion. But I think if they did they
> might suggest that corpus linguistics is very definitely "for" those
> wanting to ground their pedagogic practice in language as
> experienced, rather than language as theorized (which is of course
> experience too, but not quite the same order).
>
> Lou
>
_______________________________________________
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora