The Chinese Diplomat's "the"

Rob Malouf rmalouf at mail.sdsu.edu
Mon Aug 30 15:22:27 UTC 2004


Hi,

On Aug 30, 2004, at 7:34 AM, Salinas17 at aol.com wrote:
> In a message dated 8/29/04 7:14:03 PM, rmalouf at mail.sdsu.edu writes:
> << At any rate, the performance of the best models is getting close to
> that of humans at guessing which article will be used in a given
> context. >>
>
> There's an irony to why one sees such adherence to structuralist
> criteria on the "functional" linguistics list.  In most situations, of
> course, a computer model cannot possibly predict the use of "the"
> versus "a" unless it also reads minds.

It's hard for me to imagine anything less "structuralist" than an
instance-based model like this one. The system produces an article for
a sequence like "please get ___ car" by searching a reference corpus
for similar patterns.  If it finds sequences like "please get the car"
more often than "please get a car" or "please get car", it produces a
"the".
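The lookup described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the actual system: the reference corpus is a toy list of sentences, patterns are matched as exact word n-grams, ties go to "the", and the backoff to shorter left contexts, the default answer, and the helper names `ngram_counts` and `choose_article` are all hypothetical.

```python
from collections import Counter

# Candidate fillers for the gap; "" stands for "no article".
# Order matters: on a tie, max() keeps the first, so "the" wins ties.
CANDIDATES = ("the", "a", "")


def ngram_counts(sentences, max_n=4):
    """Count every word n-gram (1 <= n <= max_n) in a toy reference corpus."""
    counts = Counter()
    for sentence in sentences:
        words = sentence.lower().split()
        for n in range(1, max_n + 1):
            for i in range(len(words) - n + 1):
                counts[tuple(words[i:i + n])] += 1
    return counts


def choose_article(left_context, noun, counts):
    """Fill the gap in "<left_context> ___ <noun>" with the candidate whose
    completed pattern occurs most often in the reference corpus."""
    left = tuple(w.lower() for w in left_context)
    noun = noun.lower()
    # Hypothetical backoff: drop words from the far left until some
    # completed pattern is attested. At least one context word is kept so
    # the "no article" pattern is never just a count of the bare noun.
    for k in range(len(left), 0, -1):
        ctx = left[len(left) - k:]
        scores = {c: counts[ctx + ((c,) if c else ()) + (noun,)]
                  for c in CANDIDATES}
        if any(scores.values()):
            return max(CANDIDATES, key=scores.get)
    return "the"  # hypothetical default when nothing matches at all


corpus = [
    "please get the car",
    "please get the car now",
    "please get a car",
    "i want a car",
    "i drink water",
]
counts = ngram_counts(corpus)
print(choose_article(["please", "get"], "car", counts))  # the
```

Here "please get ___ car" comes out as "the" because "please get the car" occurs twice in the toy corpus, "please get a car" once, and "please get car" not at all.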

The amazing thing is that this actually works!  If we take a corpus,
strip out all the articles, and use the system to try to recover them,
it's right almost 85% of the time.  This can be further improved
somewhat by providing the system with an ontology of noun meanings (so
it can draw generalizations about words which don't occur in the
reference corpus but have very similar meanings to words which do).
No, it's never going to be right 100% of the time, at least until we
can read minds, but in most situations, very simple information about
the context is all that's needed.
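The strip-and-recover test can be sketched the same way. Again this is a hypothetical illustration: `strip_articles` and `restoration_accuracy` are made-up names, the predictor is any function of (left context, word), and every remaining word counts as a slot. The post doesn't say how slots were counted for the 85% figure; scoring every word, as here, would likely inflate accuracy relative to scoring only noun slots, since most words take no article.

```python
# Articles removed from the test text. Treating "an" as its own label is a
# simplifying assumption of this sketch.
ARTICLES = {"the", "a", "an"}


def strip_articles(sentence):
    """Delete articles, recording for each surviving word the article that
    preceded it ("" if none). These labels are the gold answers."""
    words, gold = [], []
    pending = ""
    for token in sentence.lower().split():
        if token in ARTICLES:
            pending = token
        else:
            words.append(token)
            gold.append(pending)
            pending = ""
    return words, gold


def restoration_accuracy(sentences, predict, context=2):
    """Share of word slots where predict(left_context, word) restores the
    original article (or correctly restores none)."""
    correct = total = 0
    for sentence in sentences:
        words, gold = strip_articles(sentence)
        for i, (word, answer) in enumerate(zip(words, gold)):
            left = words[max(0, i - context):i]  # stripped left context
            correct += predict(left, word) == answer
            total += 1
    return correct / total if total else 0.0


def never(left, word):
    """A deliberately naive predictor that never inserts an article."""
    return ""


print(restoration_accuracy(["please get the car"], never))  # 0.6666666666666666
```

Any predictor with the same (left context, word) signature, such as a corpus lookup like the one described above, could be plugged in place of `never`.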

A system like this has obvious applications for machine translation,
but the reason we first got to thinking about this problem was in the
context of an adaptive communication system.  We were working with an
ALS patient who was completely paralyzed:  he couldn't speak, move, or
even breathe on his own, but by moving his eyes he could spell out
simple messages.  This was very fatiguing for him, and the messages
tended to be highly telegraphic: "please get the car" might well come
out as "ge cr".  His family could understand what he meant, but no one
else could.  This program for generating articles was part of a larger
system to "translate" things like "ge cr" into fluent, polite English:
"please get the car".  You might think that this could only be done
reliably with full mind-reading ability and/or a vast store of general
world knowledge, and it's easy to make up isolated examples where
that's true. But it turns out that in real life it can be done
remarkably well using very simple tricks.  So, yeah, if he'd ever
wanted to tell a valet to "please get a car", the system would have
inserted an unwanted "the".  Fortunately, hardly anyone ever does that,
so the problem doesn't come up very often.
---
Rob Malouf
rmalouf at mail.sdsu.edu
Department of Linguistics and Oriental Languages
San Diego State University


