The Chinese Diplomat's "the" (3)

Tue Aug 31 14:40:55 UTC 2004

In a message dated 8/30/04 4:37:17 PM, language at sprynet.com writes:
<< Contrary to Steve's fantasies that all language can be broken down to Roger
Schank-like scenarios involving dialogues with car valets, both grammar and
accent really do matter in most languages. >>

Well, obviously a problem with my scenario would be that it gave Alex the
impression that I was saying grammar and accent don't matter.

(Reminded me of one of the more memorable Roger Schank lines: "People don't
remember what you say. They remember what they say.")

One of my points was that there are actually two different kinds of "bad
grammar."  There's one kind that makes my speech incomprehensible to listeners.
There's another kind that sounds wrong "grammatically" but is nevertheless
understandable by listeners.
(Time for more scenarios.)

A child recently told me that he "waked up in the morning..."  I corrected
him but understood what he was saying.  That's bad grammar that doesn't directly
interfere with communication, except to the extent that it distracts or
affects the willingness of the listener to listen.

However, the Chinese diplomat scenario appears to teach us that whether
grammar is faulty can often depend on non-linguistic factors (i.e., whether the
embassy owns many cars or just one car -- i.e., "get a car" or "get the car").
Some sociolinguists have had a habit of calling these non-linguistic factors
"context", in the sense of surrounding circumstances.  But the fact is they are
the core reason we are speaking in the first place.  If our diplomat has no
interest in cars, he should logically have nothing to say and the correct article
and other grammar problems do not arise.

What Rob originally wrote was: "At any rate, the performance of the best
[computer] models is getting close to that of humans at guessing which article
will be used in a given context."

What I was challenging in that statement was how a computer could know
"context" -- the non-linguistic ingredients in the soup.  From what I can tell, the
computer thinks "get the car" is more likely than "get a car" because "get the
car" or something like it has been more likely in the past.  This is not
"context" in the sense of reference, which involves non-linguistic factors.  It's
"context" in the sense of word sequence and adjacency history and constraints
on sentence structure.  That's an important difference in terminology and one I
thought worth mentioning.  Conflating the two senses seems to confuse
discussions of computer-generated language a lot.
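
To make the "word sequence" sense of context concrete, here is a minimal
sketch of guessing an article from adjacency history alone.  It is only my
illustration, not the models Rob described: the toy corpus and the names
count_trigrams and guess_article are invented for the example.

```python
from collections import Counter

# Toy corpus, invented purely for illustration (an assumption, not real data).
CORPUS = [
    "please get the car",
    "go and get the car now",
    "get the car from the garage",
    "we should get a car",
]


def count_trigrams(sentences):
    """Count every run of three adjacent words in the given sentences."""
    counts = Counter()
    for sentence in sentences:
        words = sentence.split()
        for i in range(len(words) - 2):
            counts[tuple(words[i:i + 3])] += 1
    return counts


def guess_article(counts, before, after):
    """Pick 'a' or 'the' by how often each appeared between the two words.

    The only evidence consulted is past word sequences; nothing here
    refers to the situation being talked about.
    """
    scores = {article: counts[(before, article, after)] for article in ("a", "the")}
    return max(scores, key=scores.get), scores


if __name__ == "__main__":
    guess, scores = guess_article(count_trigrams(CORPUS), "get", "car")
    print(guess, scores)
```

Nothing in these counts records how many cars the embassy owns, so the guess
comes out the same whatever the facts on the ground happen to be.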

Particularly because "a car" versus "the car" is NOT always a matter that can
be solved without looking outside language and in the real world.  The
parking valet teaches us that.  A machine cannot solve that problem on its own.  It
just doesn't know whether "a car" or "the car" is correct in that
circumstance.  It doesn't know whether the diplomat should choose one or the other.  And
of course we can't say which is correct unless we also have such knowledge.

Alex also writes:
<<... just as i am concerned with ...breaking through to describing how
language actually works. >>

Let me suggest a place to start.  A friend recently received a phone message
from a colleague with a strong Southern accent.  She and I could make out at
best five words out of two dozen.  We're all competent native English speakers,
but the message to us was incomprehensible.  That's an example of when
language "actually doesn't work" though it should.  Let me suggest that explaining
why it didn't work might go a long way towards explaining how it works, when it
does work.

BTW, there's a humorous piece on the web about "the THE" by Peter Master at:
http://aaal.lang.uiuc.edu/letter/23.2/theology.html

Regards,
Steve Long