[Ads-l] Computational Linguistics: ChatGPT versus a well-known ball and bat word problem; also another algebra problem
ADSGarson O'Toole
adsgarsonotoole at GMAIL.COM
Thu Dec 15 22:43:44 UTC 2022
Nobel Laureate Daniel Kahneman’s 2011 book "Thinking, Fast and Slow"
presented the following algebra puzzle which confounded a majority of
U.S. undergraduates.
[Begin excerpt]
For an example, here is a simple puzzle. Do not try to solve it but
listen to your intuition:
A bat and ball cost $1.10. The bat costs one dollar more than the
ball. How much does the ball cost?
A number came to your mind. The number, of course, is 10: 10¢. The
distinctive mark of this easy puzzle is that it evokes an answer that
is intuitive, appealing, and wrong. Do the math, and you will see. If
the ball costs 10¢, then the total cost will be $1.20 (10¢ for the
ball and $1.10 for the bat), not $1.10. The correct answer is 5¢. It
is safe to assume that the intuitive answer also came to the mind of
those who ended up with the correct number—they somehow managed to
resist the intuition.
. . .
Many thousands of university students have answered the bat-and-ball
puzzle, and the results are shocking. More than 50% of students at
Harvard, MIT, and Princeton gave the intuitive—incorrect—answer. At
less selective universities, the rate of demonstrable failure to check
was in excess of 80%.
[End excerpt]
A recent article on Slate by Daniel Lametti mentioned that ChatGPT,
the latest large language model from OpenAI, provided an incorrect
answer when it was asked to solve the bat and ball problem above.
However, I found that ChatGPT can solve the problem if you give it
feedback by saying its initial solution is wrong. Here is the
transcript.
My first prompt to ChatGPT: A bat and ball cost $1.10. The bat costs
$1 more than the ball. How much does the ball cost?
[Begin first response from ChatGPT]
If a bat costs $1 more than the ball, then the ball costs $1 less than
the bat. Since the bat costs $1.10, the ball would cost $1.10 - $1 =
$0.10.
[End first response from ChatGPT]
My second prompt to ChatGPT: That is incorrect. Try again.
[Begin second response from ChatGPT]
I apologize if my response was not
helpful. To find the cost of the ball, we can use the fact that the
bat and ball together cost $1.10. Since the bat costs $1 more than the
ball, we can say that the ball costs $x and the bat costs $x + $1. We
can then set up the following equation to find the cost of the ball:
$x + ($x + $1) = $1.10
2$x + $1 = $1.10
2$x = $1.10 - $1
2$x = $0.10
$x = $0.10 / 2
$x = $0.05
Therefore, the ball costs $0.05.
[End second response from ChatGPT]
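The algebra in the second response can be checked mechanically. Below
is a minimal Python sketch, not part of the ChatGPT session; the
function name solve_bat_and_ball is hypothetical, and prices are
assumed to be given in whole cents so that no floating-point rounding
is involved.

```python
from fractions import Fraction


def solve_bat_and_ball(total_cents, difference_cents):
    """Solve ball + bat = total and bat = ball + difference (in cents)."""
    # Substituting bat = ball + difference into the first equation gives
    # 2 * ball + difference = total, so ball = (total - difference) / 2.
    ball = Fraction(total_cents - difference_cents, 2)
    bat = ball + difference_cents
    return ball, bat


ball, bat = solve_bat_and_ball(110, 100)
print(ball, bat)  # 5 105

# The intuitive answer (ball = 10 cents) fails the check on the total:
intuitive_ball = 10
intuitive_total = intuitive_ball + (intuitive_ball + 100)
print(intuitive_total)  # 120, not 110
```

The same substitution that ChatGPT used in its second response yields
5 cents for the ball and $1.05 for the bat.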
ChatGPT’s second analysis above is correct. It seems ChatGPT has
capabilities that only emerge with further prompting. I was able to
coax ChatGPT into solving a more complex word problem by adjusting the
wording of the problem and trying multiple prompts. Most of my prompts
failed, but the following succeeded.
[Begin my prompt to ChatGPT]
There are P pigs and C chickens on a farm.
The total number of legs is 116.
The total number of pigs and chickens is 45.
Please use two equations and two unknowns.
How many pigs are there?
[End my prompt to ChatGPT]
[Begin response from ChatGPT]
If there are 45 pigs and chickens on a farm, we can represent the
number of pigs and chickens using the variables P and C. Since we know
that there are 116 legs in total, we can set up the following two
equations to represent the situation:
P + C = 45
2 * C + 4 * P = 116
We can then solve for the number of pigs and chickens by using a
method called substitution. First, we can isolate the term for the
number of pigs in the first equation and solve for P:
P = 45 - C
Next, we can substitute the expression we found for P into the second
equation to get an equation that only contains the variable C:
2 * C + 4 * (45 - C) = 116
2 * C + 180 - 4 * C = 116
-2 * C = -64
C = 32
Finally, we can substitute the value we found for C into the first
equation to find the number of pigs:
P = 45 - 32
P = 13
Thus, there are 13 pigs on the farm.
[End response from ChatGPT]
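The two-equation system in the response can also be verified
independently. The following is a small Python sketch, not something
ChatGPT produced; the function name solve_pigs_and_chickens is
hypothetical, and it assumes four legs per pig and two per chicken, as
in the equations above.

```python
def solve_pigs_and_chickens(animals, legs):
    """Solve P + C = animals and 4*P + 2*C = legs for whole numbers."""
    # Substituting C = animals - P into the leg equation gives
    # 4*P + 2*(animals - P) = legs, so 2*P = legs - 2*animals.
    twice_pigs = legs - 2 * animals
    if twice_pigs < 0 or twice_pigs % 2 != 0:
        raise ValueError("no whole-number solution")
    pigs = twice_pigs // 2
    chickens = animals - pigs
    if chickens < 0:
        raise ValueError("no whole-number solution")
    return pigs, chickens


pigs, chickens = solve_pigs_and_chickens(45, 116)
print(pigs, chickens)  # 13 32

# Check both original equations.
print(pigs + chickens == 45, 4 * pigs + 2 * chickens == 116)  # True True
```

This sketch eliminates C rather than P, the reverse of the
substitution ChatGPT chose, and arrives at the same 13 pigs and 32
chickens.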
ChatGPT’s analysis above is correct. I tried to coax the system into
solving a word problem with three equations and three unknowns, but
the system would time out, i.e., run out of time before reaching a
conclusion.
Garson
------------------------------------------------------------
The American Dialect Society - http://www.americandialect.org