[Corpora-List] "Cargo cult" NLP?

Jason Eisner jason at cs.jhu.edu
Wed Apr 9 04:16:03 UTC 2014


Feynman's piece
<http://calteches.library.caltech.edu/51/02/CargoCult.pdf> is great,
and I often recommend it to students, as well as quoting my
favorite line ("The first principle is that you must not fool yourself --
and you are the easiest person to fool.")

But I think Noah is right: the particular problem Kevin mentions is not
usually an issue in our community.  Our equivalent to comparing two
implementations "in the same laboratory" is to compare their accuracy on
the same dataset, using the same metric.  It's a reasonable presumption in
computational experiments that other differences between labs won't affect
the accuracy.  We count on portability of the implementation, and assume
that if your implemented method is beating mine, it's not because my lab
has inferior machines (lossy memory, smaller word size, Pentium
floating-point division bug, improper cooling, buggy compiler ...).
Rather, we figure that my code would have done precisely as badly if run in
your lab.

Yes, there are situations where this argument doesn't apply:

* You're comparing speed rather than accuracy.  Speed isn't portable, so
speed comparisons should indeed be done on the same machine under the same
workload.
* You're comparing the accuracy not of two implementations, but of (e.g.)
two feature sets.  Then it's important to ensure that the two
implementations are matched in all respects other than the feature sets.

But people generally seem to recognize these issues and get the comparisons
right.
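
For the speed case, here is a minimal sketch in Python of what "same
machine, same workload" can look like in practice. The function name
time_interleaved and its parameters are made up for illustration; the only
assumption is that each implementation is a callable that takes the
workload as its one argument. The runs are interleaved so that background
load on the machine hits both implementations roughly equally, and medians
are reported because a single timing is noisy.

```python
import statistics
import time


def time_interleaved(impl_a, impl_b, workload, repeats=5):
    """Time two callables on one workload, alternating between them.

    Returns (median seconds for impl_a, median seconds for impl_b).
    """
    times_a, times_b = [], []
    for _ in range(repeats):
        # Alternate A, B, A, B, ... so that drift in machine load is
        # shared between the two implementations.
        for impl, bucket in ((impl_a, times_a), (impl_b, times_b)):
            start = time.perf_counter()
            impl(workload)
            bucket.append(time.perf_counter() - start)
    return statistics.median(times_a), statistics.median(times_b)
```

The resulting numbers are still only meaningful relative to each other, on
that machine and under that load.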

I think a more pressing problem is that we tend to overinterpret our
results.  Someone reports that implementation AA of method A does
significantly better than implementation BB of method B, when both are
trained on dataset D.  The comparison was performed on n samples from test
distribution P, using metric M.  But this hardly shows that A will do
better than B in other settings.  The advantage might not carry over to
other pairs of implementations, other training sets, other test
distributions, or other evaluation metrics.  The statistical significance
shows only that n was big enough (to reject the null hypothesis) when
everything else was held fixed.
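
To make that last point concrete, here is a rough sketch in Python of a
paired randomization (sign-flip) test on per-example scores. It is one
common way to test such a difference, not necessarily what any given paper
used. The names paired_sign_flip_test and toy_scores are hypothetical, and
the toy data is constructed by hand (A correct on 80% of examples, B on
75%), not taken from any real system.

```python
import random


def paired_sign_flip_test(scores_a, scores_b, trials=2000, seed=0):
    """Estimate a two-sided p-value for the difference in total score.

    Null hypothesis: on each test example the labels "A" and "B" are
    exchangeable, so the sign of each per-example difference is arbitrary.
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("both systems must be scored on the same examples")
    rng = random.Random(seed)
    # Ties contribute nothing to the sum under either sign, so drop them.
    diffs = [a - b for a, b in zip(scores_a, scores_b) if a != b]
    observed = abs(sum(diffs))
    hits = 0
    for _ in range(trials):
        total = sum(d if rng.random() < 0.5 else -d for d in diffs)
        if abs(total) >= observed:
            hits += 1
    # Add-one smoothing keeps the estimate strictly positive.
    return (hits + 1) / (trials + 1)


def toy_scores(n):
    """Hand-made 0/1 scores; n should be a multiple of 20."""
    a_right = set(range(0, 16))             # A correct on 16 of every 20
    b_right = set(range(0, 13)) | {16, 17}  # B correct on 15 of every 20
    scores_a = [1 if i % 20 in a_right else 0 for i in range(n)]
    scores_b = [1 if i % 20 in b_right else 0 for i in range(n)]
    return scores_a, scores_b


if __name__ == "__main__":
    for n in (40, 2000):
        a, b = toy_scores(n)
        print(n, paired_sign_flip_test(a, b))
```

The accuracy gap is the same five points at both sizes, but the estimated
p-value is large for n = 40 and very small for n = 2000. Nothing in the
test varies the implementations, the training set, the test distribution,
or the metric, so it cannot speak to any of those.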

This latter concern is more closely related to the traditional demand that
studies be replicated
<http://www.plosmedicine.org/article/info%3Adoi%2F10.1371%2Fjournal.pmed.0040028>.
Replication involves some degree of generalization, to determine whether
the claimed causes are still able to produce the effect in a new setting.
This is different from merely expecting results to be reproducible (which
they are if the code and data are saved).

regards, jason

On Tue, Apr 8, 2014 at 9:59 PM, Noah A Smith <nasmith at cs.cmu.edu> wrote:

> What are the "unknown ways" that one NLP researcher's conditions might
> differ from another NLP researcher's?  If you're empirically measuring
> runtime, you might have a point.  But if you're using a standardized
> dataset and automatic evaluation, it seems reasonable to report others'
> results for comparison.  Since NLP is much more about methodology than
> scientific hypothesis testing, it's not clear what the "experimental
> control" should be.  Is it really better to run your own implementation of
> the competing method?  (Some reviewers would likely complain that you might
> not have replicated the method properly!)  What about running the other
> researcher's code yourself?  I don't think that's fundamentally different
> from reporting others' results, unless you don't trust what they report.
>  Must I reannotate a Penn Treebank-style corpus every time I want to build
> a new parser?
>
> --
> Noah Smith
> Associate Professor
> School of Computer Science
> Carnegie Mellon University
>
>
> On Tue, Apr 8, 2014 at 6:57 PM, Kevin B. Cohen <kevin.cohen at gmail.com> wrote:
>
>> I was recently reading the Wikipedia page on "cargo cult science," a
>> concept attributed to no lesser a light than Richard Feynman.  I found this
>> on the page:
>>
>> "An example of cargo cult science is an experiment that uses another
>> researcher's results in lieu of an experimental control
>> <http://en.wikipedia.org/wiki/Experimental_control>.
>> Since the other researcher's conditions might differ from those of the
>> present experiment in unknown ways, differences in the outcome might have
>> no relation to the independent variable
>> <http://en.wikipedia.org/wiki/Independent_variable> under consideration.
>> Other examples, given by Feynman, are from educational
>> research <http://en.wikipedia.org/wiki/Educational_research>, psychology
>> <http://en.wikipedia.org/wiki/Psychology> (particularly
>> parapsychology <http://en.wikipedia.org/wiki/Parapsychology>), and
>> physics <http://en.wikipedia.org/wiki/Physics>. He also mentions other
>> kinds of dishonesty, for example, falsely promoting one's research to
>> secure funding."
>>
>> If we all had a dime for every NLP paper we've read that used "another
>> researcher's results in lieu of an experimental control," we wouldn't have
>> to work for a living.
>>
>> What do you think?  Are we all cargo cultists in this respect?
>>
>> http://en.wikipedia.org/wiki/Cargo_cult_science
>>
>> Kev
>>
>>
>> --
>> Kevin Bretonnel Cohen, PhD
>> Biomedical Text Mining Group Lead, Computational Bioscience Program,
>> U. Colorado School of Medicine
>> 303-916-2417
>> http://compbio.ucdenver.edu/Hunter_lab/Cohen
>>
>>
>>
>>
>> _______________________________________________
>> UNSUBSCRIBE from this page: http://mailman.uib.no/options/corpora
>> Corpora mailing list
>> Corpora at uib.no
>> http://mailman.uib.no/listinfo/corpora
>>
>>