[Corpora-List] "Cargo cult" NLP?

Noah A Smith nasmith at cs.cmu.edu
Wed Apr 9 01:59:57 UTC 2014


What are the "unknown ways" that one NLP researcher's conditions might
differ from another NLP researcher's?  If you're empirically measuring
runtime, you might have a point.  But if you're using a standardized
dataset and automatic evaluation, it seems reasonable to report others'
results for comparison.  Since NLP is much more about methodology than
scientific hypothesis testing, it's not clear what the "experimental
control" should be.  Is it really better to run your own implementation of
the competing method?  (Some reviewers would likely complain that you might
not have replicated the method properly!)  What about running the other
researcher's code yourself?  I don't think that's fundamentally different
from reporting others' results, unless you don't trust what they report.
Must I reannotate a Penn Treebank-style corpus every time I want to build
a new parser?

--
Noah Smith
Associate Professor
School of Computer Science
Carnegie Mellon University


On Tue, Apr 8, 2014 at 6:57 PM, Kevin B. Cohen <kevin.cohen at gmail.com> wrote:

> I was recently reading the Wikipedia page on "cargo cult science," a
> concept attributed to no lesser a light than Richard Feynman.  I found this
> on the page:
>
> "An example of cargo cult science is an experiment that uses another
> researcher's results in lieu of an experimental control.
> Since the other researcher's conditions might differ from those of the
> present experiment in unknown ways, differences in the outcome might have
> no relation to the independent variable under consideration. Other
> examples, given by Feynman, are from educational research, psychology
> (particularly parapsychology), and physics.
> He also mentions other kinds of dishonesty, for example, falsely promoting
> one's research to secure funding."
>
> If we all had a dime for every NLP paper we've read that used "another
> researcher's results in lieu of an experimental control," we wouldn't have
> to work for a living.
>
> What do you think?  Are we all cargo cultists in this respect?
>
> http://en.wikipedia.org/wiki/Cargo_cult_science
>
> Kev
>
>
> --
> Kevin Bretonnel Cohen, PhD
> Biomedical Text Mining Group Lead, Computational Bioscience Program,
> U. Colorado School of Medicine
> 303-916-2417
> http://compbio.ucdenver.edu/Hunter_lab/Cohen
>
>
>
>
> _______________________________________________
> UNSUBSCRIBE from this page: http://mailman.uib.no/options/corpora
> Corpora mailing list
> Corpora at uib.no
> http://mailman.uib.no/listinfo/corpora
>
>