[Corpora-List] Bootstrap in linguistics

Yuval Marton yuvalmarton at gmail.com
Fri Oct 14 16:23:02 UTC 2011


Dear Chris,
Following the same logic and disclaimer as Jin-Dong, you might find Koehn's
method for statistical significance testing of machine translation output
useful:

Philipp Koehn. 2004. Statistical significance tests for
machine translation evaluation. In Proceedings of the
Conference on Empirical Methods in Natural Language
Processing (EMNLP).


There's also a freely available implementation of it, with source code. (Not
sure about the exact license.)
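
For illustration only, here is a minimal Python sketch of the paired
bootstrap resampling idea that the paper describes for comparing two
systems. It is not the implementation mentioned above. The function name
paired_bootstrap, its parameters, and the use of a mean over per-sentence
scores are assumptions made for this sketch; the paper recomputes the
corpus-level metric on each resampled test set instead.

```python
import random


def paired_bootstrap(scores_a, scores_b, n_resamples=1000, seed=0):
    """Fraction of resamples in which system A's mean score beats system B's.

    scores_a and scores_b are per-sentence scores for the same test
    sentences, in the same order (hypothetical inputs for this sketch).
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("both systems must be scored on the same sentences")
    n = len(scores_a)
    if n == 0:
        raise ValueError("need at least one scored sentence")

    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    wins = 0
    for _ in range(n_resamples):
        # Draw n sentence indices randomly and with replacement, and use
        # the SAME indices for both systems (this is what makes it paired).
        idx = [rng.randrange(n) for _ in range(n)]
        mean_a = sum(scores_a[i] for i in idx) / n
        mean_b = sum(scores_b[i] for i in idx) / n
        if mean_a > mean_b:
            wins += 1
    return wins / n_resamples
```

If A comes out ahead in, say, at least 95% of the resamples, one would
report A as better than B at roughly that confidence level. The threshold
and the number of resamples are choices to justify, not fixed parts of the
method.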


Best,

-Yuval




On Fri, Oct 14, 2011 at 4:09 AM, Jin-Dong Kim <jdkim at dbcls.rois.ac.jp> wrote:

> Dear Chris,
>
> I am not sure if you consider it as a corpus linguistics study, but
> bootstrap resampling techniques were indeed used in this work:
>
> @article{Sang:2002:MSP:944790.944818,
>  author = {Sang, Erik F. Tjong Kim},
>  title = {Memory-based shallow parsing},
>  journal = {J. Mach. Learn. Res.},
>  volume = {2},
>  month = {March},
>  year = {2002},
>  issn = {1532-4435},
>  pages = {559--594},
>  numpages = {36},
>  url = {http://dl.acm.org/citation.cfm?id=944790.944818},
>  acmid = {944818},
>  publisher = {JMLR.org},
>  keywords = {feature selection, memory-based learning, shallow
> parsing, system combination},
> }
>
> Hope it helps.
>
> Best,
>
> Jin-Dong
>
> On Thu, Oct 13, 2011 at 11:43 PM,  <CRuehlemann at aol.com> wrote:
> > Dear all,
> >
> >
> >
> > It is not uncommon in quantitative corpus linguistic studies that a
> > significance test cannot be performed because one cannot juxtapose
> > the distribution of a variable against the distribution of another
> > comparable variable, against a specific distribution (e.g. normal,
> > exponential, etc.), or against an a priori stipulated value. To
> > nonetheless assess whether the distribution in the sample is simply
> > due to chance or a reflection of the true distribution in the
> > population, statisticians often use the bootstrap method. This is a
> > resampling method: from the sample, a large number of resamples are
> > drawn randomly and with replacement.
> >
> >
> >
> > Is anyone aware of any (corpus) linguistic study/studies which has/have
> > used this method?
> >
> >
> >
> > Many thanks in advance
> >
> >
> >
> > Chris
> >
> > _______________________________________________
> > UNSUBSCRIBE from this page: http://mailman.uib.no/options/corpora
> > Corpora mailing list
> > Corpora at uib.no
> > http://mailman.uib.no/listinfo/corpora
> >
> >
>
>
>
> --
> Jin-Dong Kim, Ph.D,
> Project Associate Professor,
> Database Center for Life Science (DBCLS),
> Research Organization of Information and Systems (ROIS)
> home: http://dbcls.rois.ac.jp/~jdkim
> e-mail: jdkim at dbcls.rois.ac.jp
>