[Corpora-List] Qualitative / Quantitative survey of Wikipedia dumps as Corpora
Imene Bensalem
bens.imene at gmail.com
Fri Mar 22 14:12:44 UTC 2013
Hi liling,
Here are some papers I know of:
Anderka, M., Stein, B., Lipka, N.: Predicting quality flaws in
user-generated content: The case of Wikipedia. Proceedings of the 35th
international ACM SIGIR conference on Research and development in
information retrieval. pp. 981–990. ACM (2012).
Anderka, M., Stein, B.: A breakdown of quality flaws in Wikipedia.
Proceedings of the 2nd Joint WICOW/AIRWeb Workshop on Web Quality -
WebQuality ’12. 11 (2012).
And a technical report on Wikipedia:
Pasternack, J., Roth, D.: The Wikipedia Corpus. (2008).
I hope you find it interesting.
Best,
Imene
------------------------------
>
> Message: 7
> Date: Wed, 13 Mar 2013 09:26:31 +0800
> From: liling tan <alvations at gmail.com>
> Subject: [Corpora-List] Qualitative / Quantitative survey of Wikipedia
> dumps as Corpora
> To: corpora at uib.no
>
> Dear all,
>
> Wikipedia dumps have been a popular source of texts for NLP due to their
> availability and sheer size.
>
> I would like to ask whether anyone has conducted a quantitative or
> qualitative survey on
>
> - how useful these dumps are for NLP, and
> - what issues surface when using Wikipedia dumps as
> corpora.
>
>
> Regards,
> liling
>
>