[Corpora-List] Qualitative / Quantitative survey of Wikipedia dumps as Corpora

Fri Mar 22 14:12:44 UTC 2013

Hi Liling,

These are some papers I know of:

Anderka, M., Stein, B., Lipka, N.: Predicting quality flaws in
user-generated content: The case of Wikipedia. Proceedings of the 35th
international ACM SIGIR conference on Research and development in
information retrieval. pp. 981–990. ACM (2012).

Anderka, M., Stein, B.: A breakdown of quality flaws in Wikipedia.
Proceedings of the 2nd Joint WICOW/AIRWeb Workshop on Web Quality -
WebQuality ’12. 11 (2012).

And a technical report on Wikipedia:
Pasternack, J., Roth, D.: The Wikipedia Corpus. (2008).

I hope you find it interesting.

Best,

Imene

------------------------------
>
> Message: 7
> Date: Wed, 13 Mar 2013 09:26:31 +0800
> From: liling tan <alvations at gmail.com>
> Subject: [Corpora-List] Qualitative / Quantitative survey of Wikipedia
>         dumps as Corpora
> To: corpora at uib.no
>
> Dear all,
>
> Wikipedia dumps have been a popular source of text for NLP due to their
> availability and sheer size.
>
> I would like to ask whether anyone has conducted a quantitative or
> qualitative survey on
>
>    - how useful these dumps are to NLP, and
>    - what issues surface when using Wikipedia dumps as
>    corpora.
>
>
> Regards,
> liling
>
>