<div>Hi liling,</div><div><br></div><div>Here are some papers I know of:</div><div><br></div><div>Anderka, M., Stein, B., Lipka, N.: Predicting quality flaws in user-generated content: The case of Wikipedia. Proceedings of the 35th International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 981–990. ACM (2012).</div>

<div><br></div><div>Anderka, M., Stein, B.: A breakdown of quality flaws in Wikipedia. Proceedings of the 2nd Joint WICOW/AIRWeb Workshop on Web Quality (WebQuality ’12), 11 (2012).</div><div><br></div><div>And a technical report on Wikipedia:</div>

<div>Pasternack, J., Roth, D.: The Wikipedia Corpus. (2008).</div><div><br></div><div>I hope you find it interesting.</div><div><br></div><div>Best,</div><div><br></div><div>Imene</div><div><br></div><br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">


------------------------------<br>

<br>


Date: Wed, 13 Mar 2013 09:26:31 +0800<br>

From: liling tan &lt;<a href="mailto:alvations@gmail.com">alvations@gmail.com</a>&gt;<br>

Subject: [Corpora-List] Qualitative / Quantitative survey of Wikipedia dumps as Corpora<br>

To: <a href="mailto:corpora@uib.no">corpora@uib.no</a><br>

<br>

Dear all,<br>

<br>

Wikipedia dumps have been a popular source of text for NLP because of their<br>

availability and sheer size.<br>

<br>

I would like to ask whether anyone has conducted a quantitative or<br>

qualitative survey of<br>

<br>

   - how useful these dumps are for NLP, and<br>

   - what issues arise when using Wikipedia dumps as corpora.<br>

<br>

<br>

Regards,<br>

liling<br><br>

</blockquote>