[Corpora-List] Qualitative / Quantitative survey of Wikipedia dumps as Corpora

prosso at dsic.upv.es prosso at dsic.upv.es
Fri Mar 22 17:09:40 UTC 2013


Btw, you can also have a look at the PAN-2012 competition on Quality
Flaw Prediction in Wikipedia
(http://www.webis.de/research/events/pan-12).

Nice w/e to everybody

Paolo


Quoting Imene Bensalem <bens.imene at gmail.com>:

> Hi liling,
>
> These are some papers I know of:
>
> Anderka, M., Stein, B., Lipka, N.: Predicting quality flaws in
> user-generated content: The case of Wikipedia. Proceedings of the 35th
> international ACM SIGIR conference on Research and development in
> information retrieval. pp. 981-990. ACM (2012).
>
> Anderka, M., Stein, B.: A breakdown of quality flaws in Wikipedia.
> Proceedings of the 2nd Joint WICOW/AIRWeb Workshop on Web Quality -
> WebQuality '12. 11 (2012).
>
> And a technical report on Wikipedia:
> Pasternack, J., Roth, D.: The Wikipedia Corpus. (2008).
>
> I hope you find it interesting.
>
> Best,
>
> Imene
>
>
> ------------------------------
>>
>> Message: 7
>> Date: Wed, 13 Mar 2013 09:26:31 +0800
>> From: liling tan <alvations at gmail.com>
>> Subject: [Corpora-List] Qualitative / Quantitative survey of Wikipedia
>>         dumps as Corpora
>> To: corpora at uib.no
>>
>> Dear all,
>>
>> Wikipedia dumps have been a popular source of text for NLP due to their
>> availability and sheer size.
>>
>> I would like to ask whether anyone has conducted a quantitative or
>> qualitative survey on
>>
>>    - how useful these dumps are to NLP, and
>>    - what issues surface when using Wikipedia dumps as
>>    corpora.
>>
>>
>> Regards,
>> liling
>>
>>
>





_______________________________________________
UNSUBSCRIBE from this page: http://mailman.uib.no/options/corpora
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora


