[Corpora-List] Qualitative / Quantitative survey of Wikipedia dumps as Corpora
prosso at dsic.upv.es
Fri Mar 22 17:09:40 UTC 2013
Btw, you can also have a look at the PAN-2012 competition on Quality
Flaw Prediction in Wikipedia
(http://www.webis.de/research/events/pan-12).
Nice w/e to everybody
Paolo
Quoting Imene Bensalem <bens.imene at gmail.com>:
> Hi liling,
>
> Here are some papers I know of:
>
> Anderka, M., Stein, B., Lipka, N.: Predicting quality flaws in
> user-generated content: The case of wikipedia. Proceedings of the 35th
> international ACM SIGIR conference on Research and development in
> information retrieval, pp. 981–990. ACM (2012).
>
> Anderka, M., Stein, B.: A breakdown of quality flaws in Wikipedia.
> Proceedings of the 2nd Joint WICOW/AIRWeb Workshop on Web Quality -
> WebQuality '12, p. 11 (2012).
>
> And a technical report on the Wikipedia corpus:
> Pasternack, J., Roth, D.: The Wikipedia Corpus. (2008).
>
> I hope you find it interesting.
>
> Best,
>
> Imene
>
>
> ------------------------------
>>
>> Message: 7
>> Date: Wed, 13 Mar 2013 09:26:31 +0800
>> From: liling tan <alvations at gmail.com>
>> Subject: [Corpora-List] Qualitative / Quantitative survey of Wikipedia
>> dumps as Corpora
>> To: corpora at uib.no
>>
>> Dear all,
>>
>> Wikipedia dumps have been a popular source of text for NLP because of
>> their availability and sheer size.
>>
>> I would like to ask whether anyone has conducted a quantitative or
>> qualitative survey on:
>>
>> - how useful these dumps are for NLP, and
>> - what issues surface when using Wikipedia dumps as
>> corpora.
>>
>>
>> Regards,
>> liling
>>
>>
>
_______________________________________________
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora