[Corpora-List] Standard definition for Comparable corpora

S. Bakhshaei s.bakhshaei at gmail.com
Fri Aug 8 06:43:53 UTC 2014


Thanks all,

But I am actually looking for a classification like this:
Comparable corpora are defined as collections of:

1. texts that are written about the same event, news, ...
2. texts that are written about the same topic in the same period of time,
3. texts that share only the same topic,

....

Are you aware of this kind of classification?
Thank you,




On Fri, Aug 8, 2014 at 7:24 AM, Dekai Wu <dekai at cs.ust.hk> wrote:

>  Hi Somayeh, in a recent related discussion I posted a pointer to
> http://www.cs.ust.hk/~dekai/library/WU_Dekai/nonparallel.html where you
> can find an HTML table summarizing the differences between four different
> degrees of (non)parallel corpora, synthesized from some surveys within
> papers by Pascale Fung, including:
>
> 1. parallel corpus
> 2. noisy parallel corpus
> 3. comparable corpus
> 4. quasi-comparable (very-non-parallel) corpus
>
> References
>
> Pascale Fung & Percy Cheung (2004). Mining very-non-parallel corpora:
> Parallel sentence and lexicon extraction via bootstrapping and EM. In
> Dekang Lin and Dekai Wu (editors), Proceedings of the 2004 Conference on
> Empirical Methods in Natural Language Processing (EMNLP 2004). Barcelona,
> Spain: July 2004.
>
> Pascale Fung & Percy Cheung (2004). Multi-level bootstrapping for
> extracting parallel sentences from a quasi-comparable Corpus. In
> Proceedings of the 20th International Conference on Computational
> Linguistics (COLING 2004). Geneva, Switzerland: August 2004.
>
> Dekai Wu & Pascale Fung (2005). Inversion Transduction Grammar constraints
> for mining parallel sentences from quasi-comparable corpora. In Proceedings
> of the Second International Joint Conference on Natural Language Processing
> (IJCNLP 2005), Lecture Notes in Computer Science 3651: 257-268.
>
>
> Hope this helps!
> -Dekai
> --
> Dekai Wu
> Hong Kong University of Science & Technology (HKUST)
> Human Language Technology Center
> Department of Computer Science and Engineering
>
>
> S. Bakhshaei wrote:
>
>  Hello all,
>
>  As you know, comparable corpora differ according to the degree of
> comparability of their contents. I want to know if there is a standard
> definition/classification for them. Can anyone guide me to a reference
> paper, please?
>
>
>  Best Regards,
> Somayeh Bakhshaei
>
> After All you will come ....
> And will spread light on the dark desolate world!
> O' Kind Father! We will be waiting for your affectionate hands ...
>
> ------------------------------
>
> _______________________________________________
> UNSUBSCRIBE from this page: http://mailman.uib.no/options/corpora
> Corpora mailing list
> Corpora at uib.no
> http://mailman.uib.no/listinfo/corpora
>
>
>

