[Corpora-List] Standard definition for Comparable corpora

Dekai Wu dekai at cs.ust.hk
Fri Aug 8 02:54:06 UTC 2014


Hi Somayeh, in a related discussion recently I posted a pointer to 
http://www.cs.ust.hk/~dekai/library/WU_Dekai/nonparallel.html where 
you can find an HTML table summarizing the differences between four 
degrees of (non)parallel corpora, synthesized from surveys within 
papers by Pascale Fung. The four degrees are:

1. parallel corpus
2. noisy parallel corpus
3. comparable corpus
4. quasi-comparable (very-non-parallel) corpus

References

Pascale Fung & Percy Cheung (2004). Mining very-non-parallel corpora: 
Parallel sentence and lexicon extraction via bootstrapping and EM. In 
Dekang Lin and Dekai Wu (editors), Proceedings of the 2004 Conference on 
Empirical Methods in Natural Language Processing (EMNLP 2004). 
Barcelona, Spain: July 2004.

Pascale Fung & Percy Cheung (2004). Multi-level bootstrapping for 
extracting parallel sentences from a quasi-comparable corpus. In 
Proceedings of the 20th International Conference on Computational 
Linguistics (COLING 2004). Geneva, Switzerland: August 2004.

Dekai Wu & Pascale Fung (2005). Inversion Transduction Grammar 
constraints for mining parallel sentences from quasi-comparable corpora. 
In Proceedings of the Second International Joint Conference on Natural 
Language Processing (IJCNLP 2005), Lecture Notes in Computer Science 
3651: 257-268.


Hope this helps!
-Dekai
-- 
Dekai Wu
Hong Kong University of Science & Technology (HKUST)
Human Language Technology Center
Department of Computer Science and Engineering


S. Bakhshaei wrote:
> Hello all,
>
> As you know, comparable corpora differ according to the degree of 
> comparability of their contents. I want to know if there is a standard 
> definition/classification for them. Can anyone guide me to a reference 
> paper, please?
>
>
> Best Regards,
> Somayeh Bakhshaei
>
> After All you will come ....
> And will spread light on the dark desolate world!
> O' Kind Father! We will be waiting for your affectionate hands ...
> ------------------------------------------------------------------------
>
> _______________________________________________
> UNSUBSCRIBE from this page: http://mailman.uib.no/options/corpora
> Corpora mailing list
> Corpora at uib.no
> http://mailman.uib.no/listinfo/corpora
>   
