[Corpora-List] Standard definition for Comparable corpora
Dekai Wu
dekai at cs.ust.hk
Fri Aug 8 02:54:06 UTC 2014
Hi Somayeh, in a related discussion I recently posted a pointer to
http://www.cs.ust.hk/~dekai/library/WU_Dekai/nonparallel.html where
you can find an HTML table summarizing the differences between four
degrees of (non)parallel corpora, synthesized from surveys in several
papers by Pascale Fung:
1. parallel corpus
2. noisy parallel corpus
3. comparable corpus
4. quasi-comparable (very-non-parallel) corpus
References
Pascale Fung & Percy Cheung (2004). Mining very-non-parallel corpora:
Parallel sentence and lexicon extraction via bootstrapping and EM. In
Dekang Lin and Dekai Wu (editors), Proceedings of the 2004 Conference on
Empirical Methods in Natural Language Processing (EMNLP 2004).
Barcelona, Spain: July 2004.
Pascale Fung & Percy Cheung (2004). Multi-level bootstrapping for
extracting parallel sentences from a quasi-comparable corpus. In
Proceedings of the 20th International Conference on Computational
Linguistics (COLING 2004). Geneva, Switzerland: August 2004.
Dekai Wu & Pascale Fung (2005). Inversion Transduction Grammar
constraints for mining parallel sentences from quasi-comparable corpora.
In Proceedings of the Second International Joint Conference on Natural
Language Processing (IJCNLP 2005), Lecture Notes in Computer Science
3651: 257-268.
Hope this helps!
-Dekai
--
Dekai Wu
Hong Kong University of Science & Technology (HKUST)
Human Language Technology Center
Department of Computer Science and Engineering
S. Bakhshaei wrote:
> Hello all,
>
> As you know, comparable corpora differ according to the degree of
> comparability of their contents. I would like to know whether there is a
> standard definition/classification for them. Can anyone point me to a
> reference paper, please?
>
>
> Best Regards,
> Somayeh Bakhshaei
>
> _______________________________________________
> UNSUBSCRIBE from this page: http://mailman.uib.no/options/corpora
> Corpora mailing list
> Corpora at uib.no
> http://mailman.uib.no/listinfo/corpora
>