[Corpora-List] Developing Parallel Corpus

Darren Cook darren at dcook.org
Fri Apr 18 10:32:43 UTC 2014


> I am currently doing an MS, and for my final research I want to develop
> a parallel corpus. I have texts in the source language and their
> translations in the target language. What else do I have to do to develop
> the parallel corpus? Do I have to tokenize this data, or apply any other
> processing to the text?

An article [1] in the first issue of the Journal of Language Modelling gave
a nice overview of what corpora and parallel corpora cover.
(I just happened to have read and enjoyed this article recently, which
is why it came to mind; I'm sure there are other articles on the subject
to be found.)

Darren

[1]: http://jlm.ipipan.waw.pl/index.php/JLM/article/view/33
The Bulgarian National Corpus: Theory and Practice in Corpus Design


