[Corpora-List] Guidelines for creating Gold Standard Alignment

mohnish jadwani mohnishgj at gmail.com
Fri Apr 16 05:20:43 UTC 2010


Respected Readers,
Creating a Gold Standard Alignment is of vital importance when one has to
evaluate the output of *word alignment* tools like Giza++ on a bilingual
corpus. As many of us know, this Gold Standard Alignment (test data) serves
as a reference against which one can evaluate the results obtained using
the training data. When this test data, which is a subset of the training
data, is created manually, one comes across a lot of variation between the
source and target languages while aligning words, e.g.:


1# 5 # does(1) he(2) go(3) home(4) ?(5) # 4 2 4 3 0

1# 5 # क्या(1) वह(2) घर(3) जाता(4) है(5) # 0 2 4 3 0

Here the word "does" maps to only part of a Hindi word, the 'ता' of 'जाता'.
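
To make the notation concrete, here is a minimal sketch in Python of how
such lines might be read. The layout is only my assumption, inferred from
the two example lines: sentence id, token count, numbered tokens, and then
one number per token giving the 1-based position of the word it aligns to
in the other sentence, with 0 meaning "not aligned". The function names
are hypothetical and do not come from any tool.

```python
import re

# Assumed token shape: surface form followed by its 1-based index, e.g. "go(3)".
TOKEN_RE = re.compile(r"^(.+)\((\d+)\)$")


def parse_alignment_line(line):
    """Parse one line of the assumed form: id# count # tok(1) ... # a1 a2 ..."""
    parts = [p.strip() for p in line.split("#")]
    if len(parts) != 4:
        raise ValueError("expected four '#'-separated fields")
    sent_id, count, token_field, link_field = parts

    tokens = []
    for position, item in enumerate(token_field.split(), start=1):
        match = TOKEN_RE.match(item)
        if match is None or int(match.group(2)) != position:
            raise ValueError("badly numbered token: " + item)
        tokens.append(match.group(1))

    links = [int(x) for x in link_field.split()]
    if not (len(tokens) == len(links) == int(count)):
        raise ValueError("token count and alignment vector disagree")
    return {"id": int(sent_id), "tokens": tokens, "links": links}


def aligned_pairs(source, target):
    """Pair each source token with the target token it points to (None if 0)."""
    pairs = []
    for token, link in zip(source["tokens"], source["links"]):
        partner = target["tokens"][link - 1] if link else None
        pairs.append((token, partner))
    return pairs


english = parse_alignment_line("1# 5 # does(1) he(2) go(3) home(4) ?(5) # 4 2 4 3 0")
hindi = parse_alignment_line("1# 5 # क्या(1) वह(2) घर(3) जाता(4) है(5) # 0 2 4 3 0")
pairs = aligned_pairs(english, hindi)
```

Under this reading both "does" and "go" point to 'जाता', which shows the
difficulty: a word-level vector cannot say that "does" corresponds only to
a part of that word.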

There are many such cases that need careful consideration when creating a
Gold Standard Alignment.
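
For context on how the gold standard is then used as a reference, a rough
sketch of one way a tool's output could be scored against it is given
below. It assumes alignments are compared as sets of (source position,
target position) links and scored with plain precision, recall and F1;
this is an illustrative choice on my part, not necessarily what any
particular tool or paper does, and the "predicted" vector is made up.

```python
def links_from_vector(vector):
    """Turn a per-token alignment vector into a set of (source, target) links.

    Positions are 1-based; an entry of 0 is taken to mean "unaligned".
    """
    return {(i, j) for i, j in enumerate(vector, start=1) if j != 0}


def score(predicted, gold):
    """Return (precision, recall, F1) of predicted links against gold links."""
    correct = len(predicted & gold)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Gold links for the English side of the example sentence.
gold = links_from_vector([4, 2, 4, 3, 0])
# Hypothetical tool output that leaves "does" unaligned.
predicted = links_from_vector([0, 2, 4, 3, 0])

p, r, f = score(predicted, gold)
```

Every decision made while annotating (for instance whether "does" is
linked to 'जाता' or left unaligned) changes the gold link set and
therefore these scores, which is why consistent guidelines matter.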

Could you please suggest any basic guidelines (even if not specific to the
English-Hindi language pair) that one could follow while going about this?
Any reference paper or advice would be of great help.

Thanking You

Mohnish