[Corpora-List] Summary: GUI for Word Alignment

Pete Whitelock pete.whitelock at sharp.co.uk
Thu Oct 7 10:47:45 UTC 2004


On 22nd September, I posted the query:

>> What's the state of the art in GUIs allowing translators to develop gold standard word aligned bilingual corpora? Is there anything publicly available?
 
>> Particularly interesting would be software that takes in an automatically generated alignment and allows the user to patch it up. 

>> Also, because I'm interested in aligning a head final and a head initial language pair, software that shows alignments in color rather than by lines would be optimal.

Here's a summary of the replies (I haven't included the details of input and output representations, which are easy enough to massage, but I've reported the type of display where I could).

Rada Mihalcea maintains a page of links to Word (and Sentence) Alignment tools and resources at http://www.cs.unt.edu/~rada/wa

Noah Smith (nasmith at gmail.com) developed a visualisation tool, Cairo, with Mike Jahr during EGYPT, the 1999 Statistical MT Workshop at Johns Hopkins. It's in Java and displays alignments with lines linking words in the two languages. Currently it doesn't allow alignments to be modified but could be extended to do that. It's downloadable from http://www.clsp.jhu.edu/ws99/projects/mt/toolkit/
and you can see what it looks like at http://www.clsp.jhu.edu/ws99/projects/mt/report/1/9.gif

Ted Pedersen's (tpederse at d.umn.edu) Alpaco, with a similar line-based display, is available at http://www.d.umn.edu/~tpederse/parallel.html. It's written in Perl and Tk and allows new alignments to be specified.

Hal Daume of ISI (hdaume at ISI.EDU) wrote HandAlign, a similar tool for aligning articles and their summaries, available at http://www.isi.edu/~hdaume/HandAlign/. It's in Java, and again produces a line-based display, but the two texts being aligned are independently scrollable.

Magnus Merkel (magme at ida.liu.se) and his colleagues at Linköping have developed an interactive word aligner (I*Link), written in Java, which displays alignments with color-coding. You can download an academic version from http://www.ida.liu.se/~nlplab/ILink/. A screenshot is attached (ilink.gif).

Jörg Tiedemann (tiedeman at let.rug.nl) has implemented a demo web interface in Perl for handling parallel corpora, which includes the possibility of editing automatically word-aligned corpora. You have to register before you can use your own corpora: http://stp.ling.uu.se/cgi-bin/joerg/Uplug

Philipp Koehn (koehn at csail.mit.edu) has also implemented a web-based tool, an example of which is viewable at http://montev.isi.edu:8000/align-tool/?CORPUS=de-news-morphix&AFILE=full-model1-50-50.gz. Alignments are displayed in matrix format with checkboxes that can be set or cleared.

Chris Callison-Burch (callison-burch at ed.ac.uk) of Linear B (http://linearb.co.uk) also has available a matrix display alignment tool with colored grid squares representing 'sure' or 'probable' alignments. It also allows output of a list of phrases that can be extracted from the word alignments. A screenshot is attached (linearB.tiff).
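
For readers who haven't seen a matrix display, here is a minimal sketch in Python of the general idea, rendered as plain text. It is not code from any of the tools above: the function name render_matrix, the (source index, target index) link representation and the example sentence pair are all assumptions made for illustration.

```python
def render_matrix(source, target, sure, probable=frozenset()):
    """Return a text grid: one row per source word, one column per target word.

    Links are (source_index, target_index) pairs; this representation is an
    assumption for illustration, not the format of any tool mentioned above.
    """
    width = max(len(word) for word in source)
    lines = []
    for i, word in enumerate(source):
        cells = []
        for j in range(len(target)):
            if (i, j) in sure:
                cells.append("S")      # 'sure' link
            elif (i, j) in probable:
                cells.append("P")      # 'probable' link
            else:
                cells.append(".")      # no link
        lines.append(word.ljust(width) + " " + " ".join(cells))
    return "\n".join(lines)


# Hypothetical head-initial / head-final sentence pair.
source = ["I", "read", "books"]
target = ["watashi", "wa", "hon", "o", "yomu"]
sure = {(0, 0), (1, 4), (2, 2)}
probable = {(0, 1), (2, 3)}

grid = render_matrix(source, target, sure, probable)
print("columns:", " ".join(target))
print(grid)
```

A grid like this stays readable however far a verb moves between the two languages, which is where line-based displays tend to produce crossing lines.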

Interested readers should consult Rada Mihalcea's web page for further links, including one to Patrick Lambert's Lingua-AlignmentSet toolkit in Perl for handling word alignments (http://www.lsi.upc.es/~lambert/software/AlignmentSet.html). This allows display in matrix format (line format will be implemented in the future), conversion between different representations, and evaluation against a gold standard.
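
To make "evaluation against a gold standard" concrete, here is a small Python sketch of precision, recall and alignment error rate (AER) as they are commonly defined in the word alignment literature for gold standards with 'sure' and 'probable' links. It is not taken from Lingua-AlignmentSet; the function name evaluate, the link representation and the example data are assumptions for illustration.

```python
def evaluate(predicted, sure, probable):
    """Return (precision, recall, aer) for one set of predicted links.

    All arguments are sets of (source_index, target_index) pairs, an
    assumed representation. Sure links also count as probable ones.
    """
    possible = sure | probable
    hits_possible = len(predicted & possible)
    hits_sure = len(predicted & sure)
    precision = hits_possible / len(predicted)
    recall = hits_sure / len(sure)
    aer = 1 - (hits_sure + hits_possible) / (len(predicted) + len(sure))
    return precision, recall, aer


# Hypothetical gold standard and automatic alignment for one sentence pair.
sure = {(0, 0), (1, 4), (2, 2)}
probable = {(0, 1), (2, 3)}
predicted = {(0, 0), (2, 2), (2, 3), (1, 1)}

precision, recall, aer = evaluate(predicted, sure, probable)
print(f"precision={precision:.3f} recall={recall:.3f} AER={aer:.3f}")
```

Here three of the four predicted links are at least probable and two of the three sure links are found, so precision is 3/4, recall is 2/3 and AER is 1 - 5/7.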

Attachments:

http://helmer.hit.uib.no/corpora/ilink.gif
http://helmer.hit.uib.no/corpora/linearB.tif


