[Corpora-List] please introduce some easy to use and well-known sentence aligner tools

Gilda Tataei guilda_t at yahoo.com
Mon May 16 23:32:20 UTC 2011


Dear Saeed,
 
You can use YouAlign, a web-based aligner available at:
 
http://www.youalign.com/
 
I got satisfactory results with this aligner in my project, aligning English-Persian segments in the legal domain. It supports data in almost every format (*.ppt, *.pdf, *.doc, etc.), so you are not limited to *.txt. Depending on the features of your data and the nature of your work, you can try different formats and find the one that suits your input best.
 
The project I am working on involves building an English-Persian parallel corpus, which should be available in June; you may find some details in it that are useful for your research.
 
Regards,
G.Tataei
 

--- On Sat, 5/14/11, Grzegorz Chrupała <pitekus at gmail.com> wrote:

 

From: Grzegorz Chrupała <pitekus at gmail.com>
Subject: Re: [Corpora-List] please introduce some easy to use and well-known sentence aligner tools
To: "saeed farzi" <saeedfarzi at gmail.com>
Cc: corpora at uib.no
Date: Saturday, May 14, 2011, 8:03 AM


Dear Saeed,

You could try hunalign:
http://mokk.bme.hu/resources/hunalign/

Despite the name, it's not limited to Hungarian.

It is a command-line tool and probably not what you have in mind when
you say "easy to use", but it's perfectly usable once you have read the
documentation.

If you want something with a GUI you will probably need to look into
commercial options such as Trados.

Hope this helps.
--
Grzegorz Chrupala
Saarland University
FR 7.4 Spoken Language Systems
Building C7 1, Room 0.04
66041 Saarbrücken
+49 681 302 58126
gchrupala at lsv.uni-saarland.de




On Fri, May 13, 2011 at 21:38, saeed farzi <saeedfarzi at gmail.com> wrote:
> Hi guys,
> I am looking for a good sentence alignment tool to apply to a
> Persian-English parallel corpus. Please introduce some easy-to-use and
> well-known sentence aligner tools that I can use.
> Saeed Farzi
>
> _______________________________________________
> UNSUBSCRIBE from this page: http://mailman.uib.no/options/corpora
> Corpora mailing list
> Corpora at uib.no
> http://mailman.uib.no/listinfo/corpora
>
