[Corpora-List] Aligner for ParaConc? - summary

Michael Barlow barlow at ruf.rice.edu
Tue Sep 3 15:33:57 UTC 2002


All,

I know that I have said this before, but version 1.0 of ParaConc will soon
be ready. This version will align texts by asking the user about the
format of (a) headings, (b) paragraphs and (c) sentences. (Sentence
alignment may also be carried out using the Gale-Church algorithm.) The
result is a colour-coded alignment, which the user can work on further by
splitting or merging sentences and paragraphs to get the correct
alignment. (I have just uploaded a rather poor screen shot of this
semi-automatic aligner to http://athel.com/aligner.htm )
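For readers who have not met the Gale-Church method mentioned above, here is a minimal Python sketch of the idea. It is an illustration only, not ParaConc's implementation, which is not shown here. It aligns two lists of sentence lengths (in characters) by dynamic programming. Each candidate "bead" (1-1, 1-0, 0-1, 2-1, 1-2, 2-2) is scored by a prior probability plus how unlikely its length difference is. The bead priors and the constants `C = 1` and `S2 = 6.8` are the values commonly quoted for the method and should be treated as assumptions. The names `gale_church` and `match_cost` are hypothetical.

```python
import math
from typing import Dict, List, Optional, Tuple

# Prior probability of each bead type: (source sentences, target sentences).
# Commonly quoted values for Gale-Church; treat them as assumptions.
PRIORS: Dict[Tuple[int, int], float] = {
    (1, 1): 0.89,
    (1, 0): 0.0099, (0, 1): 0.0099,
    (2, 1): 0.089, (1, 2): 0.089,
    (2, 2): 0.011,
}
C = 1.0   # expected target characters per source character (assumed)
S2 = 6.8  # variance of that ratio per character (assumed)


def match_cost(len_src: int, len_tgt: int, bead: Tuple[int, int]) -> float:
    """-log P(bead) - log P(length difference); lower means a better match."""
    mean = (len_src + len_tgt / C) / 2.0
    delta = 0.0 if mean == 0 else (C * len_src - len_tgt) / math.sqrt(S2 * mean)
    # Two-tailed probability of a normalised difference at least this large.
    p_delta = math.erfc(abs(delta) / math.sqrt(2))
    return -math.log(PRIORS[bead]) - math.log(max(p_delta, 1e-300))


def gale_church(src_lens: List[int],
                tgt_lens: List[int]) -> List[Tuple[List[int], List[int]]]:
    """Align two lists of sentence lengths; return beads as (source idx, target idx)."""
    n, m = len(src_lens), len(tgt_lens)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    back: List[List[Optional[Tuple[int, int]]]] = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    # Row-major order guarantees cost[i][j] is final before it is extended.
    for i in range(n + 1):
        for j in range(m + 1):
            if cost[i][j] == inf:
                continue
            for di, dj in PRIORS:
                ni, nj = i + di, j + dj
                if ni > n or nj > m:
                    continue
                total = cost[i][j] + match_cost(
                    sum(src_lens[i:ni]), sum(tgt_lens[j:nj]), (di, dj))
                if total < cost[ni][nj]:
                    cost[ni][nj] = total
                    back[ni][nj] = (di, dj)
    # Walk the back-pointers from the end to recover the cheapest path.
    beads = []
    i, j = n, m
    while (i, j) != (0, 0):
        di, dj = back[i][j]
        beads.append((list(range(i - di, i)), list(range(j - dj, j))))
        i, j = i - di, j - dj
    beads.reverse()
    return beads
```

With real text you would pass character counts, e.g. `gale_church([len(s) for s in english], [len(s) for s in german])`, where `english` and `german` are hypothetical lists of already-split sentences. The method is normally run inside paragraphs that are already aligned, which fits the heading/paragraph/sentence order described above. Its output is still only a proposal that the user checks by splitting and merging.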

At the moment, the program works with parallel texts that are in separate
files. I will need to add another option to work either with Pernilla's
vanilla aligner or a TMX-style file structure, but I don't think I will
get this into version 1.
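Until a TMX option exists, the Word find-and-replace workaround in Raphael's quoted message can also be scripted. Here is a minimal Python sketch using the standard `xml.etree.ElementTree` module. It reads a TMX file and writes each language to its own file, one segment per line, so that line n of each file holds the two halves of translation unit n. Whether ParaConc expects exactly this one-segment-per-line layout is an assumption, as are the function names `split_tmx`, `tuv_lang` and `write_lines` and the UTF-8 default encoding.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

# TMX 1.4 writes the language as xml:lang; older files (like the Trans Suite
# sample quoted below) use a plain lang attribute. Accept both.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def tuv_lang(tuv: ET.Element) -> str:
    """Return the upper-cased language code of a <tuv> element."""
    return (tuv.get("lang") or tuv.get(XML_LANG) or "").upper()


def split_tmx(root: ET.Element, src_lang: str,
              tgt_lang: str) -> Tuple[List[str], List[str]]:
    """Return two parallel lists of segments, one per language."""
    src_lang, tgt_lang = src_lang.upper(), tgt_lang.upper()
    src, tgt = [], []
    for tu in root.iter("tu"):
        segs = {}
        for tuv in tu.iter("tuv"):
            seg = tuv.find("seg")
            if seg is not None:
                # itertext() drops inline tags; split/join folds line breaks.
                segs[tuv_lang(tuv)] = " ".join("".join(seg.itertext()).split())
        # Keep only units that have both languages, so the files stay in step.
        if src_lang in segs and tgt_lang in segs:
            src.append(segs[src_lang])
            tgt.append(segs[tgt_lang])
    return src, tgt


def write_lines(path: str, lines: List[str], encoding: str = "utf-8") -> None:
    """Write one segment per line (the encoding is an assumption; adjust it)."""
    with open(path, "w", encoding=encoding) as f:
        for line in lines:
            f.write(line + "\n")
```

Typical use would be `root = ET.parse("memory.tmx").getroot()`, then `en, de = split_tmx(root, "EN-GB", "DE-DE")`, followed by `write_lines("english.txt", en)` and `write_lines("german.txt", de)`; the file names are placeholders. Note that plain ASCII cannot hold German umlauts, so a Windows tool might instead need something like `encoding="cp1252"`.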

Note: ParaConc is commercial software that I have an interest in.

Best,

Michael
----------------------------------------------------------------------
Michael Barlow,      Department of Linguistics,       Rice University
barlow at rice.edu                               www.ruf.rice.edu/~barlow
barlow at athel.com                                         www.athel.com



On Tue, 3 Sep 2002, Sampo Nevalainen wrote:

> Dear all,
>
> Some time ago I asked for an aligner that could be used with ParaConc. I
> got two replies and a request for a summary. Unfortunately I do not have
> time for a proper summary, so instead I have attached the original message and
> the replies I got. I would like to thank Martin Wynne
> and Raphael Salkie for their assistance.
>
> By the way, after I had sent my request I learned about a free TM
> program called Wordfast. It is fully integrated into MS Word and
> it seems to me an excellent tool, especially for freeware. (This is
> not a paid advertisement, just my personal opinion!) Wordfast has an
> add-on called +Tools, which includes an aligner, also based on MS Word. The
> aligner automates some things that you would otherwise have to do manually in Word (such as
> breaking text into sentences and line numbering), but I am afraid the
> alignment method is not very intelligent: a lot of work must still be done manually.
> However, it is one possibility worth mentioning. And, for other
> corpus fans and enthusiasts, Wordfast also comes with a pretty fast but
> modest concordancer :-)  Both Wordfast and +Tools can be downloaded
> from the following URL:  http://www.champollion.net/
>
> sincerely,
> sampo
>
> The original message below:
> -------------------------------------------------------------------------------------------------------------------------------
> I wonder if there are any (freely available) alignment tools that can be used with
> ParaConc? That is, the aligner should let users save the original and
> target texts into separate files. I know there is an aligner in the WS
> Tools pack, but for some reason the program tends to "re-join" the
> sentences you have already "un-joined"... Well, you can use the WSTools Aligner
> if you get the job done in one go, without saving and re-opening
> the files. (I don't know whether it's my fault - I cannot use the program
> correctly - or there's a bug in the program.) I also know there are alignment
> tools for "filling up" translation memories (e.g. Trans Suite 2000 Align,
> which is distributed freely), but they do not seem to have an option for saving
> the source and target texts into separate files. OK, I could save the
> output file as a text file with a separator between the segments, then open
> it in Excel using these separators as column separators, and, finally, save
> each column as a separate text file... but this makes a simple task too
> complicated, IMHO. So, could someone help me find an aligner
> (preferably with a Windows GUI, to be used in a classroom) that would simply split
> the texts into sentences and let the user correct the alignment by joining
> and unjoining sentences? The program should then save the texts as
> separate (ASCII) text files. Many thanks in advance for your tips and advice!
> ----------------------------------------------------------------------------------------------------------------------------------
>
> -----------------------------------------------------------------------------------------------------
> From: Martin Wynne <martin.wynne at ota.ahds.ac.uk>
> To: "'Sampo Nevalainen'" <samponev at cc.joensuu.fi>
> -----------------------------------------------------------------------------------------------------
> I have used a simple Perl aligner written by Pernilla Danielsson and Daniel
> Ridings. When I taught with Pernilla on a course at the Tuscan Word Centre
> we used this program (which she calls the "vanilla aligner") to align texts
> specifically to use with ParaConc, so I know it can do this job. We may
> have done a bit of tweaking on the output. You can contact her on
> pernilla at ccl.bham.ac.uk.
> best,
> Martin
>
>
> -----------------------------------------------------------------------------------------------------------------
> From: R.M.Salkie at bton.ac.uk
> To: samponev at cc.joensuu.fi
> ------------------------------------------------------------------------------------------------------------------
> I've been struggling with the same problem, including using Trans Suite
> 2000 Align. I don't have a good answer, just two suggestions.
> Firstly, it's possible to use Word's Replace function on the output
> of Trans Suite, saved in TMX format. This is what a typical pair of
> sentences looks like:
> <tu
> creationdate="20020723T151150Z"
> creationid="TS2!ALIGN"
> changedate="20020723T151150Z"
>  >
> <tuv lang="EN-GB">
> <seg>World consumption has expanded at an unprecedented pace over the 20th
> century, with private and public consumption expenditures reaching $24
> trillion in 1998, twice the level of 1975 and six times that of 1950. </seg>
> </tuv>
> <tuv lang="DE-DE">
> <seg>Der weltweite Konsum hat sich im Verlauf des 20. Jahrhundert in
> beispiellosem Tempo ausgeweitet. 1998 erreichen die privaten und
> öffentlichen Konsumausgaben 24 Billionen Dollar, sie sind damit doppelt so
> hoch wie 1975 und sechsmal so hoch wie 1950. </seg>
> </tuv>
> </tu>
> The aim is to remove all the English sentences, leaving the German ones in
> place. Load the document into Word, choose "Replace", then tick "Use
> wildcards". In the "Find what" box paste in:
> \<tuv lang="EN-GB"\>*\</tuv\>
> (Notice that the < and > characters need a backslash before them so that
> Word does not treat them as wildcards). If you choose "Replace all", this
> will now delete all the English sentences. Then use "save as" to save the
> file as German only. To create the English file, do the same thing to the
> original file but change the language code in the "Find what" box to
> "DE-DE". You can then use similar techniques to remove the remaining
> XML tags and the creation dates. I realise that this is even more
> elaborate than your suggestion of using Excel, but it's something that
> students could perhaps manage. I agree entirely that it would be better if
> students didn't have to do this.
>
> Suggestion 2: Write to Mike Barlow and suggest that he adds to ParaConc the
> ability to handle files which are in this typical translation memory format
> where the source and target sentences are in pairs. Presumably this is a
> simpler task for a computer programme than relating texts in two separate
> files: as long as the computer knows which is the source language, then it
> would have to produce the sentence (or KWIC) containing the source word,
> along with the sentence which follows. For searches in the target language
> it would be the sentence that precedes. I couldn't write a programme to do
> this, but I think a programmer could. I hope that someone comes up with a
> better solution, and I'd be grateful if you could publicise anything useful.
>
> Best wishes. - Raphael
> -------------------------------------------------------------------------------------------------------------------
>
>
>
>
> ( : ============================================= : )
>
> Sampo Nevalainen, M.A.
> Researcher
> University of Joensuu
> Savonlinna School of Translation Studies
> P.O.Box 48
> FIN-57101 Savonlinna
> FINLAND
>
> tel     +358-15-511 70      (operator)
>          +358-15-511 7704
> fax     +358-15-515 096
> email   samponev at cc.joensuu.fi
> http://www.joensuu.fi/slnkvl/
>
>
>
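Raphael's second suggestion comes down to a simple lookup rule when source and target sentences alternate in one file. For a hit in the source language, show the following sentence; for a hit in the target language, show the preceding one. Here is a minimal Python sketch of that rule. It assumes the segments have already been read into a list in source, target, source, target order (for example from the TMX units quoted above). The name `parallel_concordance` is hypothetical and not part of ParaConc.

```python
import re
from typing import List, Tuple


def parallel_concordance(segments: List[str], word: str,
                         search_target: bool = False) -> List[Tuple[str, str]]:
    """Search a TM-style list where segments alternate source, target, ...

    Returns (matching segment, its translation) pairs.
    """
    if len(segments) % 2:
        raise ValueError("expected source/target pairs")
    # Whole-word, case-insensitive match; \b is Unicode-aware for str patterns.
    pattern = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
    start = 1 if search_target else 0  # target segments sit at odd positions
    hits = []
    for k in range(start, len(segments), 2):
        if pattern.search(segments[k]):
            # Source hit: the translation follows; target hit: it precedes.
            partner = segments[k - 1] if search_target else segments[k + 1]
            hits.append((segments[k], partner))
    return hits
```

A real concordancer would also format each hit as a KWIC line; this sketch only shows the pairing rule, which is why the paired layout is simpler to search than two separate files.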


