Corpora: Wordsmith question

PD Dr. Edward Wornar edi at serbski-institut.de
Wed Apr 3 11:49:11 UTC 2002


From: Van den Heuvel M Mev <MVDH at sun.ac.za>
Subject: Corpora: Wordsmith question
Date: Wed, 3 Apr 2002 11:13:22 +0200

> Hi everybody,
>
> I'm having a spot of trouble with the Wordlist tool in the Wordsmith suite
> that I hope someone out there can help me with. I want to compare two almost
> identical word lists containing the entries of a pronunciation lexicon.
> There are some inconsistencies between the lists, i.e. items missing in the
> one that should be in the other and vice versa. I need to identify the
> missing words. I thought that I could use the "compare word lists" function
> in Wordlist for this purpose by setting the minimum frequency to 1 word, but
> it's not working. I'm obviously doing something wrong.
>
> If you don't have a quick answer to the Wordsmith problem, but know of
> another tool that could help me do just this one little task with a few
> button clicks, I would also appreciate your response!

If the format of the wordlists is just plain text with one word on each line,
a simple diff should do the trick. What system are you using? If it's a UNIX-like
system, you'll have diff, otherwise you might want to get the cygwin tools. At
the shell prompt, something like

diff wordlist1 wordlist2 > differences

will write the differences into a file 'differences'. If you want a user interface
that lets you copy parts of one file into the other, or view the files side by side
with the differences marked, try emacs (or XEmacs), which comes with the useful tool
ediff.
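
One caveat: diff compares line by line, so it works best when both lists are in
the same order. If they might not be, a rough sketch using sort and comm (both
standard on UNIX-like systems and part of the cygwin tools) could look like the
following. The file names sample1, sample2, only_in_1 and only_in_2 and the
sample words are made up for illustration; substitute your own lists.

```shell
# Hypothetical sample data, one word per line; replace with your own lists.
printf '%s\n' apple banana cherry > sample1
printf '%s\n' cherry banana damson > sample2

# comm expects sorted input. LC_ALL=C makes sort and comm agree on the
# ordering; -u drops duplicate entries within each list.
LC_ALL=C sort -u sample1 > sample1.sorted
LC_ALL=C sort -u sample2 > sample2.sorted

# -23 suppresses columns 2 and 3, leaving words found only in the first list.
LC_ALL=C comm -23 sample1.sorted sample2.sorted > only_in_1
# -13 suppresses columns 1 and 3, leaving words found only in the second list.
LC_ALL=C comm -13 sample1.sorted sample2.sorted > only_in_2
```

With the sample data above, only_in_1 contains "apple" and only_in_2 contains
"damson", i.e. exactly the entries missing from the other list.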

Cheers

Edi


