Corpora: Wordsmith question

Darren Pearce darrenp at cogs.susx.ac.uk
Wed Apr 3 14:47:08 UTC 2002


On Wed, 3 Apr 2002, PD Dr. Edward Wornar wrote:

> From: Van den Heuvel M Mev <MVDH at sun.ac.za>
> Subject: Corpora: Wordsmith question
> Date: Wed, 3 Apr 2002 11:13:22 +0200
>
> > Hi everybody,
> >
> > I'm having a spot of trouble with the Wordlist tool in the Wordsmith suite
> > that I hope someone out there can help me with. I want to compare two almost
> > identical word lists containing the entries of a pronunciation lexicon.
> > There are some inconsistencies between the lists, i.e. items missing in the
> > one that should be in the other and vice versa. I need to identify the
> > missing words. I thought that I could use the "compare word lists" function
> > in Wordlist for this purpose by setting the minimum frequency to 1 word, but
> > it's not working. I'm obviously doing something wrong.
> >
> > If you don't have a quick answer to the Wordsmith problem, but know of
> > another tool that could help me do just this one little task with a few
> > button clicks, I would also appreciate your response!
>
> If the format of the wordlists is just plain text with one word on each line,
> a simple diff should do the trick. What system are you using? If it's a UNIX-like
> system, you'll have diff, otherwise you might want to get the cygwin tools. At
> the shell prompt, something like
>
> diff wordlist1 wordlist2 > differences
>
> will write the differences into a file 'differences'. If you want a user interface
> so as to take over parts from one file into the other or see the files side by side
> with the differences marked, try emacs (or XEmacs) which comes with the useful tool
> ediff.
>
> Cheers
>
> Edi

Again assuming that your files are plain text with one word per line, you
could also use the Unix 'comm' command. Given two sorted files, it prints
three columns: lines unique to the first file, lines unique to the second,
and lines common to both. Any of these columns can be suppressed.
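
As a rough sketch of how that might look (the two printf lines only create
tiny made-up sample lists so the example runs on its own; the file names
only_in_1 and only_in_2 and the .sorted suffix are my own choices, not
anything Wordsmith produces). Note that comm expects its input to be
sorted, and sort and comm should agree on the collation order, hence the
LC_ALL=C on each command:

```shell
# Hypothetical sample lists, one word per line (stand-ins for the real files).
printf 'apple\nbanana\ncherry\n' > wordlist1
printf 'banana\ncherry\ndamson\n' > wordlist2

# comm expects sorted input, so sort each list first.
LC_ALL=C sort wordlist1 > wordlist1.sorted
LC_ALL=C sort wordlist2 > wordlist2.sorted

# -2 and -3 suppress columns 2 and 3, leaving lines unique to the first file.
LC_ALL=C comm -23 wordlist1.sorted wordlist2.sorted > only_in_1

# -1 and -3 suppress columns 1 and 3, leaving lines unique to the second file.
LC_ALL=C comm -13 wordlist1.sorted wordlist2.sorted > only_in_2
```

With your real lists you would drop the printf lines; only_in_1 and
only_in_2 should then hold the entries missing from the other list.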

Good luck.

Darren.

+-------------------------------------------------------------------------+
|                                                                         |
| Darren Pearce                                                           |
| COGS, Sussex University, Falmer, Brighton                               |
| Mobile: 07950 255 448                                                   |
| Email:  darrenmpearce at bigfoot.com                                       |
| Web:    http://www.cogs.susx.ac.uk/users/darrenp                        |
|                                                                         |
+-------------------------------------------------------------------------+


