[Corpora-List] Wordsmith concordance

Lou Burnard lou.burnard at computing-services.oxford.ac.uk
Thu Dec 19 09:59:00 UTC 2002


If you are indeed working on texts derived from the BNC, then a fairly
obvious thing to check would be whether the lines are in fact duplicated in
the BNC itself. Go to http://sara.natcorp.ox.ac.uk/lookup.html and type one
of your repeated phrases into the box.
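
If you want a quick check on your own output first, here is a rough sketch.
It assumes (hypothetically) that the concordance lines have been saved from
Wordsmith as plain text, one hit per line. It lists the lines that recur once
whitespace and capitalisation are ignored. The function name
`find_repeated_lines` and the file name `concordance.txt` are illustrative
only, not part of Wordsmith or the BNC tools.

```python
from collections import Counter


def find_repeated_lines(lines):
    """Return concordance lines that occur more than once, with their counts.

    Assumption: one concordance hit per line of plain text. Lines are compared
    after collapsing runs of whitespace and lower-casing, so copies that differ
    only in spacing or capitalisation are grouped together. Blank lines are
    ignored.
    """
    normalised = [" ".join(line.split()).lower() for line in lines]
    counts = Counter(n for n in normalised if n)
    return {line: n for line, n in counts.items() if n > 1}


if __name__ == "__main__":
    # Hypothetical export file; iterating over it yields one line at a time.
    with open("concordance.txt", encoding="utf-8") as f:
        for line, n in find_repeated_lines(f).items():
            print(n, line)
```

Any line this reports can then be pasted into the lookup page above to see
whether the repetition is already present in the corpus itself.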

There are (still) a few erroneous text duplications. More interestingly,
there are several cases of genuine repetition-with-variants, caused by
different newspapers (or the same newspaper at different times) re-using the
same agency material.

If you're not using the BNC, of course, this is irrelevant, except insofar as
it illustrates the general principle that one should *always* suspect the
data!

Lou

-----Original Message-----
From: owner-corpora at lists.uib.no [mailto:owner-corpora at lists.uib.no] On
Behalf Of Anne Harrap
Sent: 17 December 2002 10:52
To: corpora list - messages to list
Subject: [Corpora-List] Wordsmith concordance


Does anyone else get a lot of duplicated entries when doing a
concordance in Wordsmith?

Not sure if this is a bug or we are doing something wrong...


Anne Harrap
Languages Centre Documentalist
School of Languages
Oxford Brookes University
Oxford (UK)

Tel:    +44 865 483723
Fax:    +44 865 483791
Email:  anneh at sol.brookes.ac.uk


