[Corpora-List] From: <ctribble at webline.pl>

owner-corpora at lists.uib.no owner-corpora at lists.uib.no
Mon Nov 3 09:25:09 UTC 2003


Thought I'd share a recent insight.  It may be something that others worked
out ages ago, but if not, it's particularly useful for Wordsmith Tools
users who want to get more out of BNC.  The idea is to compile small
specialist corpora quickly and easily by selecting subsets of BNC.  The
technique is:

1/	Put all the BNC text files into a single folder (in my case
H:\BNC\TEXTS).  The quick and dirty way to do this is to use WinZip to create a
compressed folder of all the BNC text files and then to unzip to a new
location (IGNORING original folder locations).

2/	Once you've got this, identify the sub-set of BNC texts you want to
use via Dave Lee's wonderful spreadsheet (http://clix.to/davidlee00) and
copy the column containing the file names.

3/	Paste this into Word and use Find & Replace to look for ^p (paragraph
mark) and replace with e.g. ^pH:\BNC\TEXTS\  (the first line has no ^p in
front of it, so add the path to that one by hand).  You'll end up with a
file list such as:

H:\BNC\Texts\B33
H:\BNC\Texts\CNA
H:\BNC\Texts\EA0
H:\BNC\Texts\EA1
H:\BNC\Texts\EA2
H:\BNC\Texts\EC7
H:\BNC\Texts\EWX

(the beginning of a list of all the Academic texts in BNC)

4/	Save this as a plain text file (to avoid getting page numbers in your
list, best to copy and paste into a Notepad document)

5/	Using WST 3, load this file via Choose Texts, Favourites.
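
For anyone who would rather script steps 3 and 4 than go through Word and
Notepad, here is a minimal sketch in Python.  It is only an illustration, not
part of Wordsmith Tools: the function names build_file_list and
write_file_list are made up for this example, the folder H:\BNC\TEXTS is the
one from step 1, and it assumes the file names carry no extension, as in the
list above.

```python
from pathlib import PureWindowsPath


def build_file_list(file_ids, texts_dir=r"H:\BNC\TEXTS"):
    """Return one full Windows path per BNC file ID, skipping blank entries.

    file_ids is the column of file names copied from the spreadsheet,
    one ID per item (hypothetical input format for this sketch).
    """
    base = PureWindowsPath(texts_dir)
    return [str(base / fid.strip()) for fid in file_ids if fid.strip()]


def write_file_list(file_ids, out_path, texts_dir=r"H:\BNC\TEXTS"):
    """Write the paths to a plain text file, one per line, and return them."""
    paths = build_file_list(file_ids, texts_dir)
    # newline="\r\n" gives Windows line endings, as Notepad would save them.
    with open(out_path, "w", encoding="ascii", newline="\r\n") as fh:
        fh.write("\n".join(paths) + "\n")
    return paths


if __name__ == "__main__":
    # The first few Academic text IDs from the example list.
    for path in build_file_list(["B33", "CNA", "EA0"]):
        print(path)
```

The resulting text file can then be loaded as in step 5.  Unlike the Find &
Replace route, every line, including the first, gets the folder prefix.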

I know that there must be more elegant techniques that programmers can
use, but for rough and ready chaps like me, it's a REALLY useful way of
getting more out of BNC.

Best

Chris Tribble
--
		Dr Christopher Tribble
Mailing 	c/o FCO (Poland)
		The British Council Poland
		King Charles Street
		London, SW1A 2AH, UK
Poland		Idzikowskiego 19
		Warszawa, 02-704, Poland
		TEL  +48 (22) 853 1160
UK		122, Queen Alexandra Mansions, Judd Street
		London, WC1H 9DQ
		TEL +44 (020)7833 4271
E-mail		ctribble at clara.co.uk
Mobile		+48 604 442 812
Website	www.ctribble.co.uk


