[Corpora-List] From: <ctribble at webline.pl>
owner-corpora at lists.uib.no
Mon Nov 3 09:25:09 UTC 2003
Thought I'd share a recent insight. It may be something that others worked
out ages ago, but if not, it's particularly useful for WordSmith Tools
users who want to get more out of BNC. The idea is to compile small specialist corpora
quickly and easily by selecting subsets of BNC. The technique is:
1/ Put all the BNC text files into a single folder (in my case
H:\BNC\TEXTS). The quick and dirty way to do this is to use WinZip to create a
compressed folder of all the BNC text files and then to unzip to a new
location (IGNORING original folder locations).
2/ Once you've got this, identify the sub-set of BNC texts you want to
use via Dave Lee's wonderful spreadsheet (http://clix.to/davidlee00) and
copy the column containing the file names.
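3/ Paste this into Word and use Find & Replace to look for ^p (paragraph
mark) and replace with e.g. ^pH:\BNC\TEXTS\ The first name has no ^p before
it, so type its prefix in by hand, and delete any stray prefix left on an
empty last line. You'll end up with a file list such as: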
H:\BNC\Texts\B33
H:\BNC\Texts\CNA
H:\BNC\Texts\EA0
H:\BNC\Texts\EA1
H:\BNC\Texts\EA2
H:\BNC\Texts\EC7
H:\BNC\Texts\EWX
(the beginning of a list of all the Academic texts in BNC)
4/ Save this as a plain text file (to avoid getting page numbers in your
list, best to copy and paste into a Notepad document)
5/ Using WST 3, load this file via Choose Texts, Favourites.
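
For anyone who would rather script steps 1, 3 and 4 than use WinZip and Word, here is a minimal Python sketch. It is not part of the recipe above: the function names flatten_copy and build_file_list are made up for illustration, H:\BNC\TEXTS is simply the folder used in the example, and the sketch assumes that BNC file names are unique across subfolders and that the destination folder lies outside the source tree.

```python
import shutil
from pathlib import Path, PureWindowsPath


def flatten_copy(src_root, dest_dir):
    """Step 1: copy every file under src_root into one flat folder.

    Assumes file names are unique across subfolders (a later file with
    the same name would overwrite an earlier one) and that dest_dir is
    not inside src_root.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    copied = 0
    # Collect the paths first so the walk is not affected by new files.
    for path in list(Path(src_root).rglob("*")):
        if path.is_file():
            shutil.copy2(path, dest / path.name)
            copied += 1
    return copied


def build_file_list(file_ids, texts_dir=r"H:\BNC\TEXTS"):
    """Steps 3 and 4: prefix each file name with the folder path.

    file_ids is the column of names copied from the spreadsheet, one
    per item; blank entries are skipped. PureWindowsPath is used so the
    backslash-separated paths come out the same on any operating system.
    """
    base = PureWindowsPath(texts_dir)
    return [str(base / fid.strip()) for fid in file_ids if fid.strip()]
```

Writing the result out with one path per line, e.g. Path("academic.txt").write_text("\n".join(build_file_list(ids)) + "\n"), gives a plain text file in the same shape as the list in step 3, ready for step 5.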
I know that there must be more elegant techniques that programmers can
use, but for rough and ready chaps like me, it's a REALLY useful way of
getting more out of BNC.
Best
Chris Tribble
--
Dr Christopher Tribble
Mailing c/o FCO (Poland)
The British Council Poland
King Charles Street
London, SW1A 2AH, UK
Poland Idzikowskiego 19
Warszawa, 02-704, Poland
TEL +48 (22) 853 1160
UK 122, Queen Alexandra Mansions, Judd Street
London, WC1H 9DQ
TEL +44 (020)7833 4271
E-mail ctribble at clara.co.uk
Mobile +48 604 442 812
Website www.ctribble.co.uk