[Corpora-List] Bank of English as a monitor corpus

Gill Philip g.philip.polidoro at gmail.com
Thu Sep 24 12:20:10 UTC 2009


I was working on my PhD when the BofE underwent a massive change (from 350m
to 429m), and still have data files from both versions. The interesting
thing was that it wasn't a mere expansion: some texts disappeared while
new ones were added. I didn't really do anything about it at the time - just
took advantage of the fact that there were more concordance lines to use (in
phraseology, there is always a paucity of data so it's nice to get more).
But some frequencies got skewed massively - I was working on colour words,
and have the data still (in an appendix to my PhD if anyone can't live
without knowing ...
http://amsacta.cib.unibo.it/archive/00002266/01/Thesis.pdf ) which shows how
the frequency of colour lemmas changed.
But I suppose that monitoring language also means re-assessing
which text types and sources should be included and in which proportions. I
would hope there's a lot of email/blog-type language in there now, but,
alas, I no longer have access to the beast (too expensive), so am not at
liberty to know. What I do know is that, having recently re-analysed a
number of phrases using BNC data, and compared them to the analyses done on
BofE data, I feel utterly confused, as the results I've got
differ more than they should (I too find BNC more homogeneous than
BofE).
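For anyone comparing counts across corpora of different sizes (the 350m and
429m BofE versions, or BofE against BNC), raw frequencies are not directly
comparable. A minimal sketch of one common approach follows: normalise to
occurrences per million words, then use a two-corpus log-likelihood score to
gauge whether a difference is larger than chance would suggest. The function
names and the counts in the usage lines are illustrative assumptions, not
figures from the thesis or from either corpus.

```python
import math


def per_million(count, corpus_size):
    """Normalise a raw count to occurrences per million words."""
    return count * 1_000_000 / corpus_size


def log_likelihood(count_a, size_a, count_b, size_b):
    """Two-corpus log-likelihood score for one word or lemma.

    Expected counts assume the item is equally frequent, relative to
    corpus size, in both corpora. A higher score means a bigger departure
    from that assumption.
    """
    total = count_a + count_b
    expected_a = size_a * total / (size_a + size_b)
    expected_b = size_b * total / (size_a + size_b)
    score = 0.0
    for observed, expected in ((count_a, expected_a), (count_b, expected_b)):
        if observed > 0:  # a zero count contributes nothing to the sum
            score += observed * math.log(observed / expected)
    return 2 * score


# Hypothetical counts for one colour lemma in the two versions.
old_count, old_size = 35_000, 350_000_000
new_count, new_size = 60_000, 429_000_000

print(per_million(old_count, old_size))
print(per_million(new_count, new_size))
print(log_likelihood(old_count, old_size, new_count, new_size))
```

A score near zero means the relative frequency is about the same in both
corpora; it says nothing about why a frequency moved, which still requires
looking at which texts were added or removed.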
just a little food for thought
Gill


On 23/09/2009, Mark Davies <Mark_Davies at byu.edu> wrote:
>
> Brett T. wrote:
>
> >> Without wishing to trojan horse this discussion, you may like to know
> that the recently revamped version of WordbanksOnline, which of course
> evolved from the BoE, is available at present on a free trial basis:
> www.collinslanguage.com/wordbanks/Default.aspx
>
> I think it addresses the question perfectly. Thanks for your input, Brett.
>
> Bill L. wrote:
>
> >> I hope that the word bank is a step forward.
> >> The word WORD indicates that it may not be. Frege tried to wean us off
> single words more than 100 years ago.
> >> Only COLLOCATION makes them FACTS
> >> I am sure that if your registered users feel sold short by the new
> software they will let the list know.
>
> I've been using the Bank of English via Word Banks Online for the last
> couple of weeks. Although I do have questions about the corpus design (for
> diachronic work), what I really do like about the BoE in its most recent
> incarnation as WBO is the interface. It is based on the Sketch Engine
> architecture and interface, which does a wonderful job with collocates and
> collocations. I would strongly recommend to anyone here who is mainly
> interested in collocates and collocations that they take a look at BoE/WBO
> online -- especially if they aren't already familiar with Sketch Engine and
> what it has to offer (and no, I'm not on their payroll :-).
>
> And not unpredictably (here it comes, Adam) I might mention that you might
> also want to take a look at the COCA corpus (www.americancorpus.org) --
> very similar to Sketch Engine in terms of queries (collocates, etc), large
> genre-balanced corpus, and freely available online.
>
> Mark D.
>
> ============================================
> Mark Davies
> Professor of (Corpus) Linguistics
> Brigham Young University
> (phone) 801-422-9168 / (fax) 801-422-0906
> Web: http://davies-linguistics.byu.edu
>
> ** Corpus design and use // Linguistic databases **
> ** Historical linguistics // Language variation **
> ** English, Spanish, and Portuguese **
> ============================================
> _______________________________________________
> Corpora mailing list
> Corpora at uib.no
> http://mailman.uib.no/listinfo/corpora
>



-- 
*********************************
Dr. Gill Philip
CILTA
Università degli Studi di Bologna
Piazza San Giovanni in Monte, 4
40124 Bologna
Italy