[Corpora-List] Lexical bundles - and meaningful items...

Chris Butler csblists at telefonica.net
Fri Jul 8 07:36:16 UTC 2005


Dear John and other list members,

Ute Römer said:

"But I suppose that concordances of frequent
3-grams may still lead you to some interesting (and meaningful) 4- and
5-word items."

For lists of 3-word strings as well as longer ones, derived from English
corpora, you might like to look at the following, if you haven't already
done so:

Stubbs, Michael and Isabel Barth (2003) "Using recurrent phrases as text
type discriminators: a quantitative method and some findings." Functions of
Language 10(1): 61-104.

For similar data from Spanish, derived from smaller corpora (some as small
as 125,000 words, none bigger than 1 million words), see

Butler, Christopher S. (1997) "Repeated word combinations in spoken and
written text: some implications for Functional Grammar." In C. S. Butler, J.
H. Connolly, R. A. Gatward and R. M. Vismans (eds.) A Fund of Ideas: Recent
Developments in Functional Grammar. Amsterdam: Institute for Functional
Research into Language and Language Use (IFOTT).

[As this is in a rather obscure publication which may be difficult for
people to get hold of, I could send an electronic version to anyone who is
interested.]

Also, Bengt Altenberg says in the following paper that most of the recurrent
sequences he isolated from the London-Lund Corpus were pretty short, with an
average of 3.15 words, and he gives a lot of examples of phraseologically
interesting 3-word sequences:

Altenberg, Bengt (1998) "On the phraseology of Spoken English: the evidence
of recurrent word combinations." In A. P. Cowie (ed.) Phraseology: Theory,
Analysis, and Applications. Oxford: Clarendon Press.

Chris Butler

********************************************

Ute Römer
English Department
University of Hanover
Königsworther Platz 1
30167 Hannover
Germany

Phone: +49 (0)511 762 2997
Fax: +49 (0)511 762 2996
E-mail: ute.roemer at anglistik.uni-hannover.de
http://www.uteroemer.de
http://www.fbls.uni-hannover.de/angli/


> -----Original Message-----
> From: owner-corpora at lists.uib.no [mailto:owner-corpora at lists.uib.no] On
> Behalf Of Jenny Eagleton
> Sent: Monday, July 04, 2005 4:46 AM
> To: corpora at uib.no
> Subject: [Corpora-List] Lexical bundles
>
> ON BEHALF OF PROF. JOHN FLOWERDEW
>
> DEPARTMENT OF ENGLISH AND COMMUNICATION
>
> CITY UNIVERSITY OF HONG KONG
> RE: LEXICAL BUNDLES.
>
> I notice that all of the studies I have read on this topic have
> focussed on 4-word bundles and that they have all used what I
> would call large corpora, i.e. many millions of words. The rationale
> seems to be that with 5-word bundles you do not get enough to analyse
> and that with 3-word bundles there are probably too many to handle.
>
> I want to do a study of bundles on a specific corpus I have, but
> which only has 600,000 words. To be able to work with large numbers
> of bundles, it would therefore make sense to focus on 3-word bundles.
> I could do a study on 4-word bundles, but the sample would be smaller.
>
>
> So my question is, do people see any disadvantages in focusing on
> 3-word bundles and, if so, what might they be?
>
> Looking forward to hearing your responses.
>
>
>


