<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML xmlns:o = "urn:schemas-microsoft-com:office:office"><HEAD>
<META http-equiv=Content-Type content="text/html; charset=UTF-8">
<META content="MSHTML 6.00.6001.18203" name=GENERATOR></HEAD>
<BODY id=role_body style="FONT-SIZE: 10pt; COLOR: #000000; FONT-FAMILY: Arial"
bottomMargin=7 leftMargin=7 topMargin=7 rightMargin=7><FONT id=role_document
face=Arial color=#000000 size=2>
<DIV>
<DIV class=MsoNormal style="MARGIN: 0cm 0cm 0pt"><SPAN lang=EN-GB
style="COLOR: black; FONT-FAMILY: 'Times New Roman'; mso-bidi-font-size: 10.0pt; mso-ansi-language: EN-GB; mso-bidi-font-style: italic"><FONT
face=Arial>Dear All<o:p></o:p></FONT></SPAN></DIV>
<DIV class=MsoNormal style="MARGIN: 0cm 0cm 0pt"><SPAN lang=EN-GB
style="COLOR: black; FONT-FAMILY: 'Times New Roman'; mso-bidi-font-size: 10.0pt; mso-ansi-language: EN-GB; mso-bidi-font-style: italic"><FONT
face=Arial> <o:p></o:p></FONT></SPAN></DIV>
<DIV class=MsoNormal style="MARGIN: 0cm 0cm 0pt"><SPAN lang=EN-GB
style="COLOR: black; FONT-FAMILY: 'Times New Roman'; mso-bidi-font-size: 10.0pt; mso-ansi-language: EN-GB; mso-bidi-font-style: italic"><FONT
face=Arial>Here’s a summary of the responses to my query on word frequency lists
other than Kilgarriff’s at </FONT><A
title=http://www.kilgarriff.co.uk/bnc-readme.html
href="http://www.kilgarriff.co.uk/bnc-readme.html" target=_blank><FONT
face=Arial>http://www.kilgarriff.co.uk/bnc-readme.html</FONT></A><FONT
face=Arial> (derived from the BNC) and the ones discussed in</FONT></SPAN><SPAN
lang=EN-GB
style="COLOR: black; FONT-FAMILY: 'Times New Roman'; mso-bidi-font-size: 10.0pt; mso-ansi-language: EN-GB"><FONT
face=Arial> <o:p></o:p></FONT></SPAN></DIV>
<DIV class=MsoNormal style="MARGIN: 0cm 0cm 0pt"><SPAN lang=EN-GB
style="COLOR: black; FONT-FAMILY: 'Times New Roman'; mso-bidi-font-size: 10.0pt; mso-ansi-language: EN-GB"><FONT
face=Arial> <o:p></o:p></FONT></SPAN></DIV>
<DIV class=MsoNormal style="MARGIN: 0cm 0cm 0pt"><FONT face=Arial><SPAN
lang=EN-GB
style="COLOR: black; FONT-FAMILY: 'Times New Roman'; mso-bidi-font-size: 10.0pt; mso-ansi-language: EN-GB; mso-bidi-font-style: italic">Leech,
G., P. Rayson and A. Wilson (2001). <EM>Word Frequencies in Written and Spoken
English: Based on the British National Corpus</EM>. London: Longman (derived
from the BNC)</SPAN><SPAN lang=EN-GB
style="COLOR: black; FONT-FAMILY: 'Times New Roman'; mso-bidi-font-size: 10.0pt; mso-ansi-language: EN-GB"><o:p></o:p></SPAN></FONT></DIV>
<DIV class=MsoNormal style="MARGIN: 0cm 0cm 0pt"><SPAN lang=EN-GB
style="COLOR: black; FONT-FAMILY: 'Times New Roman'; mso-bidi-font-size: 10.0pt; mso-ansi-language: EN-GB"><FONT
face=Arial> <o:p></o:p></FONT></SPAN></DIV>
<DIV class=MsoNormal style="MARGIN: 0cm 0cm 0pt"><SPAN lang=EN-GB
style="COLOR: black; FONT-FAMILY: 'Times New Roman'; mso-bidi-font-size: 10.0pt; mso-ansi-language: EN-GB"><FONT
face=Arial>and in<o:p></o:p></FONT></SPAN></DIV>
<DIV class=MsoNormal style="MARGIN: 0cm 0cm 0pt"><SPAN lang=IT
style="COLOR: black; FONT-FAMILY: 'Times New Roman'; mso-bidi-font-size: 10.0pt; mso-ansi-language: IT; mso-bidi-font-style: italic"><FONT
face=Arial> <o:p></o:p></FONT></SPAN></DIV>
<DIV class=MsoNormal style="MARGIN: 0cm 0cm 0pt"><FONT face=Arial><SPAN lang=IT
style="COLOR: black; FONT-FAMILY: 'Times New Roman'; mso-bidi-font-size: 10.0pt; mso-ansi-language: IT; mso-bidi-font-style: italic">McCarthy, M. J.
(1998). </SPAN><EM><SPAN lang=EN-US
style="COLOR: black; FONT-FAMILY: 'Times New Roman'; mso-bidi-font-size: 10.0pt; mso-ansi-language: EN-US">Spoken
Language and Applied Linguistics. </SPAN></EM><SPAN lang=EN-US
style="COLOR: black; FONT-FAMILY: 'Times New Roman'; mso-bidi-font-size: 10.0pt; mso-ansi-language: EN-US">Cambridge:
Cambridge University Press (derived from the Cambridge International
Corpus).<o:p></o:p></SPAN></FONT></DIV>
<DIV class=MsoNormal style="MARGIN: 0cm 0cm 0pt"><SPAN lang=EN-US
style="COLOR: black; FONT-FAMILY: 'Times New Roman'; mso-bidi-font-size: 10.0pt; mso-ansi-language: EN-US"><FONT
face=Arial> <o:p></o:p></FONT></SPAN></DIV>
<DIV class=MsoBodyText style="MARGIN: 0cm 0cm 0pt"><SPAN
lang=EN-US>Specifically, I was asking for (i) more word frequency lists
available either in print or online and (ii) references to research discussing
<EM>the</EM>, which tops most frequency lists derived from general corpora,
in terms of reference (anaphoric, cataphoric, etc.).</SPAN></DIV>
<DIV class=MsoNormal style="MARGIN: 0cm 0cm 0pt"><SPAN lang=EN-US
style="COLOR: black; FONT-FAMILY: 'Times New Roman'; mso-bidi-font-size: 10.0pt; mso-ansi-language: EN-US"><FONT
face=Arial> <o:p></o:p></FONT></SPAN></DIV>
<DIV class=MsoNormal style="MARGIN: 0cm 0cm 0pt"><SPAN lang=EN-US
style="COLOR: black; FONT-FAMILY: 'Times New Roman'; mso-bidi-font-size: 10.0pt; mso-ansi-language: EN-US"><FONT
face=Arial>There was only one <STRONG>response </STRONG>to<STRONG> (ii)</STRONG>
by <STRONG>Steve Coffey</STRONG>, who has done research on the
indefinite article <EM>a/an</EM>. Separately, I found an excellent
analysis of <I>the</I> and the definite noun phrases (NPs) it introduces
in Biber et al. (1999: 263 ff.), where the authors not only outline the
different reference patterns of definite NPs (viz. anaphoric, indirect
anaphoric, cataphoric, situational, generic, and idiomatic) but also calculate
the proportion of each reference pattern in four registers (viz.
Conversation, Fiction, News, and Academic
Writing).<o:p></o:p></FONT></SPAN></DIV>
<DIV class=MsoNormal style="MARGIN: 0cm 0cm 0pt"><SPAN lang=EN-US
style="COLOR: black; FONT-FAMILY: 'Times New Roman'; mso-bidi-font-size: 10.0pt; mso-ansi-language: EN-US"><FONT
face=Arial> <o:p></o:p></FONT></SPAN></DIV>
<DIV class=MsoNormal style="MARGIN: 0cm 0cm 0pt"><SPAN lang=EN-US
style="COLOR: black; FONT-FAMILY: 'Times New Roman'; mso-bidi-font-size: 10.0pt; mso-ansi-language: EN-US"><FONT
face=Arial>There were a number of useful <STRONG>responses </STRONG>to<STRONG>
(i)</STRONG>:<o:p></o:p></FONT></SPAN></DIV>
<DIV class=MsoNormal style="MARGIN: 0cm 0cm 0pt"><SPAN lang=EN-US
style="COLOR: black; FONT-FAMILY: 'Times New Roman'; mso-bidi-font-size: 10.0pt; mso-ansi-language: EN-US"><FONT
face=Arial> <o:p></o:p></FONT></SPAN></DIV>
<DIV class=MsoNormal style="MARGIN: 0cm 0cm 0pt"><FONT face=Arial><FONT
face=Arial><B><SPAN lang=EN-US
style="FONT-FAMILY: 'Times New Roman'; mso-bidi-font-size: 10.0pt; mso-ansi-language: EN-US">Paul
Rayson</SPAN></B><SPAN lang=EN-US
style="FONT-FAMILY: 'Times New Roman'; mso-bidi-font-size: 10.0pt; mso-ansi-language: EN-US">
pointed to the </SPAN></FONT><SPAN lang=EN-US
style="FONT-FAMILY: 'Times New Roman'; mso-bidi-font-size: 11.0pt; mso-ansi-language: EN-US"><FONT
face=Arial>companion website for the Leech et al. book at:</FONT>
</SPAN></FONT></DIV>
<DIV class=MsoNormal style="MARGIN: 0cm 0cm 0pt"><FONT face=Arial><SPAN
lang=EN-US
style="FONT-FAMILY: 'Times New Roman'; mso-bidi-font-size: 11.0pt; mso-ansi-language: EN-US"><A
title=http://ucrel.lancs.ac.uk/bncfreq/ href="http://ucrel.lancs.ac.uk/bncfreq/"
target=_blank><SPAN style="COLOR: windowtext"><FONT
face=Arial>http://ucrel.lancs.ac.uk/bncfreq/</FONT></SPAN></A> </SPAN><SPAN
lang=EN-US
style="FONT-SIZE: 12pt; FONT-FAMILY: 'Times New Roman'; mso-ansi-language: EN-US"><o:p></o:p></SPAN></FONT></DIV>
<DIV class=MsoNormal
style="MARGIN: 0cm 0cm 0pt; mso-margin-top-alt: auto; mso-margin-bottom-alt: auto"><SPAN
lang=EN-US
style="FONT-FAMILY: 'Times New Roman'; mso-bidi-font-size: 11.0pt; mso-ansi-language: EN-US"><FONT
face=Arial>and references therein to other earlier frequency lists<SPAN
style="mso-spacerun: yes"> </SPAN>at:</FONT></SPAN></DIV>
<DIV class=MsoNormal
style="MARGIN: 0cm 0cm 0pt; mso-margin-top-alt: auto; mso-margin-bottom-alt: auto"><SPAN
lang=EN-US
style="FONT-FAMILY: 'Times New Roman'; mso-bidi-font-size: 11.0pt; mso-ansi-language: EN-US"><FONT
face=Arial> </FONT><A
title=http://ucrel.lancs.ac.uk/bncfreq/samples/foreword.pdf
href="http://ucrel.lancs.ac.uk/bncfreq/samples/foreword.pdf" target=_blank><SPAN
style="COLOR: windowtext"><FONT
face=Arial>http://ucrel.lancs.ac.uk/bncfreq/samples/foreword.pdf</FONT></SPAN></A><o:p></o:p></SPAN></DIV>
<PRE style="BACKGROUND: white"><B><SPAN lang=EN-US style="FONT-SIZE: 11pt; COLOR: black; FONT-FAMILY: 'Times New Roman'; mso-bidi-font-size: 9.0pt; mso-ansi-language: EN-US"><FONT face=Arial size=2>John D. Burger </FONT></SPAN></B><FONT size=2><SPAN lang=EN-US style="FONT-SIZE: 11pt; COLOR: black; FONT-FAMILY: 'Times New Roman'; mso-bidi-font-size: 9.0pt; mso-ansi-language: EN-US"><FONT face=Arial>and<B> Stefan Evert </B>mentioned the Google language modeling data, based on over a </FONT></SPAN></FONT></PRE>
<PRE style="BACKGROUND: white"><FONT size=2><SPAN lang=EN-US style="FONT-SIZE: 11pt; COLOR: black; FONT-FAMILY: 'Times New Roman'; mso-bidi-font-size: 9.0pt; mso-ansi-language: EN-US"><FONT face=Arial>trillion words' worth of web pages at </FONT></SPAN><SPAN style="FONT-SIZE: 11pt; COLOR: black; FONT-FAMILY: 'Times New Roman'; mso-bidi-font-size: 9.0pt"><A title=http://googleresearch.blogspot.com/2006/08/all-our-n-gram-are-belong-to-you.html href="http://googleresearch.blogspot.com/2006/08/all-our-n-gram-are-belong-to-you.html" target=_blank><SPAN lang=EN-US style="mso-ansi-language: EN-US"><FONT face=Arial>http://googleresearch.blogspot.com/2006/08/all-our-n-gram-are-belong-to-you.html</FONT></SPAN></A></SPAN><SPAN lang=EN-US style="FONT-SIZE: 11pt; COLOR: black; FONT-FAMILY: 'Times New Roman'; mso-bidi-font-size: 9.0pt; mso-ansi-language: EN-US"> </SPAN></FONT><SPAN lang=EN-US style="FONT-SIZE: 9pt; COLOR: black; FONT-FAMILY: Tahoma; mso-ansi-language: EN-US"><BR></SPAN></PRE><FONT
face=Arial><FONT size=2><TT><B><SPAN lang=EN-US
style="FONT-SIZE: 11pt; COLOR: black; FONT-FAMILY: 'Times New Roman'; mso-bidi-font-size: 10.0pt; mso-ansi-language: EN-US">Angus
B. Grieve-Smith</SPAN></B></TT><SPAN lang=EN-US
style="FONT-SIZE: 11pt; COLOR: black; FONT-FAMILY: 'Times New Roman'; mso-bidi-font-size: 9.0pt; mso-ansi-language: EN-US">
</SPAN><TT><SPAN lang=EN-US
style="FONT-SIZE: 11pt; COLOR: black; FONT-FAMILY: 'Times New Roman'; mso-bidi-font-size: 10.0pt; mso-ansi-language: EN-US"><SPAN
style="mso-spacerun: yes"> </SPAN>contributed information on the frequency
list from the Brown Corpus of written American English (ca. 1962) which is
available from the Oxford Text Archive at <A
title=http://ota.ahds.ac.uk/headers/0668.xml
href="http://ota.ahds.ac.uk/headers/0668.xml"><SPAN
style="mso-bidi-font-family: 'Arial Unicode MS'">http://ota.ahds.ac.uk/headers/0668.xml</SPAN></A>
and also available in print:<o:p></o:p></SPAN></TT></FONT></FONT>
<PRE style="BACKGROUND: white"><TT><SPAN lang=EN-US style="FONT-SIZE: 11pt; COLOR: black; FONT-FAMILY: 'Times New Roman'; mso-bidi-font-size: 10.0pt; mso-ansi-language: EN-US"><FONT face=Arial><FONT size=2>Frequency Analysis of English Usage: Lexicon and Grammar<o:p></o:p></FONT></FONT></SPAN></TT></PRE>
<PRE style="BACKGROUND: white"><TT><SPAN lang=EN-US style="FONT-SIZE: 11pt; COLOR: black; FONT-FAMILY: 'Times New Roman'; mso-bidi-font-size: 10.0pt; mso-ansi-language: EN-US"><FONT face=Arial><FONT size=2>By Winthrop Nelson Francis, Henry Kucera, Andrew W. Mackie<o:p></o:p></FONT></FONT></SPAN></TT></PRE>
<PRE style="BACKGROUND: white"><TT><SPAN lang=EN-US style="FONT-SIZE: 11pt; COLOR: black; FONT-FAMILY: 'Times New Roman'; mso-bidi-font-size: 10.0pt; mso-ansi-language: EN-US"><FONT face=Arial size=2>Published by Houghton Mifflin, 1982</FONT></SPAN></TT></PRE>
<PRE style="BACKGROUND: white"><FONT face=Arial><FONT size=2><TT><B><SPAN lang=EN-US style="FONT-SIZE: 11pt; COLOR: black; FONT-FAMILY: 'Times New Roman'; mso-bidi-font-size: 10.0pt; mso-ansi-language: EN-US">Mark Davies </SPAN></B></TT><TT><SPAN lang=EN-US style="FONT-SIZE: 11pt; COLOR: black; FONT-FAMILY: 'Times New Roman'; mso-bidi-font-size: 10.0pt; mso-ansi-language: EN-US">pointed to<B> </B>frequency lists for American English (based on COCA, a balanced </SPAN></TT><TT><SPAN lang=EN-US style="FONT-SIZE: 11pt; COLOR: black; FONT-FAMILY: 'Times New Roman'; mso-bidi-font-size: 10.0pt; mso-ansi-language: EN-US">corpus of nearly 400 million words), TIME Magazine (100m words, 1920s-2000s), </SPAN></TT><TT><SPAN lang=EN-US style="FONT-SIZE: 11pt; 
COLOR: black; FONT-FAMILY: 'Times New Roman'; mso-bidi-font-size: 10.0pt; mso-ansi-language: EN-US">Spanish (20m words, 1900s) and Portuguese (20m words, 1900s). Also available are </SPAN></TT><TT><SPAN lang=EN-US style="FONT-SIZE: 11pt; COLOR: black; FONT-FAMILY: 'Times New Roman'; mso-bidi-font-size: 10.0pt; mso-ansi-language: EN-US">n-grams for all of these corpora (as well as for the BNC) at:<o:p></o:p></SPAN></TT></FONT></FONT></PRE>
<PRE style="BACKGROUND: white"><TT><SPAN lang=EN-US style="FONT-SIZE: 11pt; COLOR: black; FONT-FAMILY: 'Times New Roman'; mso-bidi-font-size: 10.0pt; mso-ansi-language: EN-US"><A title=http://corpus.byu.edu/word_frequency.asp href="http://corpus.byu.edu/word_frequency.asp"><SPAN style="mso-bidi-font-family: 'Arial Unicode MS'"><FONT face=Arial size=2>http://corpus.byu.edu/word_frequency.asp</FONT></SPAN></A><o:p></o:p></SPAN></TT></PRE>
<PRE style="BACKGROUND: white"><FONT face=Arial><FONT face=Arial><TT><B><SPAN lang=EN-US style="COLOR: black; FONT-FAMILY: 'Times New Roman'; mso-ansi-language: EN-US; mso-ansi-font-size: 11.0pt"><FONT face=Arial>Adriano Ferraresi</FONT></SPAN></B></TT><TT><SPAN lang=EN-US style="COLOR: black; FONT-FAMILY: 'Times New Roman'; mso-ansi-language: EN-US; mso-ansi-font-size: 11.0pt"><FONT face=Arial> mentioned</FONT> </SPAN></TT></FONT><SPAN lang=EN-US style="FONT-FAMILY: 'Times New Roman'; mso-bidi-font-size: 10.0pt; mso-ansi-language: EN-US"><FONT face=Arial>several frequency lists (for English, but also Italian and German) at:</FONT> </SPAN></FONT><SPAN lang=EN-US style="FONT-FAMILY: 'Times New Roman'; mso-bidi-font-size: 10.0pt; mso-ansi-language: EN-US"><A title=http://wacky.sslmit.unibo.it/ href="http://wacky.sslmit.unibo.it/"><FONT face=Arial>http://wacky.sslmit.unibo.it</FONT></A><FONT face=Arial>, with the English lists extracted from ukWaC, a very large web-derived corpus containing around 2 billion words. See also:
</FONT></SPAN></PRE>
<DIV style="MARGIN-BOTTOM: 0pt"><SPAN lang=IT
style="FONT-SIZE: 11pt; FONT-FAMILY: 'Times New Roman'; mso-bidi-font-size: 10.0pt; mso-ansi-language: IT"><BR><FONT
face=Arial size=2>Baroni, Bernardini, Ferraresi and Zanchetta (in press).
</FONT></SPAN><SPAN lang=EN-GB
style="FONT-SIZE: 11pt; FONT-FAMILY: 'Times New Roman'; mso-bidi-font-size: 10.0pt; mso-ansi-language: EN-GB"><FONT
face=Arial><FONT size=2>"The wacky wide web: a collection of very large
linguistically processed web-crawled corpora". <I>Language Resources and
Evaluation</I>.<BR><o:p></o:p></FONT></FONT></SPAN></DIV><SPAN lang=EN-GB
style="FONT-SIZE: 11pt; FONT-FAMILY: 'Times New Roman'; mso-bidi-font-size: 10.0pt; mso-ansi-language: EN-GB"></SPAN></DIV>
<DIV><SPAN lang=EN-GB
style="FONT-SIZE: 11pt; FONT-FAMILY: 'Times New Roman'; mso-bidi-font-size: 10.0pt; mso-ansi-language: EN-GB">Finally,
I should like to mention the Bank of English-derived word frequency list
in:</SPAN></DIV>
<DIV><SPAN lang=EN-GB
style="FONT-SIZE: 11pt; FONT-FAMILY: 'Times New Roman'; mso-bidi-font-size: 10.0pt; mso-ansi-language: EN-GB"></SPAN> </DIV>
<DIV><SPAN lang=EN-GB
style="FONT-SIZE: 11pt; FONT-FAMILY: 'Times New Roman'; mso-bidi-font-size: 10.0pt; mso-ansi-language: EN-GB">Sinclair,
J. McH. (1999). 'A way with common words.' In: H. Hasselgård and S. Oksefjell
(eds.) <EM>Out of Corpora: Studies in honour of Stig Johansson</EM>.
Amsterdam: Rodopi, pp. 157-179.</SPAN></DIV>
<DIV><SPAN lang=EN-GB
style="FONT-SIZE: 11pt; FONT-FAMILY: 'Times New Roman'; mso-bidi-font-size: 10.0pt; mso-ansi-language: EN-GB"><FONT
face=Arial size=2></FONT></SPAN> </DIV>
<DIV style="MARGIN-BOTTOM: 0pt"><BR><FONT size=2><FONT
face="Arial, Helvetica, sans-serif">Many thanks for all
contributions<o:p></o:p></FONT></FONT></DIV>
<DIV style="MARGIN-BOTTOM: 0pt"><BR><FONT size=2><FONT
face="Arial, Helvetica, sans-serif">Chris</FONT></FONT></DIV>
<DIV style="MARGIN-BOTTOM: 0pt"><FONT size=2><FONT
face="Arial, Helvetica, sans-serif">-------------------------------------------------</FONT></FONT></DIV>
<DIV style="MARGIN-BOTTOM: 0pt"><FONT size=2><FONT
face="Arial, Helvetica, sans-serif">Dr. Christoph Rühlemann</FONT></FONT></DIV>
<DIV style="MARGIN-BOTTOM: 0pt"><FONT face=Arial
size=2>Ludwig-Maximilians-University,
Munich</FONT></DIV></FONT></BODY></HTML>