<HTML dir=ltr><HEAD>
<META content="text/html; charset=unicode" http-equiv=Content-Type>
<META name=GENERATOR content="MSHTML 8.00.6001.18876"></HEAD>
<BODY>
<DIV dir=ltr id=idOWAReplyText35081>
<DIV dir=ltr><FONT color=#000000 size=2 face=Arial>Without stepping directly into this controversy I would like to argue for a Wittgensteinian position: whether or not a given object qualifies as a corpus depends not on its material constitution but on the uses to which it is put - and the rest follows from that. Thus a person traveling from town to town giving recitals of (excerpts from) the complete works of Jane Austen would most likely be not a corpus linguist but an actor. Should someone wish to offer a generalization on (say) "Anomalies in Jane Austen's use of emotive adjectives in postmodifying position" based on Austen's complete works, that person would most likely be a corpus linguist.</FONT></DIV>
<DIV dir=ltr><FONT size=2 face=Arial></FONT> </DIV>
<DIV dir=ltr><FONT size=2 face=Arial>Regards,</FONT></DIV>
<DIV dir=ltr><FONT size=2 face=Arial>Magnar Brekke</FONT></DIV></DIV>
<DIV dir=ltr><BR>
<HR tabIndex=-1>
<FONT size=2 face=Tahoma><B>From:</B> corpora-bounces@uib.no on behalf of Adam Kilgarriff<BR><B>Sent:</B> Wed 03.03.2010 11:25<BR><B>To:</B> Geoffrey Williams<BR><B>Cc:</B> Xin Yan; corpora@uib.no<BR><B>Subject:</B> Re: [Corpora-List] Free text corpora?<BR></FONT><BR></DIV>
<DIV>Sinclair was wrong.
<DIV><BR></DIV>
<DIV>For the argument in more detail see the opening section of <SPAN style="FONT-SIZE: medium" class=Apple-style-span>Kilgarriff and Grefenstette 2003 <A href="http://kilgarriff.co.uk/Publications/2003-KilgGrefenstette-WACIntro.pdf">Introduction to the Special Issue on Web as Corpus.</A> <EM>Computational Linguistics</EM> 29 (3)</SPAN></DIV>
<DIV>(which I wrote before reading Sinclair's revised version; since reading that, I use Sinclair as my protagonist (much more suitable than McEnery and Wilson, whom I quote but who, unlike Sinclair, don't really say anything I disagree with)). </DIV>
<DIV><BR></DIV>
<DIV>Adam</DIV>
<DIV><BR><BR>
<DIV class=gmail_quote>On 3 March 2010 06:45, Geoffrey Williams <SPAN dir=ltr><<A href="mailto:geoffrey.williams@univ-ubs.fr">geoffrey.williams@univ-ubs.fr</A>></SPAN> wrote:<BR>
<BLOCKQUOTE style="BORDER-LEFT: #ccc 1px solid; MARGIN: 0px 0px 0px 0.8ex; PADDING-LEFT: 1ex" class=gmail_quote>Dear Adam,<BR><BR>Watering down of a discipline in order to take on board all comers is not<BR>a good thing. Far from strangling corpus linguistics, Martin is reaffirming<BR>its very basis. Sinclair et al.'s OSTI report laid down the basics. These<BR>have been improved over the years, gaining greater clarity, until the 1996<BR>EAGLES definition was published. This was a benchmark until Sinclair's<BR>2005 revised version, discussed in detail in the book edited by Martin.<BR><BR>Dumps are about rubbish, either legal or fly dumping. The result is still<BR>rubbish in, rubbish out. Corpus linguistics has to have a benchmark,<BR>otherwise it would cease to exist.<BR><BR>However, corpus linguistics does not have a monopoly on the word corpus,<BR>which is somewhat polysemic. If a Wikipedia or other dump works for<BR>testing some commercial application or NLP project, then why not use it?<BR>On the other hand, don't say that corpus linguistics is being done.<BR><BR>Corpora list is wealthy through the experience of many who use corpora in<BR>different ways. Not all are corpus linguists, neither should they be. It<BR>is essential, however, that those disciplines involved know their research<BR>protocols. This is what Martin's timely reminder is about.<BR><BR>Perfection is difficult to achieve, but that does not make it a less<BR>worthwhile goal.<BR><BR>Best<BR><BR>Geoffrey<BR>
<DIV>
<DIV></DIV>
<DIV class=h5><BR><BR>>> ...and a "dump" such as this couldn't be further from qualifying as a<BR>> corpus, if defined as "a > collection of pieces of language, selected and<BR>> ordered according to explicit linguistic criteria > in order to be used as<BR>> a<BR>> sample of the language.”<BR>><BR>> Sorry, Martin, but your definition of 'corpus' reads like it's designed to<BR>> strangle the life out of corpus linguistics. It begs questions about<BR>> selection and ordering (?? how does ordering come into it?) and explicit<BR>> linguistic criteria, and demotes many things that people refer to as<BR>> 'corpora' to a lower form of life. Lexicographically, bad.<BR>><BR>> I think it's a dream of some corpus linguists as to what they think a<BR>> corpus<BR>> should be, not a fact about how the word is used. But, delete the middle<BR>> clause and we're in agreement:<BR>> "a collection of pieces of language, used as a sample of the language"<BR>><BR>> Adam<BR>><BR>> On 2 March 2010 22:55, Martin Wynne <<A href="mailto:martin.wynne@oucs.ox.ac.uk">martin.wynne@oucs.ox.ac.uk</A>> wrote:<BR>><BR>>> Francis Tyers wrote:<BR>>><BR>>>> El dt 02 de 03 de 2010 a les 12:38 +0100, en/na Xin Yan va escriure:<BR>>>><BR>>>><BR>>>><BR>>>>> Hello,<BR>>>>><BR>>>>> can anyone tell me, if there are some free text corpora for commercial<BR>>>>> purpose?<BR>>>>> Thank you in advance!<BR>>>>><BR>>>>><BR>>>><BR>>>> You can download dumps of Wikipedia from <A href="http://download.wikimedia.org/" target=_blank>http://download.wikimedia.org</A><BR>>>> -- they are licensed under the CC-BY-SA or GFDL -- both of which allow<BR>>>> commercial use, providing changes made are redistributed under the same<BR>>>> licence.<BR>>>><BR>>>> Best regards,<BR>>>><BR>>>> Fran<BR>>>><BR>>>><BR>>>><BR>>>> _______________________________________________<BR>>>> Corpora mailing list<BR>>>> <A href="mailto:Corpora@uib.no">Corpora@uib.no</A><BR>>>> <A href="http://mailman.uib.no/listinfo/corpora" 
target=_blank>http://mailman.uib.no/listinfo/corpora</A><BR>>>><BR>>>><BR>>><BR>>> Dumps of wikipedia may be an interesting electronic text collection that<BR>>> can be used to help address various linguistic research questions, but I<BR>>> think that the request was for a corpus...and a "dump" such as this<BR>>> couldn't<BR>>> be further from qualifying as a corpus, if defined as "a collection of<BR>>> pieces of language, selected and ordered according to explicit<BR>>> linguistic<BR>>> criteria in order to be used as a sample of the language.”<BR>>><BR>>> The good news is that corpora are available. If you let us know what<BR>>> sort<BR>>> of corpus you are looking for and for what sort of commercial uses you<BR>>> intend to put them to, I am sure that there are plenty of people here on<BR>>> the<BR>>> mailing list who can help you.<BR>>><BR>>> Martin<BR>>> Oxford Text Archive<BR>>><BR>>><BR>>> _______________________________________________<BR>>> Corpora mailing list<BR>>> <A href="mailto:Corpora@uib.no">Corpora@uib.no</A><BR>>> <A href="http://mailman.uib.no/listinfo/corpora" target=_blank>http://mailman.uib.no/listinfo/corpora</A><BR>>><BR>><BR>><BR>><BR>> --<BR>> ================================================<BR>> Adam Kilgarriff<BR>> <A href="http://www.kilgarriff.co.uk/" target=_blank>http://www.kilgarriff.co.uk</A><BR>> Lexical Computing Ltd <A href="http://www.sketchengine.co.uk/" target=_blank>http://www.sketchengine.co.uk</A><BR>> Lexicography MasterClass Ltd <A href="http://www.lexmasterclass.com/" target=_blank>http://www.lexmasterclass.com</A><BR>> Universities of Leeds and Sussex <A href="mailto:adam@lexmasterclass.com">adam@lexmasterclass.com</A><BR>> ================================================<BR>> _______________________________________________<BR>> Corpora mailing list<BR>> <A href="mailto:Corpora@uib.no">Corpora@uib.no</A><BR>> <A href="http://mailman.uib.no/listinfo/corpora" 
target=_blank>http://mailman.uib.no/listinfo/corpora</A><BR>><BR><BR><BR></DIV></DIV>--<BR>Prof. Geoffrey Williams,<BR>Vice Président des Relations Internationales<BR>Professeur des universités en sciences du langage<BR>Directeur du département d'ingénierie du document<BR>UFR de Lettres, Sciences Humaines et Sociales<BR>Université de Bretagne-Sud<BR>4 rue Jean Zay,<BR>BP92116<BR>F-56321 LORIENT Cedex<BR><BR>tél: +33 (0) 2 97 87 29 20<BR>fax: +33 (0) 2 97 87 65 25<BR><BR></BLOCKQUOTE></DIV><BR><BR clear=all><BR>-- <BR>================================================<BR>Adam Kilgarriff <A href="http://www.kilgarriff.co.uk/">http://www.kilgarriff.co.uk</A> <BR>Lexical Computing Ltd <A href="http://www.sketchengine.co.uk/">http://www.sketchengine.co.uk</A><BR>Lexicography MasterClass Ltd <A href="http://www.lexmasterclass.com/">http://www.lexmasterclass.com</A><BR>Universities of Leeds and Sussex <A href="mailto:adam@lexmasterclass.com">adam@lexmasterclass.com</A><BR>================================================<BR></DIV></DIV></BODY></HTML>