[Corpora-List] Data-Driven Learning materials

Alex Boulton Alex.Boulton at univ-nancy2.fr
Tue Apr 15 14:03:29 UTC 2008

Dear all

I recently requested information on any *published materials* or 
*on-line materials* adopting a data-driven learning approach. My thanks 
to the following for their replies:

    * Adam Turner
    * Chris Tribble
    * Mike Barlow
    * Brett Reynolds
    * Stéphanie O'Riordan
    * Antoinette Renouf
    * James Thomas
    * Linda Bawcom
    * Marcia Veirano Pinto
    * Przemek Kaszubski
    * Simon Smith
    * John Milton

Unfortunately (if unsurprisingly), there were no real additions to the 
publications I listed in the original mail. Is there really so little 
out there? Why? One respondent commented that his name had been 
suggested to two different publishers "to write a corpus book for 
teachers/students. Both of them said they liked the idea and A said the 
world is not ready for it and B said that they were already doing 
something on corpora at the time."


Below are the main responses; I have a number of other links to 
resources which I'll include on a site I hope to set up this summer 
(linked from my homepage at 


Some of the resources mentioned are already well-known and 
much-appreciated, including the following:

    * *Compleat Lexical Tutor* by Tom Cobb and Chris Greaves with its
      many resources http://www.lextutor.ca/ (free)
    * *MICASE* for academic spoken American English
      http://quod.lib.umich.edu/m/micase/ (free)
    * Mark Davies' 360m-word *BYU Corpus of American English* and
      interface to the *British National Corpus* among others
      http://davies-linguistics.byu.edu/personal/ (free)
    * Mike Scott's comprehensive *WordSmith Tools* for corpus analysis
      http://www.lexically.net/wordsmith/ (a demo version can be
      downloaded free)
    * *SketchEngine* by Adam Kilgarriff <http://www.kilgarriff.co.uk/>,
      Pavel Rychlý <http://www.fi.muni.cz/%7Epary/> & Jan Pomikálek:
      corpus interface and word profiler http://www.sketchengine.co.uk/
      (30-day free trial)

The following projects are mentioned by people closely connected with 
them; comments are from their mails and/or from the sites themselves:

*Adam Turner: /Hanyang University Online Writing Lab (OWL)/* 
www.hanyangowl.org <http://www.hanyangowl.org/>
I have used the advanced search functions of Adobe PDF files with Korean 
graduate students writing for publication in English especially in the 
sciences. It is much more user-friendly than concordance software and 
can be used almost immediately in the classroom. When combined with a 
specialized small corpus it is very effective. I have received good 
feedback from faculty who have taken workshops from me using this approach.
Not being an engineer but an English for Specific Purposes instructor, I 
also used this approach to help me build up samples of text in the 
creation of materials for engineering research writing which informed an 
in-house guide to Engineering writing that I wrote.
The full text of the book and the workshop handouts on how to use Google 
Scholar and Adobe Acrobat advanced search functions with students and 
faculty can be found here. Look under ESSENTIAL HANDOUTS on the right.

*Mike Barlow: /CorpusLAB/* http://www.corpuslab.com/
CorpusLAB is a new FREE site for language learners and language 
teachers. CorpusLAB is designed to make use of the results of corpus 
analysis to promote language learning based on real English used in 
different settings.
Students can use the site to take a variety of exercises created by 
teachers. Go to the Student pages and select a topic area (/Academic 
English, Business English, /etc.). If students register, they will be 
able to keep track of their progress.
Teachers can use the site in different ways. The central engine of the 
site is a series of exercise authoring tools. The exercises (fill-the-gap, 
multiple-choice, matching, reorder, and categorise) follow 
the traditional pattern, but they are designed in a way that promotes 
the learning of collocations and phrasal patterns. For example, the 
matching exercise allows up to five columns of items rather than the 
usual two, thereby providing practice in a range of collocations and 
Another feature of the site is the sharing of corpus resources and 
corpus-informed materials such as wordlists, handouts, ppts, etc. One of 
the aims of the site is to build up resources for specialised English: 
Medical English, English for Tourism, and so on. In addition, teachers 
have access to a corpus of spoken professional English via a simple 
concordancer. A utility for the analysis of potential teaching texts is 
also under development.

*Brett Reynolds: /Simple English Wiktionary/* http://simple.wiktionary.org/
The Simple English Wiktionary incorporates corpus data in selecting 
example sentences and presents lemmas in frequency order as much as 
possible. This is a project that I'm heavily involved in and can speak 
more about. Because of its open aspect the amount of data used in 
writing it varies considerably between editors. [...]
Many examples are taken from the BNC, though they are sometimes edited. 
For instance, the noun 'intensity' turns up in this text: "...but there are 
many others that suffer from high intensity of sunlight." That's been 
edited to "These flowers suffer from high intensity of sunlight."

*Antoinette Renouf: /WebCorp Linguist's Search Engine/*
...a resource which in its previous guise as WebCorp 
(http://www.webcorp.org.uk/wcadvanced.html) allowed thousands of 
learners and others to access the Web as a 'corpus', or at any rate a 
ready source of up to date language data, the output tailored by WebCorp 
tools for easy use.
This activity started in 2001 and continues, but the new version of 
WebCorp, /WebCorp Linguist's Search Engine/, has its own search engine, 
allowing us both to bypass Google and other non-linguistically-oriented 
search engines, and to create pre-processed subcorpora to suit 
individual users. The demo for the latest tool is at 
http://wse1.webcorp.org.uk/preview/ , but people will need to ask us for 
a password, as the site is still under development and we are still 
working with identified users.
The publications associated with both WebCorp systems can be found at: 
http://rdues.bcu.ac.uk/bibliog.shtml. The early papers (before 2000) 
refer to WebCorp, but after that, they refer partly or wholly also to 

*James Thomas: /A Ten-step Introduction to Concordancing through the 
Collins Cobuild Corpus Concordance Sampler/* 
This website is a quick rewrite of one of the same name that was created 
in 2002, and hosted on a public server. Since then, the Cobuild Sampler 
went completely off line for a long time. It is now back with some 
improvements that are not yet reflected in this Ten Step Intro. 
Concordancing for language study itself has undergone some evolution 
which will be reflected in the next version.
It is my intention to create a similar Introduction to Bonito, the 
concordancer created at the Faculty where I work. For access to Bonito 
and other web-based concordancers, click here 

*Przemek Kaszubski: /IFA Concordancer/* http://ifa.amu.edu.pl/~ifaconc 
I have been building a site mainly for our local academic EFL purposes 
but with a functional public demo-mode. Little content there as yet, but 
I very much hope the site will grow. The tool is called IFAConc and can 
be found here: http://ifa.amu.edu.pl/~ifaconc 
<http://ifa.amu.edu.pl/%7Eifaconc>. It is partly inspired by Tom Cobb's 
ideas as well as Tim Johns' kibbitzer pages, and some more.

*Simon Smith:* (re */SketchEngine/* http://www.sketchengine.co.uk/)
I'm involved in two projects in which users are presented with corpus 
data: one on Chinese, the other on English. Both of them make use of 
Adam Kilgarriff's Sketch Engine corpus query tool.
In the *Chinese project*, Alice Chen and I tried to assess, using 
pre- and post-tests, the progress in acquisition of collocational 
patterns made by a group of intermediate to advanced Chinese learners. 
These learners were exposed for a period of time to a large corpus of 
Chinese, accessed through the concordances and usage summaries offered 
by Sketch Engine. We prepared a walkthrough guide 
<http://mcu.edu.tw/%7Essmith/walkthrough/> to the use of corpora for 
language learning in general (and the Sketch Engine in particular), and 
described the work in a paper 
given at PALC, in Lodz, last year.
The results of that work were rather inconclusive, partly because 
our learners were left to their own devices as to how they went about 
exploring the corpus, and what they learned from it.
In July, I'll be building on that work with a much more task-focused 
Chinese-learning experience. This will be aimed at beginners, and will 
take the form of a workshop at TALC 2008, Lisbon 
<http://talc8.isla.pt/workshops.html#mandarin>. Participants will learn 
about an important collocational category in the language, that of 
Verb-Object Compounds, which can be readily illustrated using corpus 
tools, and crops up often enough and early enough in every Chinese 
learner's exposure to the language to merit special study. If that 
sounds a bit dry, we'll also be practising some basic Mandarin, and even 
dabbling a little in the writing system. Not to mention learning about 
Sketch Engine along the way. If you're going to be at TALC, please 
consider joining us!
The *English project* is on *corpus-generated cloze exercises*. Scott 
Sommers and I are presenting a paper 
<http://mcu.edu.tw/%7Essmith/ccu2008-smith.pdf> on this at the 2008 
Conference of English Teaching and Learning in R.O.C. 
A cloze exercise has three components: a cloze sentence ("The boy stood 
on the burning deck"), a key ("burning") and distractors ("lukewarm", 
"tepid", "piping hot"..., for the sake of illustration). Our algorithm 
takes the key as input from the user, finds an appropriate sentence in 
the corpus, and supplies distractors (terms which have the same sort of 
distribution in the corpus as the key, but never actually occur with a 
particular collocate, such as "deck" in the example). [...]
Any feedback on either of these projects would of course be most welcome!
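
The cloze procedure described above (key in, gapped sentence and distractors out) can be sketched roughly as follows. This is only a toy illustration, not the algorithm from the paper, which works from Sketch Engine data. It assumes that "the same sort of distribution" can be approximated by sharing a following word with the key, and that the collocate is simply the word after the key in the chosen sentence. The corpus and all function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical five-sentence corpus; a real system would query a large corpus.
CORPUS = [
    "The boy stood on the burning deck",
    "The burning water scalded him",
    "The tepid water was unpleasant",
    "The lukewarm water was fine",
    "The wooden deck creaked",
]


def tokenize(sentence):
    """Lowercase and split on whitespace (no punctuation handling)."""
    return sentence.lower().split()


def following_words(corpus):
    """Map each word to the set of words that directly follow it."""
    followers = defaultdict(set)
    for sentence in corpus:
        tokens = tokenize(sentence)
        for left, right in zip(tokens, tokens[1:]):
            followers[left].add(right)
    return followers


def cooccurs(word, collocate, corpus):
    """True if word and collocate ever appear in the same sentence."""
    return any(word in toks and collocate in toks
               for toks in map(tokenize, corpus))


def make_cloze(key, corpus, n_distractors=2):
    """Build a cloze item for `key`, or return None if no sentence fits."""
    # 1. Find the first sentence where the key is followed by another word.
    for sentence in corpus:
        tokens = tokenize(sentence)
        if key in tokens[:-1]:
            break
    else:
        return None
    idx = tokens.index(key)
    collocate = tokens[idx + 1]  # assumption: the collocate is the next word

    # 2. Distractors share a following word with the key (similar
    #    distribution) but never occur in a sentence with the collocate.
    followers = following_words(corpus)
    key_followers = followers[key]
    scored = []
    for word, word_followers in followers.items():
        if word == key:
            continue
        overlap = len(word_followers & key_followers)
        if overlap and not cooccurs(word, collocate, corpus):
            scored.append((-overlap, word))
    distractors = [word for _, word in sorted(scored)[:n_distractors]]

    # 3. Blank out the key to produce the cloze sentence.
    gapped = " ".join("_____" if i == idx else tok
                      for i, tok in enumerate(tokens))
    return {"sentence": gapped, "key": key,
            "collocate": collocate, "distractors": distractors}


if __name__ == "__main__":
    print(make_cloze("burning", CORPUS))
```

In this toy corpus "wooden" shares a following word with "burning" but is rejected because it occurs with "deck", whereas "lukewarm" and "tepid" are accepted because they never do.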

*John Milton:* */My Words/* http://mywords.ust.hk/STU/welcome.asp
You can download an MSWord toolbar called 'Check My Words' from http://mywords.ust.hk/. It takes a DDL approach to grammar-checking for learners of English, especially addressing common sentence-level errors of Chinese speakers, but useful for English learners of any L1. A companion program - 'Mark My Words' - can be used by teachers to insert comments containing relevant DDL links in students' documents. 

*Linda Bawcom* mentions this page: "Professor Daniel Kies' (College of 
Du Page in Illinois) /The Hyper Textbook/, a textbook he wrote for 
his composition course. One part is dedicated to Conrad's /The Heart of 
Darkness/ which he presents with strings from a concordancer and then 
he invites students to use a concordancer that he has set up. He also 
uses concordances in this hyper-textbook for examples in his 
explanations of grammar. By the way, this is not just your average 
run-of-the-mill 'grammar' book. It's worth browsing through if you teach 
composition or applied linguistics." 


*Marcia Veirano Pinto* sent me a selection of materials she had 
prepared, some in collaboration with Maria Cecília Lopes and Tony Berber 
Sardinha; though these are not available on the web she has given me 
permission to include them in the site I hope to create over the summer.

Thanks again to all

Alex Boulton wrote:
> Dear all
> I'm trying to compile a list of published DDL materials for (L2) 
> language learning -- not materials which are simply corpus-informed 
> (from native-speaker or learner corpora), but where learners actually 
> come into contact with corpus data.
> I'm particularly interested in *books, CD-ROMs, DVD-ROMS or internet 
> sites* which are either wholly given over to DDL or which integrate 
> DDL activities in part -- anything which indicates that publishers have 
> taken an interest in DDL materials. (eg Tribble & Jones Concordances in the 
> Classroom; Barlow & Burdine Phrasal Verbs in Business / American 
> Phrasal Verbs; Thurston & Candlin Exploring Academic English; LingoNet 
> VideoCorpus; etc.)
> While I'm mainly concerned with published materials, I'd also be 
> interested in any links to other DDL resources which individuals or 
> groups may have produced but not published, especially on-line -- 
> again, not corpora, tools or interfaces on their own, but activities 
> explicitly based on corpora. (eg Tim Johns' Virtual DDL Library / 
> Kibbitzing One-to-Ones; Estling Vannestål & Lindquist's Corpora in 
> Grammar Teaching; ICT4LT; etc.)
> The above examples are inevitably English-oriented, but materials in 
> or about other languages would be more than welcome.
> I will of course post results to Corpora List, but I'd also like to 
> create a web page which lists them as a complement to Tim Johns' 
> data-driven learning page (last revised 06/02/97), and review as many 
> as possible. I'd be grateful also then for URLs and references to 
> published reviews and descriptions... or even free samples if you have 
> them!
> Thanks in advance
> alex
> -- 
> *Alex Boulton*
> Nancy Université
> boulton at univ-nancy2.fr <mailto:boulton at univ-nancy2.fr>
> CRAPEL---ATILF/CNRS
> Tel.:
> Fax:
> http://www.univ-nancy2.fr/CRAPEL/
> http://www.univ-nancy2.fr/erudi
> *All articles from /Mélanges CRAPEL/ are available free online in PDF
> format:*
> http://revues.univ-nancy2.fr/melangesCrapel/

