[Corpora-List] Data-Driven Learning materials
Alex Boulton
Alex.Boulton at univ-nancy2.fr
Tue Apr 15 14:03:29 UTC 2008
Dear all
I recently requested information on any *published materials* or
*on-line materials* adopting a data-driven learning approach. My thanks
to the following for their replies:
* Adam Turner
* Chris Tribble
* Mike Barlow
* Brett Reynolds
* Stéphanie O'Riordan
* Antoinette Renouf
* James Thomas
* Linda Bawcom
* Marcia Veirano Pinto
* Przemek Kaszubski
* Simon Smith
* John Milton
Unfortunately (if unsurprisingly), there were no real additions to the
publications I listed in the original mail. Is there really so little
out there? Why? One respondent commented that his name had been
suggested to two different publishers "to write a corpus book for
teachers/students. Both of them said they liked the idea and A said the
world is not ready for it and B said that they were already doing
something on corpora at the time."
Below are the main responses; I have a number of other links to
resources which I'll include on a site I hope to set up this summer
(linked from my homepage at
http://arche.univ-nancy2.fr/course/view.php?id=967).
Some of the resources mentioned are already well-known and
much-appreciated, including the following:
* *Compleat Lexical Tutor* by Tom Cobb and Chris Greaves with its
many resources http://www.lextutor.ca/ (free)
* *MICASE* for academic spoken American English
http://quod.lib.umich.edu/m/micase/ (free)
* Mark Davies' 360m-word *BYU Corpus of American English* and
interface to the *British National Corpus* among others
http://davies-linguistics.byu.edu/personal/ (free)
* Mike Scott's comprehensive *WordSmith Tools* for corpus analysis
http://www.lexically.net/wordsmith/ (a demo version can be
downloaded free)
 * *SketchEngine*, a corpus interface and word profiler by Adam
 Kilgarriff <http://www.kilgarriff.co.uk/>, Pavel Rychlý
 <http://www.fi.muni.cz/%7Epary/> & Jan Pomikálek
 http://www.sketchengine.co.uk/ (30-day free trial)
The following projects are mentioned by people closely connected with
them; comments are from their mails and/or from the sites themselves:
*Adam Turner: /Hanyang University Online Writing Lab (OWL)/*
http://www.hanyangowl.org/
I have used the advanced search functions of Adobe PDF files with Korean
graduate students writing for publication in English especially in the
sciences. It is much more user-friendly than concordance software and
can be used almost immediately in the classroom. When combined with a
specialized small corpus it is very effective. I have received good
feedback from faculty who have taken workshops from me using this approach.
Not being an engineer but an English for Specific Purposes instructor, I
also used this approach to build up samples of text when creating
materials for engineering research writing, which informed an in-house
guide to engineering writing that I wrote.
The full text of the book, and the workshop handouts on how to use
Google Scholar and Adobe Acrobat advanced search functions with students
and faculty, can be found there; look under ESSENTIAL HANDOUTS in the
right sidebar.
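For readers who want the same kind of lookup outside Acrobat, here is a
minimal Python sketch of the idea: it pulls the text out of a folder of
PDFs (the specialised small corpus) and prints each sentence containing
a query term. It assumes the third-party pypdf package; the folder name
and search phrase are purely illustrative.

    # Minimal stand-in for Acrobat's advanced search over a small
    # specialised corpus of PDFs. Assumes: pip install pypdf
    from pathlib import Path
    from pypdf import PdfReader

    def search_pdf_corpus(folder, term):
        for pdf in sorted(Path(folder).glob("*.pdf")):
            # concatenate the extracted text of all pages
            text = " ".join(page.extract_text() or ""
                            for page in PdfReader(pdf).pages)
            # crude sentence split is enough for a quick look-up
            for sentence in text.split(". "):
                if term.lower() in sentence.lower():
                    print(f"{pdf.name}: ...{sentence.strip()}...")

    # hypothetical folder of engineering papers; any phrase works
    search_pdf_corpus("engineering_corpus", "in terms of")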
*Mike Barlow: /CorpusLAB/* http://www.corpuslab.com/
CorpusLAB is a new FREE site for language learners and teachers,
designed to make use of the results of corpus analysis to promote
language learning based on real English used in different settings.
Students can use the site to take a variety of exercises created by
teachers. Go to the Student pages and select a topic area (/Academic
English/, /Business English/, etc.). If students register, they will be
able to keep track of their progress.
Teachers can use the site in different ways. The central engine of the
site is a series of exercise authoring tools. The exercises
(fill-the-gap, multiple-choice, matching, reorder, and categorise)
follow the traditional pattern, but they are designed in a way that promotes
the learning of collocations and phrasal patterns. For example, the
matching exercise allows up to five columns of items rather than the
usual two, thereby providing practice in a range of collocations and
phrases.
Another feature of the site is the sharing of corpus resources and
corpus-informed materials such as wordlists, handouts, ppts, etc. One of
the aims of the site is to build up resources for specialised English:
Medical English, English for Tourism, and so on. In addition, teachers
have access to a corpus of spoken professional English via a simple
concordancer. A utility for the analysis of potential teaching texts is
also under development.
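As a rough illustration of the five-column matching format described
above (this is not CorpusLAB's code, just a sketch of the data
involved), an exercise can be stored as rows of collocating slots, with
each column shuffled independently so that learners reassemble the
original phrases. The example rows are invented:

    # Each row is a corpus-attested phrasal pattern split into slots;
    # shuffling each column independently yields the matching task.
    import random

    rows = [
        ["draw", "a", "tentative", "conclusion", "from the data"],
        ["make", "a", "convincing", "case", "for the proposal"],
        ["raise", "a", "serious", "objection", "to the plan"],
    ]

    columns = [list(col) for col in zip(*rows)]   # 5 columns of 3 items
    for col in columns:
        random.shuffle(col)
    for row in zip(*columns):                     # print the shuffled grid
        print(" | ".join(f"{cell:<15}" for cell in row))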
*Brett Reynolds: /Simple English Wiktionary/* http://simple.wiktionary.org/
The Simple English Wiktionary incorporates corpus data in selecting
example sentences and presents lemmas in frequency order as much as
possible. This is a project that I'm heavily involved in and can speak
more about. Because of its open nature, the amount of corpus data used
in writing it varies considerably between editors. [...]
Many examples are taken from the BNC, though they are sometimes edited.
For instance, the noun 'intensity' turns up this text: "...but there are
many others that suffer from high intensity of sunlight." That's been
edited to "These flowers suffer from high intensity of sunlight."
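The frequency-ordering side of this is simple enough to sketch. The
fragment below (mine, not the Wiktionary's; the file name is
hypothetical, and real work would sooner use a published BNC frequency
list) ranks headwords from a plain-text, pre-lemmatised corpus file:

    # Count lemma frequencies and list headwords from most to least
    # frequent, i.e. the order in which entries would be presented.
    from collections import Counter

    with open("lemmatised_corpus.txt", encoding="utf-8") as f:
        freq = Counter(f.read().lower().split())

    for lemma, count in freq.most_common(20):
        print(f"{count:6}  {lemma}")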
*Antoinette Renouf: /WebCorp Linguist's Search Engine/*
http://wse1.webcorp.org.uk/preview/
...a resource which in its previous guise as WebCorp
(http://www.webcorp.org.uk/wcadvanced.html) allowed thousands of
learners and others to access the Web as a 'corpus', or at any rate a
ready source of up-to-date language data, the output tailored by
WebCorp tools for easy use.
This activity started in 2001 and continues, but the new version of
WebCorp, /WebCorp Linguist's Search Engine/, has its own search engine,
allowing us both to bypass Google and other non-linguistically-oriented
search engines, and to create pre-processed subcorpora to suit
individual users. The demo for the latest tool is at
http://wse1.webcorp.org.uk/preview/, but people will need to ask us for
a password, as the site is still under development and we are still
working with identified users.
The publications associated with both WebCorp systems can be found at
http://rdues.bcu.ac.uk/bibliog.shtml. The early papers (before 2000)
refer to WebCorp, but after that, they refer partly or wholly also to
WebCorpLSE.
*James Thomas: /A Ten-step Introduction to Concordancing through the
Collins Cobuild Corpus Concordance Sampler/*
http://www.fi.muni.cz/%7Ethomas/CCS/
This website is a quick rewrite of one of the same name, created in
2002 and hosted on a public server. Since then, the Cobuild Sampler
went completely offline for a long time. It is now back with some
improvements that are not yet reflected in this Ten Step Intro.
Concordancing for language study has itself undergone some evolution,
which will be reflected in the next version.
It is my intention to create a similar Introduction to Bonito, the
concordancer created at the Faculty where I work. For access to Bonito
and other web-based concordancers, see
http://www.fi.muni.cz/~thomas/EAP/concordancers.htm.
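For anyone curious what a concordancer is actually doing, a toy
keyword-in-context (KWIC) routine takes only a few lines. The Python
sketch below is just an illustration (not Bonito, and far short of any
real tool); the input file name is invented:

    # Toy KWIC concordancer: print every hit of a node word with a
    # window of context on each side, aligned down the middle.
    def kwic(tokens, node, width=5):
        node = node.lower()
        for i, tok in enumerate(tokens):
            if tok.lower() == node:
                left = " ".join(tokens[max(0, i - width):i])
                right = " ".join(tokens[i + 1:i + 1 + width])
                print(f"{left:>40}  [{tok}]  {right}")

    with open("sample_text.txt", encoding="utf-8") as f:
        kwic(f.read().split(), "suffer")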
*Przemek Kaszubski: /IFA Concordancer/* http://ifa.amu.edu.pl/~ifaconc
I have been building a site mainly for our local academic EFL purposes
but with a functional public demo-mode. Little content there as yet, but
I very much hope the site will grow. The tool is called IFAConc and can
be found at http://ifa.amu.edu.pl/~ifaconc. It is partly inspired by
Tom Cobb's ideas and Tim Johns' kibbitzer pages, among others.
*Simon Smith* (re /SketchEngine/ http://www.sketchengine.co.uk/)
I'm involved in two projects in which users are presented with corpus
data: one on Chinese, the other on English. Both of them make use of
Adam Kilgarriff's Sketch Engine corpus query tool.
In the *Chinese project*, Alice Chen and I tried to assess, using
pre- and post-tests, the progress in acquisition of collocational
patterns made by a group of intermediate to advanced Chinese learners.
These learners were exposed for a period of time to a large corpus of
Chinese, accessed through the concordances and usage summaries offered
by Sketch Engine. We prepared a walkthrough guide
<http://mcu.edu.tw/%7Essmith/walkthrough/> to the use of corpora for
language learning in general (and the Sketch Engine in particular), and
described the work in a paper
<http://www.kilgarriff.co.uk/Publications/2007-SmithChenKilg-PALC.doc>
given at PALC in Lodz last year.
The results of that work were rather inconclusive, partly because
our learners were left to their own devices as to how they went about
exploring the corpus, and what they learned from it.
In July, I'll be building on that work with a much more task-focused
Chinese-learning experience. This will be aimed at beginners, and will
take the form of a workshop at TALC 2008, Lisbon
<http://talc8.isla.pt/workshops.html#mandarin>. Participants will learn
about an important collocational category in the language, that of
Verb-Object Compounds, which can be readily illustrated using corpus
tools, and crops up often enough and early enough in every Chinese
learner's exposure to the language to merit special study. If that
sounds a bit dry, we'll also be practising some basic Mandarin, and even
dabbling a little in the writing system. Not to mention learning about
Sketch Engine along the way. If you're going to be at TALC, please
consider joining us!
The *English project* is on *corpus-generated cloze exercises*. Scott
Sommers and I are presenting a paper
<http://mcu.edu.tw/%7Essmith/ccu2008-smith.pdf> on this at the 2008
Conference of English Teaching and Learning in R.O.C.
<http://www.ccu.edu.tw/fllcccu/2008EIA/English/Eprogram.php>
A cloze exercise has three components: a cloze sentence ("The boy stood
on the burning deck"), a key ("burning") and distractors ("lukewarm",
"tepid", "piping hot"..., for the sake of illustration). Our algorithm
takes the key as input from the user, finds an appropriate sentence in
the corpus, and supplies distractors (terms which have the same sort of
distribution in the corpus as the key, but never actually occur with a
particular collocate, such as "deck" in the example). [...]
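To make that concrete, here is a minimal sketch of the described logic;
this is a reconstruction from the paragraph above, not the authors'
code. It builds co-occurrence vectors from a tokenised corpus, picks a
sentence containing the key, and ranks distractor candidates by
distributional similarity to the key while excluding any word that
ever co-occurs with the chosen collocate:

    # Reconstruction of the described idea, not the authors' algorithm.
    # 'sents' is assumed to be a list of tokenised, lower-cased sentences.
    from collections import Counter, defaultdict
    import math

    def cooc_vectors(sents, window=3):
        # co-occurrence counts for every word within +/- window tokens
        vecs = defaultdict(Counter)
        for s in sents:
            for i, w in enumerate(s):
                for n in s[max(0, i - window):i] + s[i + 1:i + 1 + window]:
                    vecs[w][n] += 1
        return vecs

    def cosine(a, b):
        dot = sum(v * b[k] for k, v in a.items())
        norm = lambda c: math.sqrt(sum(v * v for v in c.values()))
        return dot / (norm(a) * norm(b)) if a and b else 0.0

    def make_cloze(sents, key, collocate, n=3):
        vecs = cooc_vectors(sents)
        key_vec = vecs[key]
        sentence = next(s for s in sents if key in s)  # first hit will do
        # distractors: similar distribution to the key, but never seen
        # with the collocate anywhere in the corpus
        ranked = sorted(
            (w for w in vecs if w != key and vecs[w][collocate] == 0),
            key=lambda w: cosine(vecs[w], key_vec),
            reverse=True,
        )
        gapped = " ".join("___" if w == key else w for w in sentence)
        return gapped, key, ranked[:n]

    # e.g. make_cloze(corpus_sentences, "burning", "deck")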
Any feedback on either of these projects would of course be most welcome!
*John Milton: /My Words/* http://mywords.ust.hk/STU/welcome.asp
You can download an MS Word toolbar called 'Check My Words' from
http://mywords.ust.hk/. It takes a DDL approach to grammar-checking for
learners of English, especially addressing common sentence-level errors
of Chinese speakers, but it is useful for English learners of any L1. A
companion program, 'Mark My Words', can be used by teachers to insert
comments containing relevant DDL links in students' documents.
*Linda Bawcom* mentions this page: "Professor Daniel Kies' (College of
DuPage in Illinois) /The Hyper Textbook/, a textbook he wrote for his
composition course. One part is dedicated to Conrad's /Heart of
Darkness/, which he presents with strings from a concordancer, and then
he invites students to use a concordancer that he has set up. He also
uses concordances in this hyper-textbook for examples in his
explanations of grammar. By the way, this is not just your average
run-of-the-mill 'grammar' book. It's worth browsing through if you teach
composition or applied linguistics."
http://papyr.com/hypertextbooks/grammar/conrad_heart_of_darkness.htm
*Marcia Veirano Pinto* sent me a selection of materials she had
prepared, some in collaboration with Maria Cecília Lopes and Tony Berber
Sardinha; though these are not available on the web she has given me
permission to include them in the site I hope to create over the summer.
Thanks again to all
alex
Alex Boulton wrote:
> Dear all
>
> I'm trying to compile a list of published DDL materials for (L2)
> language learning -- not materials which are simply corpus-informed
> (from native-speaker or learner corpora), but where learners actually
> come into contact with corpus data.
>
> I'm particularly interested in *books, CD-ROMs, DVD-ROMS or internet
> sites* which are either wholly given over to DDL or which integrate
> DDL activities in part -- anything which shows publishers have shown
> an interest in DDL materials. (eg Tribble & Jones Concordances in the
> Classroom; Barlow & Burdine Phrasal Verbs in Business / American
> Phrasal Verbs; Thurston & Candlin Exploring Academic English; LingoNet
> VideoCorpus; etc.)
>
> While I'm mainly concerned with published materials, I'd also be
> interested in any links to other DDL resources which individuals or
> groups may have produced but not published, especially on-line --
> again, not corpora, tools or interfaces on their own, but activities
> explicitly based on corpora. (eg Tim Johns' Virtual DDL Library /
> Kibbitzing One-to-Ones; Estling Vannestål & Lindquist's Corpora in
> Grammar Teaching; ICT4LT; etc.)
>
> The above examples are inevitably English-oriented, but materials in
> or about other languages would be more than welcome.
>
> I will of course post results to Corpora List, but I'd also like to
> create a web page which lists them as a complement to Tim Johns'
> data-driven learning page (last revised 06/02/97), and review as many
> as possible. I'd be grateful also then for URLs and references to
> published reviews and descriptions... or even free samples if you have
> them!
>
> Thanks in advance
> alex
--
*Alex Boulton*
Nancy Université
boulton at univ-nancy2.fr <mailto:boulton at univ-nancy2.fr>

CRAPEL - ATILF/CNRS            ERUDI
Tél : 03.83.96.71.30           Tél : 03.83.96.84.44
Fax : 03.83.96.71.32           Fax : 03.83.96.84.49
http://www.univ-nancy2.fr/CRAPEL/
http://www.univ-nancy2.fr/erudi

* All articles from /Mélanges CRAPEL/ are available free online in pdf
format: *
http://revues.univ-nancy2.fr/melangesCrapel/