[Corpora-List] Data-Driven Learning materials

Wed Apr 16 12:56:55 UTC 2008

Adam,
I wonder which method you are using for ranking examples.  We were
trying to do something similar, but for whole webpages (and a
variety of languages).  For example, we ranked the English Wikipedia and
my I-EN corpus by their coverage by GSL words:
http://corpus.leeds.ac.uk/teaching/i-en-gsl.csv.bz2
http://corpus.leeds.ac.uk/teaching/wiki-en-gsl.csv.bz2

The problem is that many pages with low lexical coverage by GSL contain
words that are known anyway, e.g., 'computer' or 'construction'.  On the
other hand, many phrasal verbs, e.g., 'give up', or constructions, e.g.,
'go the extra mile', do contribute to the coverage count, but are not
understood by students.  Problems of this sort are not accidental (we
found little correlation between GSL coverage and understanding), so a
much better model of difficulty is needed to find texts/examples
suitable for language learners.
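
To make the point concrete, here is a minimal sketch of ranking by
word-list coverage.  It is not the script used to produce the files
above: the tokeniser, the function names and the tiny stand-in word
list are all assumptions for illustration, and a real run would load
the full GSL instead.

```python
import re

def gsl_coverage(text, known_words):
    """Share of tokens in `text` that appear in `known_words` (0.0 to 1.0)."""
    # Crude tokeniser (an assumption): lowercase alphabetic runs,
    # optionally with an internal apostrophe.
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    if not tokens:
        return 0.0
    return sum(token in known_words for token in tokens) / len(tokens)

def rank_by_coverage(pages, known_words):
    """Return (coverage, page name) pairs, highest coverage first."""
    scored = [(gsl_coverage(text, known_words), name)
              for name, text in pages.items()]
    return sorted(scored, reverse=True)

# Tiny stand-in for the GSL, for illustration only.
known = {"the", "a", "he", "will", "give", "up", "is", "on", "desk"}

# Two hypothetical one-sentence "pages".
pages = {
    "phrasal": "He will give up",
    "tech": "The computer is on the desk",
}

for coverage, name in rank_by_coverage(pages, known):
    print(f"{name}\t{coverage:.2f}")
```

With this toy list the 'give up' sentence gets full coverage and ranks
above the 'computer' sentence, although learners are likely to find
the latter easier.  That is the mismatch described above.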
Serge

On Wed, 2008-04-16 at 12:53 +0100, Adam Kilgarriff wrote:
> Dear Alex,
>  
> you say
> >  Is there really so little out there? Why? 
> 
>  
> I think the reason is simple: Concordances are too tough for learners.
> So DDL has not taken off.  After 20 years, it remains a tiny minority
> interest.
> 
> Our response is to select corpus sentences according to readability.
> The beta version of the Sketch Engine now has an option to sort
> concordances "best first", from a learner's point of view, and we are
> working on other ways of using corpora in language learning in which
> we only show users sentences which they are likely to be able to read
> and understand.
>  
> Adam
>   
> 2008/4/15 Alex Boulton <Alex.Boulton at univ-nancy2.fr>:
>         Dear all
>         
>         I recently requested information on any published materials or
>         on-line materials adopting a data-driven learning approach. My
>         thanks to the following for their replies:
>         
>               * Adam Turner 
>               * Chris Tribble 
>               * Mike Barlow
>               * Brett Reynolds
>               * Stéphanie O'Riordan
>               * Antoinette Renouf
>               * James Thomas
>               * Linda Bawcom
>               * Marcia Veirano Pinto
>               * Przemek Kaszubski
>               * Simon Smith
>               * John Milton
>         
>         Unfortunately (if unsurprisingly), there were no real
>         additions to the publications I listed in the original mail.
>         Is there really so little out there? Why?
>         
> ...
> 
> -- 
> ================================================
> Adam Kilgarriff http://www.kilgarriff.co.uk 
> Lexical Computing Ltd http://www.sketchengine.co.uk
> Lexicography MasterClass Ltd http://www.lexmasterclass.com
> Universities of Leeds and Sussex adam at lexmasterclass.com
> ================================================ 
> _______________________________________________
> Corpora mailing list
> Corpora at uib.no
> http://mailman.uib.no/listinfo/corpora
