[Corpora-List] POS-tagging for spoken English and learner English

Jean Veronis Jean.Veronis at up.univ-mrs.fr
Mon Jul 25 10:02:48 UTC 2005


Adam Kilgarriff wrote:

> Do you have recent experiences of using available taggers on either of
> these kinds of data?
>
> Reports including accuracy figures would be particularly useful.

We have recently tagged a 300,000-word corpus of spoken French. The strategy 
and evaluation are reported here:

Campione, E., Véronis, J., & Deulofeu, J. (2005). 3. The French corpus. 
In Cresti, E. & Moneglia, M. (Eds.), /C-ORAL-ROM, Integrated Reference 
Corpora for Spoken Romance Languages,/ (pp. 111-133). Amsterdam: John 
Benjamins. 

[Draft on-line: 
http://www.up.univ-mrs.fr/veronis/pdf/2005-Coralrom-book.pdf]

The pleasant surprise is that we achieved results as good as those we get on 
written corpora (ca. 98% precision). This is probably because two effects 
offset each other: on the one hand, spoken corpora are harder to tag because 
of disfluencies (repetitions, repairs, etc.); on the other hand, their 
lexicon is much smaller and their sentence complexity much lower.
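
For anyone wanting to run a comparable check on their own data, here is a 
minimal sketch of token-level scoring against a hand-corrected reference. It 
assumes the tagger output and the reference are aligned token by token; the 
function name, the tag labels and the toy utterance are illustrative only and 
are not taken from the C-ORAL-ROM evaluation. When the tagger assigns exactly 
one tag to every token, this figure is the same thing as per-token precision.

```python
def tagging_accuracy(gold, predicted):
    """Return the share of tokens whose predicted tag equals the gold tag.

    Both arguments are lists of (token, tag) pairs, aligned one to one.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must be aligned token by token")
    if not gold:
        raise ValueError("cannot score an empty sequence")
    correct = sum(1 for (_, g), (_, p) in zip(gold, predicted) if g == p)
    return correct / len(gold)


# Hypothetical toy utterance containing a repetition ("je je").
gold = [("je", "PRO"), ("je", "PRO"), ("pense", "VER"),
        ("que", "CON"), ("oui", "ADV")]
pred = [("je", "PRO"), ("je", "PRO"), ("pense", "VER"),
        ("que", "PRO"), ("oui", "ADV")]

print(tagging_accuracy(gold, pred))
```

Repeated tokens are scored like any other token here; whether disfluencies 
should be counted or set aside is a design choice to state with the figures.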

Best wishes

--j
  http://aixtal.blogspot.com

 


