[Corpora-List] Metrics for corpus "parseability"
Marina Santini
marinamailinglists at gmail.com
Sun Feb 3 10:40:45 UTC 2008
Hi Sean,
> > Are there standard or widely accepted metrics for describing the
> > well-behavedness of corpora?
>
> The answer is, I think, a resounding 'no'. There is disappointingly little
> work on systematically comparing corpora, or making objective general
> observations of one corpus in comparison to others. (Citations proving me
> wrong are most welcome. I'm aware of Sekine, Roland and Jurafsky,
> Cavaglia, also work on genre by e.g. Karlgren, Santini, Sharoff, which touches
> on the topic.)
>
On comparing one corpus with others, there is a recent article (in French)
on how the performance of NLP tools varies when they are applied to
corpora of different genres and domains:
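Marie-Paule Jacques and Nathalie Aussenac-Gilles (2006). "Variabilité
des performances des outils de TAL et genre textuel. Cas des patrons
lexico-syntaxiques" [Variability in the performance of NLP tools and
textual genre: the case of lexico-syntactic patterns]. TAL (Traitement
Automatique des Langues), Vol. 47, n° 1/2006, pp. 11-32.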
Cheers, Marina
_______________________________________________
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora