[Corpora-List] French corpora for POS tagger evaluation

Khalid Choukri choukri at elda.org
Fri Feb 15 12:45:40 UTC 2013


Dear Austina,
You may want to have a look at the EASy corpus. It is distributed as an evaluation package for syntactic analysis but can be used for other purposes.

Details: http://catalog.elra.info/product_info.php?products_id=1112&language=en


Here is a quick description:
The EASy Evaluation Package was produced within the French national project EASy (Evaluation of syntactic parsers of French), as part of the Technolangue programme funded by the French Ministry of Research and New Technologies (MRNT). The project made it possible to run an evaluation campaign for syntactic parsers of French.
The package contains:

1)	A collection of syntactically tagged French texts gathered over 6 domains (about one million words):
-	medicine: 100,000 words, including 5,000 annotated words, 
-	literature: 150,000 words, including 15,000 annotated words, 
-	emails: 2,250 anonymised personal emails (121,000 words), 
-	general: 250,000 words, including 24,000 annotated words, extracted from Le Monde newspaper, reports from the French Senate and the European Assembly (MLCC, MultiLingual Corpora for Co-operation, catalogue ref: ELRA-W0023),
-	speech: 10 passages of transcribed dialogues from the Spoken French corpus (8,000 annotated words),
-	questions: corpus of 137,000 words, extracted from the TREC and AMARYLLIS campaigns, including 5,000 annotated words. 
2)	PASTK++: gathers evaluation tools for constituents and relations. It includes a version of the EASy campaign tools that were modified during the PASSAGE campaign (which followed the EASy campaigns).
3)	Visualization tools for constituents and relations 

Best regards
Khalid Choukri
(short message sent from iPad)


On 14 Feb 2013, at 21:21, Olivier Austina <olivier.austina at gmail.com> wrote:

> Hello,
> 
> I am looking for a standard French corpus for POS tagger evaluation. Where can I download such a corpus? Thanks.
> 
> -- 
> Regards
> Austina
> 
> _______________________________________________
> UNSUBSCRIBE from this page: http://mailman.uib.no/options/corpora
> Corpora mailing list
> Corpora at uib.no
> http://mailman.uib.no/listinfo/corpora