[Corpora-List] corpus for Spanish and French language

Paul McNamee paulmac at nautilus.jhuapl.edu
Fri Jun 6 15:30:21 UTC 2003


You should look in particular at the Cross-Language Evaluation Forum
(CLEF) project.  CLEF has been running for about four years and has
developed a reusable test suite of IR corpora in eight or so European
languages, including Spanish (~460k docs) and French (~120k docs).  I
believe the corpora can be made available without fee, subject to user
agreements.  Information about CLEF can be found at
http://www.clef-campaign.org/ and the site contains contact information
for the project director, Carol Peters.

Best regards,

- Paul McNamee

Research and Technology Development Center
Johns Hopkins University Applied Physics Lab
11100 Johns Hopkins Road
Laurel MD  20723-6099   USA
Voice: +1 443 778 3816
Fax:   +1 443 778 6904
Email: mcnamee at jhuapl.edu

On Wed, 4 Jun 2003, Ying Ding wrote:

> Dear All,
>
> We have a small project running here related to a search engine. We need to
> test this search engine on Spanish and French. We will need a corpus for
> each of these two languages. Do you know where to get them for free or at
> little cost?
>
> Another thing is stop word lists for these two languages. Do you know
> where to find such lists?
>
> Any help will be highly appreciated! I will provide the summary at the end.
>
> Best Regards
> ying
>
> Dr. Ying Ding
> Assistant Professor
> Next Web Generation Group
> Institute of Computer Science, University of Innsbruck
> Technikerstr. 13, A-6020 Innsbruck, Austria
> Tel:  +43 512 507 6112, Fax: +43 512 507 9872
> http://www.nextwebgeneration.com/
>


