[Corpora-List] WBR-99 collection

Costa Luis Fernando Luis.Costa at sintef.no
Tue Dec 16 13:52:42 UTC 2003


Dear colleagues,

Linguateca is pleased to announce the public availability of the WBR-99
collection.

The collection was built from a set of documents collected from the
Brazilian Web in November 1999. It was taken from the database of TodoBR, a
search engine for the Brazilian Web, and offered to the LATIN laboratory,
for research on Information Retrieval problems. Experiments with the WBR-99
collection have already been reported in several doctoral and master's theses
and in published works.
The collection contains about 6 million HTML documents in an already indexed
format. It also contains the complete set of queries submitted to TodoBR
during November 1999. For fifty of those queries a set of relevant documents
is available.

For more information, and to obtain a password to access the collection, see
the following address, which is intended to be an "electronic bookshelf" for
everyone who wants to make their work on the computational processing of the
Portuguese language available on the web:

http://www.linguateca.pt/Repositorio/

We are grateful to Berthier Ribeiro-Neto, Nivio Ziviani and Pável Calado for
authorizing the public release of this resource. Thanks also to FCCN for
providing the server where the resource is installed.

Best Regards,
Luís

************************************************************************
Luís Costa				Linguateca
    
SINTEF Telecom & Informatics	Tel. (direct) +47 22 06 73 11
Forskningsveien 1			Tel. +47 22 06 73 00
Box 124 Blindern			Fax. +47 22 06 73 50
N-0314 Oslo				Email: luis.costa at sintef.no
Norway				http://www.linguateca.pt/
************************************************************************* 


