<html><body>INFOLING. Global information on Hispanic

linguistics: <a href="http://infoling.org"

target="_blank">http://infoling.org/</a>

<br />

<br />Moderators: Carlos Subirats (UAB), Mar Cruz (UB)

<br />Editors: Paloma Garrido (U. Rey Juan Carlos), Laura Romero

(UB)

<br />Programming and development: Marc Ortega (UAB)

<br />Review editors: Alexandra Álvarez (U. Los Andes,

Venezuela), Yvette Bürki (U. Bern), María Luisa Calero (U. Córdoba,

Spain)

<br />Advisors: Isabel Verdaguer (UB), Gerd Wotjak (U. Leipzig)

<br />Contributors: Antonio Ríos (UAB), Danica Salazar (UB)

<br />

<br />With the support of:

<br /><ul style="margin: 0;padding-left:15px;"><li

style="padding-bottom: 0px;padding-top:0px;">Editorial Octaedro: <a

href="http://www.octaedro.com/"

target="_blank">http://www.octaedro.com/</a>

<br /><li style=";padding-top:0px;padding-bottom: 0px;">Arco Libros:

<a href="http://www.arcomuralla.com/Arco/Shop/default.asp"

target="_blank">http://www.arcomuralla.com/Arco/Shop/default.asp</a></li></ul>

<br /><font style="font-size:90%">ISSN: 1576-3404 </font>

<br /><font style="font-size:90%">© Infoling 1996-2010. All rights

reserved</font>

<br />

<br /><hr /><b>Linguistic resources: </b><br />Public-domain

Spanish corpus of 120 million words<br

/><b>URL:</b> <a href="http://www.lsi.upc.edu/~nlp/wikicorpus/"

target="_blank">http://www.lsi.upc.edu/~nlp/wikicorpus/</a><br

/><b>Information from:</b> Infoling List

&lt;infoling@infoling.org&gt;<br /><hr /><br

/><b>Description</b><br /><p> Wikicorpus, v. 1.0: Spanish,

English, and Catalan portions of Wikipedia.<br /><br />The

Wikicorpus is a trilingual corpus (Spanish, English, Catalan) that

contains large portions of Wikipedia (based on a 2006 dump) and

has been automatically enriched with linguistic information. In its

present version, it contains over 750 million words.<br /><br />The

corpora have been annotated with lemma and part-of-speech information

using the open-source library FreeLing. They have also been

sense-annotated with the state-of-the-art Word Sense Disambiguation

algorithm UKB. Because UKB assigns WordNet senses, and WordNet has been

aligned across languages via the InterLingual Index, this kind of

annotation opens the way to large-scale explorations in lexical semantics

that were not possible before.<br /><br />We also provide an

open-source Java-based parser for Wikipedia pages developed for the

construction of the corpus.</p><br /><b>Subject area:</b>

Corpus linguistics<br /><br /><b>Information on the Infoling

website:</b><br /> <a

href="http://www.infoling.org/informacion/RecursoL29.html"

target="_blank">

http://www.infoling.org/informacion/RecursoL29.html</a></body></html>