<html><body>INFOLING. Global information on Hispanic
linguistics: <a href="http://infoling.org"
target="_blank">http://infoling.org/</a>
<br />
<br />Moderators: Carlos Subirats (UAB), Mar Cruz (UB)
<br />Editors: Paloma Garrido (U. Rey Juan Carlos), Laura Romero
(UB)
<br />Programming and development: Marc Ortega (UAB)
<br />Review editors: Alexandra Álvarez (U. Los Andes,
Venezuela), Yvette Bürki (U. Bern), María Luisa Calero (U. Córdoba,
Spain)
<br />Advisors: Isabel Verdaguer (UB), Gerd Wotjak (U. Leipzig)
<br />Contributors: Antonio Ríos (UAB), Danica Salazar (UB)
<br />
<br />With the support of:
<br /><ul style="margin: 0;padding-left:15px;"><li
style="padding-bottom: 0px;padding-top:0px;">Editorial Octaedro: <a
href="http://www.octaedro.com/"
target="_blank">http://www.octaedro.com/</a></li>
<li style=";padding-top:0px;padding-bottom: 0px;">Arco Libros:
<a href="http://www.arcomuralla.com/Arco/Shop/default.asp"
target="_blank">http://www.arcomuralla.com/Arco/Shop/default.asp</a></li></ul>
<br /><font style="font-size:90%">ISSN: 1576-3404 </font>
<br /><font style="font-size:90%">© Infoling 1996-2010. All rights
reserved</font>
<br />
<br /><hr /><b>Linguistic resources: </b><br />Public-domain
Spanish corpus of 120 million words<br
/><b>URL:</b> <a href="http://www.lsi.upc.edu/~nlp/wikicorpus/"
target="_blank">http://www.lsi.upc.edu/~nlp/wikicorpus/</a><br
/><b>Information from:</b> Infoling List
&lt;infoling@infoling.org&gt;<br /><hr /><br
/><b>Description</b><br /><p> Wikicorpus, v. 1.0: Spanish,
English, and Catalan portions of Wikipedia.<br /><br />The
Wikicorpus is a trilingual corpus (Spanish, English, Catalan) that
contains large portions of Wikipedia (based on a 2006 dump) and
has been automatically enriched with linguistic information. In its
present version, it contains over 750 million words.<br /><br />The
corpora have been annotated with lemma and part-of-speech information
using the open-source library FreeLing. They have also been
sense-annotated with the state-of-the-art Word Sense Disambiguation
algorithm UKB. Because UKB assigns WordNet senses, and WordNet has been
aligned across languages via the InterLingual Index, this sort of
annotation opens the way to massive explorations in lexical semantics
that were not possible before.<br /><br />We also provide an
open-source Java-based parser for Wikipedia pages, developed for the
construction of the corpus.</p><br /><b>Subject area:</b>
Corpus linguistics<br /><br /><b>Information on the Infoling
website:</b><br /> <a
href="http://www.infoling.org/informacion/RecursoL29.html"
target="_blank">
http://www.infoling.org/informacion/RecursoL29.html</a></body></html>