<b><a href="http://sfncorpora.uab.es/CQPweb/cea/" target="_blank">Corpus del Español Actual (CEA) / The Corpus of Contemporary Spanish</a></b> (Powered by CQPweb)<br>
<p style="text-align:justify">The <b><a target="_blank" class="external-link" href="http://sfncorpora.uab.es/CQPweb/cea/"><span class="external-link">Corpus del Español Actual</span></a></b><span class="external-link"> (the Corpus of Contemporary Spanish)</span> contains <b>540 million words</b>, which have been lemmatized and tagged with detailed part-of-speech information. The CEA is made up of texts from the following sources:</p>
<ul style="text-align:justify"><li>The Spanish part of the eleven-language parallel corpus <a href="http://www.statmt.org/europarl/" target="_blank">Europarl: European Parliament Proceedings Parallel Corpus, v. 6</a> (1996-2010);</li>
<li>The Spanish portion of the trilingual <a href="http://www.lsi.upc.edu/%7Enlp/wikicorpus/" target="_blank">Wikicorpus, v. 1.0</a>, which was extracted from a 2006 snapshot of Wikipedia; and</li>
<li>The Spanish part of the seven-language parallel corpus <a target="_blank" class="external-link" href="http://www.euromatrixplus.net/multi-un/">MultiUN: Multilingual UN Parallel Text 2000-2009</a>, a corpus made up of the resolutions of the United Nations.</li>
</ul><p style="text-align:justify">The CEA was tagged using an <a class="internal-link" href="http://sfn.uab.es:9080/SFN/tools/dictionary" target="_blank"><span class="internal-link">online Spanish dictionary</span></a> of 635,000 wordforms, which was automatically generated from a dictionary of 86,000 single-word lemmas (e.g., <i>unir</i>, <i>inmoralidad</i>, <i>allí</i>) and 26,000 multiword lemmas (e.g., <i>muerte cerebral</i>, <i>carga de profundidad</i>, <i>de armas tomar</i>) (Subirats
1989, 1992, 1994a, 1994b; Mogorrón 1994; Garrido 1999; Bobes 2000).
Tags were then disambiguated with intersecting finite-state
automata that use lexical and syntactic information (Subirats 1998;
Subirats and Ortega 2000, 2001; Ortega in progress).</p>
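<p style="text-align:justify">The two stages described above, dictionary lookup followed by disambiguation with intersecting finite-state automata, can be illustrated with a toy Python sketch. Everything in it is a hypothetical simplification rather than the CEA's actual resources: the three-word lexicon, the four tags (<code>DET</code>, <code>PRON</code>, <code>NOUN</code>, <code>VERB</code>), the lemma assumed for the clitic <i>la</i>, and the three constraint automata, one of which ("at most one verb") is deliberately crude. Each word contributes one arc per dictionary reading to a sentence automaton; intersecting that automaton with the constraint automata (explored lazily, one product state at a time) keeps only the readings that every constraint accepts.</p>

```python
from dataclasses import dataclass

# Toy tag alphabet (hypothetical; not the CEA tagset).
TAGS = ("DET", "PRON", "NOUN", "VERB")

# Toy lexicon (hypothetical): wordform -> set of (lemma, tag) analyses,
# standing in for the wordform dictionary generated from the lemma list.
LEXICON = {
    "compró": {("comprar", "VERB")},
    "la": {("el", "DET"), ("lo", "PRON")},
    "casa": {("casa", "NOUN"), ("casar", "VERB")},
}


@dataclass
class DFA:
    """Deterministic automaton over tags; a missing (state, tag) key means reject."""
    start: int
    accept: frozenset
    delta: dict


def must_be_followed_by(trigger, required):
    """Toy constraint: every `trigger` tag must be immediately followed by `required`."""
    delta = {(0, tag): (1 if tag == trigger else 0) for tag in TAGS}
    delta[(1, required)] = 0  # state 1 means "waiting for `required`"
    return DFA(start=0, accept=frozenset({0}), delta=delta)


def at_most_one(limited):
    """Toy constraint: the tag `limited` occurs at most once in the sentence."""
    delta = {(0, tag): (1 if tag == limited else 0) for tag in TAGS}
    delta.update({(1, tag): 1 for tag in TAGS if tag != limited})
    return DFA(start=0, accept=frozenset({0, 1}), delta=delta)


def disambiguate(words, constraints):
    """Intersect the sentence automaton (one arc per dictionary reading of each
    word) with every constraint DFA by depth-first search over the product
    states, returning the tag sequences that all automata accept."""
    readings = [sorted({tag for _, tag in LEXICON[w]}) for w in words]
    accepted = []

    def walk(i, states, path):
        if i == len(words):
            if all(s in c.accept for s, c in zip(states, constraints)):
                accepted.append(tuple(path))
            return
        for tag in readings[i]:
            nxt = tuple(c.delta.get((s, tag)) for s, c in zip(states, constraints))
            if None not in nxt:  # no constraint automaton blocked this arc
                walk(i + 1, nxt, path + [tag])

    walk(0, tuple(c.start for c in constraints), [])
    return accepted


def lemmas(words, tags):
    """Look up the lemma that goes with each chosen tag."""
    return [next(lemma for lemma, t in LEXICON[w] if t == tag)
            for w, tag in zip(words, tags)]


constraints = [
    must_be_followed_by("DET", "NOUN"),
    must_be_followed_by("PRON", "VERB"),
    at_most_one("VERB"),
]
words = ["compró", "la", "casa"]
print(disambiguate(words, []))                 # all 4 dictionary readings
print(disambiguate(words, constraints))        # [('VERB', 'DET', 'NOUN')]
print(lemmas(words, ("VERB", "DET", "NOUN")))  # ['comprar', 'el', 'casa']
```

<p style="text-align:justify">In this sketch, the ambiguous <i>la</i> (article or clitic pronoun) and <i>casa</i> (noun or a form of <i>casar</i>) yield four candidate readings for <i>compró la casa</i>, and only <code>VERB DET NOUN</code> survives the intersection. A real disambiguator works with a far richer tagset and set of constraints.</p>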
<p style="text-align:justify"><b>Searching the CEA:</b></p>
<p style="text-align:justify">The query interface for the CEA is <a target="_blank" href="http://cwb.sourceforge.net/cqpweb.php">CQPweb</a>, which uses components of the <a href="http://cwb.sourceforge.net/" style="padding-left:0px" target="_blank">IMS Open Corpus Workbench (CWB)</a>,
a set of open-source tools for managing and searching large corpora,
including the Corpus Query Processor (CQP). To learn more about how to
use CQPweb, you can consult the IMS's brief description of the <a target="_blank" class="external-link" href="http://www.ims.uni-stuttgart.de/projekte/CorpusWorkbench/CQPSyntax.html">regular-expression syntax</a> used by CQP and its list of <a target="_blank" class="external-link" href="http://www.ims.uni-stuttgart.de/projekte/CorpusWorkbench/CQPExamples.html">sample queries</a>.
If you wish to define your query in terms of grammatical and
inflectional categories, you can use the part-of-speech tags listed on
the CEA's <a class="internal-link" href="http://sfn.uab.es:9080/SFN/tools/cea/corpus-tags" target="_blank">Corpus Tags</a> page.</p>
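<p style="text-align:justify">As a rough illustration of how CQP evaluates a token-level query, the Python sketch below treats each corpus position as a set of attributes, matches each bracketed token expression by applying regular expressions that must cover the entire attribute value, and reports runs of consecutive positions that satisfy the whole query. Its functions are hypothetical helpers, not part of CQPweb or the CWB, and the attribute names <code>word</code>, <code>lemma</code> and <code>pos</code> and the tag strings are assumptions made for illustration; the real CEA tags are those listed on the Corpus Tags page, and CQP itself supports far more (Boolean combinations, optional and repeated tokens, structural attributes).</p>

```python
import re

# Hypothetical corpus positions: each token carries positional attributes
# `word`, `lemma` and `pos` (the tag strings are invented for illustration).
TOKENS = [
    {"word": "Compró", "lemma": "comprar", "pos": "VERB"},
    {"word": "la", "lemma": "el", "pos": "DET"},
    {"word": "casa", "lemma": "casa", "pos": "NOUN"},
    {"word": "de", "lemma": "de", "pos": "PREP"},
    {"word": "sus", "lemma": "su", "pos": "DET"},
    {"word": "padres", "lemma": "padre", "pos": "NOUN"},
]


def token_matches(token, conditions):
    """True if every attribute regex matches the *whole* attribute value;
    like CQP patterns, the regexes are implicitly anchored."""
    return all(re.fullmatch(regex, token[attr]) for attr, regex in conditions.items())


def find(tokens, query):
    """Return (start, end) spans of consecutive tokens that satisfy `query`,
    a list holding one {attribute: regex} dict per token position."""
    n = len(query)
    return [
        (start, start + n)
        for start in range(len(tokens) - n + 1)
        if all(token_matches(t, c) for t, c in zip(tokens[start:start + n], query))
    ]


# Roughly analogous to the CQP query  [pos="DET"] [lemma="casa|padre"]
query = [{"pos": "DET"}, {"lemma": "casa|padre"}]
for start, end in find(TOKENS, query):
    print(" ".join(t["word"] for t in TOKENS[start:end]))  # "la casa", then "sus padres"

print(find(TOKENS, [{"lemma": "cas"}]))    # [] because "cas" must match the whole lemma
print(find(TOKENS, [{"lemma": "cas.*"}]))  # [(2, 3)]
```

<p style="text-align:justify">Because the patterns are anchored, a query for the lemma <i>cas</i> finds nothing, while <i>cas.*</i> matches any lemma that begins with those letters.</p>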