[Corpora-List] Corpora Digest, Vol 5, Issue 27

René Venegas rene.venegas at ucv.cl
Wed Nov 28 14:33:00 UTC 2007


Dear Mario

You will find a tagged Spanish corpus at www.elgrial.cl. It supports
morphosyntactic queries and is free to use for research purposes.

Dr. René Venegas
Professor
Programa de Postgrado en Lingüística
www.postgradolinguistica.ucv.cl/rene
www.linguistica.cl
www.elgrial.cl
 
Instituto de Literatura y Ciencias del Lenguaje
www.ilcl.ucv.cl
 
Pontificia Universidad Católica de Valparaíso
www.ucv.cl
 
Assistant, Revista Signos. Estudios de Lingüística
www.scielo.cl/signos.htm
www.revistasignos.cl
 
-----Original Message-----
From: corpora-bounces at uib.no [mailto:corpora-bounces at uib.no] On behalf of
corpora-request at uib.no
Sent: Wednesday, 28 November 2007 11:00
To: corpora at uib.no
Subject: Corpora Digest, Vol 5, Issue 27

Today's Topics:

   1.  frequency dictionary of verbs (Jesús Fernández)
   2.  Spanish corpus (Mario Crespo Miguel)
   3.  Spanish corpus (Valerie Mapelli)
   4.  DGT-TM - Translation Memory for 231 language pairs available
      for distribution (Ralf Steinberger)


----------------------------------------------------------------------

Message: 1
Date: Tue, 27 Nov 2007 15:27:05 +0100
From: "Jesús Fernández" <jesusferdom_AT_gmail.com>
Subject: [Corpora-List] frequency dictionary of verbs
To: CORPORA_AT_uib.no



Dear David, Adam, Jennifer and Suzan,

Thank you so much for your replies. You got straight to the point even
though I was not very specific about the requirements.

Below are the links you provided, in case anyone else is interested:

- http://ota.ahds.ac.uk/ (The Oxford Text Archive)
- http://www.sketchengine.co.uk (generation of frequency lists from English
  corpora)
- http://www.comp.lancs.ac.uk/ucrel/bncfreq/lists/5_2_all_rank_verb.txt
  (frequency list of verbs by lemma, from Leech, Rayson & Wilson 2001)
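For readers new to lemma-based frequency lists, here is a minimal Python sketch of how such a list can be built from a tagged corpus. It is not how the Leech, Rayson & Wilson lists were produced; it assumes the corpus is already available as hypothetical (word, lemma, tag) triples and that verb tags start with "V", as in CLAWS-style tagsets. The function name and sample tokens are illustrative only.

```python
from collections import Counter
from typing import Iterable, List, Tuple

# Hypothetical input format: one (word, lemma, tag) triple per token.
Token = Tuple[str, str, str]

def verb_lemma_frequencies(tokens: Iterable[Token]) -> List[Tuple[str, int]]:
    """Count verb lemmas and return them ranked by descending frequency.

    Assumes verb tags start with "V" (as in CLAWS-style tagsets);
    the test would need changing for other tagsets.
    """
    counts = Counter(lemma.lower() for _, lemma, tag in tokens if tag.startswith("V"))
    # most_common() sorts by count, highest first.
    return counts.most_common()

# Invented sample: "said", "saying" and "says" all map to the lemma "say".
sample = [
    ("She", "she", "PNP"),
    ("said", "say", "VVD"),
    ("they", "they", "PNP"),
    ("were", "be", "VBD"),
    ("saying", "say", "VVG"),
    ("it", "it", "PNP"),
    ("is", "be", "VBZ"),
    ("says", "say", "VVZ"),
]
print(verb_lemma_frequencies(sample))  # [('say', 3), ('be', 2)]
```

Counting lemmas rather than word forms is what lets "said", "saying" and "says" contribute to a single entry, which is the point of a list "by lemma".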

Best,
Jesús.






------------------------------

Message: 2
Date: Wed, 28 Nov 2007 09:54:58 +0100 (CET)
From: Mario Crespo Miguel <mario.crespo_AT_uca.es>
Subject: [Corpora-List] Spanish corpus
To: CORPORA_AT_UIB.NO

Dear all,

I wonder whether anyone on the list knows of a syntactically and/or
morphologically tagged corpus of Spanish that could be purchased or
obtained for research purposes. Thank you very much in advance.

Best,

Mario Crespo Miguel




------------------------------

Message: 3
Date: Wed, 28 Nov 2007 10:03:03 +0100
From: Valerie Mapelli <mapelli_AT_elda.org>
Subject: [Corpora-List] Spanish corpus
To: Mario Crespo Miguel <mario.crespo_AT_uca.es>,CORPORA_AT_UIB.NO

Dear Mario,

You may be interested in the MULTEXT JOC Corpus, which includes
morpho-syntactic annotation and is available from the ELRA catalogue:
http://catalog.elra.info/product_info.php?products_id=534

The CRATER Corpus could also suit your needs: 
http://catalog.elra.info/product_info.php?products_id=84

Please do not hesitate to contact me for any further information.

Best regards,

Valerie Mapelli






------------------------------

Message: 4
Date: Wed, 28 Nov 2007 14:48:09 +0100
From: Ralf Steinberger <ralf.steinberger_AT_jrc.it>
Subject: [Corpora-List] DGT-TM - Translation Memory for 231 language pairs
	available for distribution
To: CORPORA_AT_uib.no

Apologies for cross-postings.

   DGT-TM Translation Memory

   Freely available

   22 languages

   231 language pairs

   Format: TMX version 1

   http://langtech.jrc.it/DGT-TM.html

The European Commission's Directorate General for Translation (DGT) and the
Joint Research Centre (JRC) have made available a multilingual Translation
Memory (sentences and their translations, in standard TMX format) for the 22
official European Union languages: Bulgarian, Czech, Danish, Dutch, English,
Estonian, German, Greek, Finnish, French, Hungarian, Italian, Latvian,
Lithuanian, Maltese, Polish, Portuguese, Romanian, Slovak, Slovene, Spanish
and Swedish.
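Since the data is distributed as TMX, a minimal Python sketch of reading it with the standard library's xml.etree.ElementTree may help. It assumes the usual TMX layout (<tu> translation units, each holding one <tuv> per language with a <seg> inside) and, to be safe, accepts both the xml:lang attribute used by TMX 1.4 and the plain lang attribute of older TMX versions; check the downloaded files for the exact version and language codes. The read_tmx function and the sample document are hypothetical, not part of the DGT-TM distribution.

```python
import io
import xml.etree.ElementTree as ET
from typing import Dict, Iterator

# ElementTree reports the xml:lang attribute under the XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

def read_tmx(source) -> Iterator[Dict[str, str]]:
    """Yield one {language code: segment text} dict per translation unit <tu>.

    `source` may be a file path or a binary file object. Language codes are
    upper-cased so that "en-gb" and "EN-GB" compare equal.
    """
    root = ET.parse(source).getroot()
    for tu in root.iter("tu"):
        unit: Dict[str, str] = {}
        for tuv in tu.findall("tuv"):
            # TMX 1.4 uses xml:lang; older TMX versions used a plain "lang".
            lang = tuv.get(XML_LANG) or tuv.get("lang")
            seg = tuv.find("seg")
            if lang and seg is not None:
                # itertext() also picks up text inside inline tags such as <ph>.
                unit[lang.upper()] = "".join(seg.itertext()).strip()
        if unit:
            yield unit

# Invented two-language sample in TMX 1.4 layout (not taken from DGT-TM).
SAMPLE_TMX = """<?xml version="1.0" encoding="UTF-8"?>
<tmx version="1.4">
  <header srclang="EN-GB" datatype="plaintext" segtype="sentence"
          adminlang="EN-GB" creationtool="sketch" creationtoolversion="0"
          o-tmf="none"/>
  <body>
    <tu>
      <tuv xml:lang="EN-GB"><seg>This Regulation shall enter into force.</seg></tuv>
      <tuv xml:lang="DE-DE"><seg>Diese Verordnung tritt in Kraft.</seg></tuv>
    </tu>
  </body>
</tmx>"""

units = list(read_tmx(io.BytesIO(SAMPLE_TMX.encode("utf-8"))))
print(units)
```

ET.parse loads the whole file into memory; for files the size of a full language release, an incremental approach with ET.iterparse would probably be the better choice.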

This release follows the public release, in May 2006, of the JRC-Acquis
multilingual parallel corpus (http://langtech.jrc.it/JRC-Acquis.html),
which is sentence-aligned for 231 language pairs and totals over 1 billion
words.
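The figure of 231 language pairs follows from the 22 languages if each unordered pair of distinct languages is counted once (English-German and German-English counting as a single pair). That counting convention is an assumption here, but it reproduces the number exactly:

```latex
\binom{22}{2} = \frac{22 \times 21}{2} = \frac{462}{2} = 231
```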

The data releases of DGT and JRC are in line with the general effort of the
European Commission to support multilingualism, language diversity and the
re-use of Commission information. 

The Translation Memory contains most, but not all, of the Acquis
Communautaire, which is the entire body of European legislation, including
all the treaties, regulations and directives adopted by the European Union
(EU) and the rulings of the European Court of Justice. Since each new
country joining the EU is required to accept the whole Acquis Communautaire,
this body of legislation is translated into 22 official EU languages. For
the 23rd official EU language, Irish, the Acquis is not translated on a
regular basis.

A translation memory is a collection of small text segments and their
translations. These segments can be sentences or parts of sentences.
Translation memories support translators by ensuring that text that has
already been translated does not need to be translated again.
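To make the idea concrete, here is a minimal Python sketch of a translation-memory lookup, assuming the memory is a plain dictionary from source segments to target segments. Exact matches are returned directly. For near matches it substitutes difflib's SequenceMatcher ratio as a stand-in for the fuzzy-match scores that translation tools compute in their own ways. The lookup function, the 0.75 threshold and the sample segments are hypothetical.

```python
from difflib import SequenceMatcher
from typing import Dict, Optional, Tuple

def lookup(memory: Dict[str, str], segment: str,
           threshold: float = 0.75) -> Optional[Tuple[str, float]]:
    """Return (stored translation, similarity) for the best match, or None.

    An exact match scores 1.0. Otherwise the most similar stored source
    segment is used if its difflib similarity ratio reaches `threshold`.
    """
    if segment in memory:
        return memory[segment], 1.0
    best_source, best_score = None, 0.0
    for source in memory:
        score = SequenceMatcher(None, segment, source).ratio()
        if score > best_score:
            best_source, best_score = source, score
    if best_source is not None and best_score >= threshold:
        return memory[best_source], best_score
    return None

# Hypothetical English -> French memory holding a single segment pair.
tm = {"This Regulation shall enter into force.": "Le présent règlement entre en vigueur."}

print(lookup(tm, "This Regulation shall enter into force."))  # exact match
print(lookup(tm, "This Directive shall enter into force."))   # fuzzy match
print(lookup(tm, "The weather is fine today."))                # no match
```

A fuzzy match only offers the stored translation as a starting point; the translator still has to adapt it (here, "Directive" differs from "Regulation").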

Both translation memories and parallel texts are important linguistic
resources that can be used for a variety of purposes, including:

- training automatic systems for Statistical Machine Translation (SMT);
- producing monolingual or multilingual lexical and semantic resources such
  as dictionaries and ontologies;
- training and testing multilingual information extraction software;
- checking translation consistency automatically;
- testing and benchmarking alignment software (for sentences, words, etc.).
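As one illustration of the consistency-checking use listed above, the following minimal Python sketch flags source segments that appear with more than one distinct translation. It assumes the segment pairs for one language pair have already been extracted into (source, target) tuples. The function name and sample data are hypothetical, and a real check would also need to allow for legitimate context-dependent variation.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

def inconsistent_translations(pairs: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Return source segments that were translated in more than one way.

    Runs of whitespace are collapsed first so that spacing differences
    alone are not reported as inconsistencies.
    """
    seen: Dict[str, Set[str]] = defaultdict(set)
    for source, target in pairs:
        seen[" ".join(source.split())].add(" ".join(target.split()))
    # Keep only sources with two or more distinct translations.
    return {src: targets for src, targets in seen.items() if len(targets) > 1}

# Invented English -> German pairs; "der Ausschuss" is a deliberate mistranslation.
pairs = [
    ("Member States", "Mitgliedstaaten"),
    ("Member States", "Mitgliedstaaten"),
    ("the Commission", "die Kommission"),
    ("the Commission", "der Ausschuss"),
]
print(inconsistent_translations(pairs))
```

Grouping by exact (normalised) source text keeps the check simple; catching inconsistencies between merely similar segments would need a fuzzy comparison instead.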

For usage conditions, details on the differences between DGT-TM and
JRC-Acquis, size information, download instructions, etc., go to
http://langtech.jrc.it/DGT-TM.html.

Achim Blatt

Directorate General for Translation (DGT)

Unit DGT.R.3 Informatics (http://ec.europa.eu/dgs/translation/)

Ralf Steinberger 
European Commission - Joint Research Centre (JRC)
IPSC - SeS - Language Technology (http://langtech.jrc.it)

The JRC's Language Technology group specialises in developing highly
multilingual text analysis tools and cross-lingual applications. Many
applications are accessible online, e.g.:

- NewsExplorer (http://press.jrc.it/NewsExplorer/): multilingual news
  aggregation and analysis (19 languages); lets users navigate the news over
  time and across languages; trend analysis; collects information about
  people from the news; social network detection.

- NewsBrief (http://press.jrc.it/): breaking news detection and display of
  the very latest thematic news from around the world; email alerting (22+
  languages).

- MedISys Medical Information System (http://medusa.jrc.it/): latest
  health-related news from around the world, organised by theme and disease
  (22+ languages).

----------------------------------------------------------------------
Send Corpora mailing list submissions to
	corpora at uib.no

To subscribe or unsubscribe via the World Wide Web, visit
	http://mailman.uib.no/listinfo/corpora
or, via email, send a message with subject or body 'help' to
	corpora-request at uib.no

You can reach the person managing the list at
	corpora-owner at uib.no

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Corpora digest..."


_______________________________________________
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora


End of Corpora Digest, Vol 5, Issue 27
**************************************

