[Corpora-List] spanish lemmatiser

René Venegas rene.venegas at ucv.cl
Fri May 9 18:35:53 UTC 2008


Dear Lev,

You can use El Grial to lemmatize your corpus. If you would like to, we can
upload the corpus and you can then run the relevant queries.

See www.elgrial.cl and the attached paper (in Spanish) for more information.
Over the next few weeks it will be under revision and some of the machines are
being replaced, so you may experience some delays or other problems.
Please don't hesitate to contact me if necessary.

Regards,
Dr. René Venegas
Profesor
Programa de Postgrado en Lingüística
www.postgradolinguistica.ucv.cl/rene
www.linguistica.cl
www.elgrial.cl 
 
Instituto de Literatura y Ciencias del Lenguaje
www.ilcl.ucv.cl
 
Pontificia Universidad Católica de Valparaíso
www.ucv.cl
 
Asistente Revista Signos. Estudios de Lingüística
www.scielo.cl/signos.htm
www.revistasignos.cl
 
 
 
-----Original Message-----
From: corpora-bounces at uib.no [mailto:corpora-bounces at uib.no] On behalf of
corpora-request at uib.no
Sent: Friday, 09 May 2008 9:00
To: corpora at uib.no
Subject: Corpora Digest, Vol 11, Issue 9

Today's Topics:

   1.  Spanish lemmatiser (Lev Kundin)
   2.  Spanish lemmatiser (Lluis Padro)
   3.  Spanish lemmatiser (JLDLME)
   4.  phonetic corpora as another means to measure language
      distances (Tambovtsev: Yuri, Alina and Yuliana)
   5.  phonetic corpora as another means to measure language
      distances (J Washtell)
   6.  extended deadline STEP2008 (rodolfo delmonte)


----------------------------------------------------------------------

Message: 1
Date: Thu, 08 May 2008 14:32:52 +0100
From: Lev Kundin <lev.kundin_AT_kellogg.ox.ac.uk>
Subject: [Corpora-List] Spanish lemmatiser
To: corpora_AT_uib.no

Dear corpora list members,

I need a Spanish lemmatiser for my project and I haven't been able to 
find anything more decent than the Spanish Porter stemmer implemented in 
perl so far (if anyone is interested => 
http://search.cpan.org/~jfraire/Lingua-Stem-Es-0.03/lib/Lingua/Stem/Es.pm).

Does anyone have a clue about some publicly available Spanish lemmatisers?

Cheers,
Lev.
/Oxford Uni MSc CompSci Student/.



------------------------------

Message: 2
Date: Thu, 08 May 2008 15:52:20 +0200
From: Lluis Padro <padro_AT_lsi.upc.edu>
Subject: [Corpora-List] Spanish lemmatiser
To: Lev Kundin <lev.kundin_AT_kellogg.ox.ac.uk>
Cc: corpora_AT_uib.no

Lev Kundin wrote:
> Dear corpora list members,
>
> I need a Spanish lemmatiser for my project and I haven't been able to 
> find anything more decent than the Spanish Porter stemmer implemented in 
> perl so far (if anyone is interested =>
> http://search.cpan.org/~jfraire/Lingua-Stem-Es-0.03/lib/Lingua/Stem/Es.pm).
>
> Does anyone have a clue about some publicly available Spanish lemmatisers

   You can use the FreeLing open-source suite for lemmatization and many
other tasks, in Spanish and a variety of other languages.
 
   http://www.lsi.upc.edu/~nlp/freeling

   Best,

-- 
------------------------------------------------------------------------

*Lluís Padró*
Despatx ?-S112
Campus Nord UPC
C/ Jordi Girona 1-3
08034 Barcelona, Spain
Tel: +34 934 134 015
Fax: +34 934 137 833
padro_AT_lsi.upc.edu <mailto:padro_AT_lsi.upc.es>
www.lsi.upc.edu/~padro <http://www.lsi.upc.es/%7Epadro>
------------------------------------------------------------------------

UNIVERSITAT POLITÈCNICA DE CATALUNYA
Dept. Llenguatges i Sistemes Informàtics <http://www.lsi.upc.es>
TALP <http://www.talp.upc.es> Research Center
------------------------------------------------------------------------








------------------------------

Message: 3
Date: Thu, 8 May 2008 09:27:34 -0700 (PDT)
From: JLDLME <jldlme_AT_yahoo.com>
Subject: [Corpora-List] Spanish lemmatiser
To: Lluis Padro <padro_AT_lsi.upc.edu>,	Lev Kundin
	<lev.kundin_AT_kellogg.ox.ac.uk>
Cc: corpora_AT_uib.no

Dear Lluis Padro,

Building one! 

Best

J.L.
--- Lluis Padro <padro_AT_lsi.upc.edu> wrote:

> Lev Kundin wrote:
> > Dear corpora list members,
> >
> > I need a Spanish lemmatiser for my project and I haven't been able to
> > find anything more decent than the Spanish Porter stemmer implemented in
> > perl so far (if anyone is interested =>
> > http://search.cpan.org/~jfraire/Lingua-Stem-Es-0.03/lib/Lingua/Stem/Es.pm).
> >
> > Does anyone have a clue about some publicly available Spanish lemmatisers
>
>    You can use FreeLing  Open Source suite to lemmatize and many other
> things, in Spanish and a variety of other languages
>
>    http://www.lsi.upc.edu/~nlp/freeling
>
>    best



------------------------------

Message: 4
Date: Fri, 9 May 2008 00:25:12 +0700
From: "Tambovtsev: Yuri, Alina and Yuliana" <yutamb_AT_mail.ru>
Subject: [Corpora-List] phonetic corpora as another means to measure
	language distances
To: <CORPORA_AT_uib.no>


Dear Corpora colleagues, some linguists ask why we should collect phonetic
corpora. Surely, one can find many ways to use phonetic corpora. Our task is
to use phonetic corpora as another means to compare languages and to measure
language distances. We look forward to hearing from linguists who would like
to join our group. It may be one more method that can add information for
classifying world languages into different taxa: subgroups, groups,
families, units, unions, etc. Looking forward to hearing from you at
yutamb_AT_mail.ru. Yours sincerely, Yuri Tambovtsev






------------------------------

Message: 5
Date: Thu, 08 May 2008 20:13:52 +0100
From: J Washtell <lec3jrw_AT_leeds.ac.uk>
Subject: [Corpora-List] phonetic corpora as another means to measure
	language distances
To: corpora_AT_uib.no

Dear Yuri,

I may be showing my ignorance here, but isn't it likely that phonetic
distance varies as much with the origin of the speaker as between
languages themselves? English as spoken in various regions of Great
Britain (which barely differs when written) is a case in point.

I have, on occasion, listening to some people from Newcastle and not  
understanding the words that they were speaking (a common enough  
occurrence for me, coming from Cambridge), thought that they sounded  
rather like native speakers of Dutch, or perhaps one of the  
Scandinavian languages. As I don't understand these languages, I would  
assume that a large part of this association was to do with the  
phonetic content. Similarly, on rarer occasions when I have heard  
people from my own part of the country speaking and for a moment not  
understood what they were saying, I have registered it as something  
more like German.

I don't think it is even an issue of regional dialects: it seems to me  
that the phonetic signature of any language spoken by a non-native  
speaker, or spoken as a second language, is generally skewed heavily  
towards the phonetic content of their native region and/or first  
language.

Can I ask how these issues factor into your research? For example, are
you taking spoken corpora for each language from a very broad  
cross-section of speaker-regions, or are you also measuring phonetic  
distance between speaker-regions for a broad cross-section of  
languages and using this as some kind of contrast?

Justin Washtell
University of Leeds


Quoting "Tambovtsev: Yuri, Alina and Yuliana" <yutamb_AT_mail.ru>:

> Dear Corpora colleagues, some linguists ask, why we should collect   
> phonetic corpora. Surely, one can find many ways to use phonetic   
> corpora. Our task is to use phonetic corpora as another means to   
> compare languages and to measure language distances. We are looking   
> forward from linguists who'd like to join our group. It may be one   
> more method which can add some additional information to classify   
> world languages into different taxa: subgroups, groups, families,   
> units, unions, etc. Looking forward to hearing from you to   
> yutamb_AT_mail.ru  Yours sincerely Yuri Tambovtsev







------------------------------

Message: 6
Date: Fri, 9 May 2008 10:09:36 +0200
From: rodolfo delmonte <delmont_AT_unive.it>
Subject: [Corpora-List] extended deadline STEP2008
To: <corpora_AT_uib.no>

Apologies for multiple postings

  ----------------------------------------------------------------


                    3rd CALL FOR PAPERS: STEP 2008

                                P  2
                              E      0
                            T          0
                          S              8

        *** EXTENDED PAPER SUBMISSION DEADLINE: MAY 18, 2008 ***

               Symposium on Semantics in Text Processing

          http://project.cgm.unive.it/html/STEP2008/index.htm

                         September 22-24, 2008

                     Auditorium Santa Margherita
                           Venice (Italy)

          Endorsed by SIGSEM, the ACL special interest group
                    on computational semantics


  MOTIVATION
============
Thanks to both statistical approaches and finite state methods,
natural language processing (NLP), particularly in the area of robust,
open-domain text processing, has made considerable progress in the
last couple of decades. It is probably fair to say that NLP tools have
reached satisfactory performance at the level of syntactic processing,
be the output structures chunks, phrase structures, or dependency
graphs.  Therefore, the time seems ripe to extend the state-of-the-art
and consider deep semantic processing as a serious task in
wide-coverage NLP. This is a step that normally requires syntactic
parsing, as well as named entity recognition, anaphora resolution,
thematic role labelling and word sense disambiguation, as well as
other lower levels of processing for which reasonably good methods
have already been developed. Accurate automatic semantic
interpretation of text is expected to benefit newly emerging areas
targeting semantic and pragmatic issues, such as affectivity and
sentiment analysis of texts, textual entailment, and consistency
checking.


  WORKSHOP SCOPE
================
The goal of the STEP workshop is to provide a forum for anyone active
in semantic processing of text to discuss innovative technologies,
representation issues, inference techniques, prototype
implementations, and real applications. The preferred processing
targets are large quantities of texts -- either specialised domains,
or open domains such as newswire text, blogs, and wikipedia-like text.
Implemented rather than theoretical work is emphasised in STEP.
In particular, relevant topics are:

- wide-coverage semantic/logical analysis of text
- computation and use of discourse relations
- use of lexical-conceptual and semantically related resources
- thematic role labelling in semantic representations
- word sense disambiguation in semantic representations
- implementations of specific semantic phenomena
- anaphora or ellipsis resolution in semantic representations
- implementations of sentiment analysis
- automatic detection of subjective and non-literal language
- acquisition of lexical knowledge and paraphrase from raw corpora
- background knowledge acquisition, representation, and selection
- semantic lexicons and ontologies for text interpretation
- learning semantic representations from raw text
- automated reasoning in the service of semantic analysis of text
- creation of gold standard meaning representations
- evaluation of semantic representations
- textual entailment and consistency checking
- systems that extract, represent or manipulate text meaning
- applications of semantic analysis in text processing

Applications include, but are not limited to, machine translation, text
understanding, question answering, summarisation, information
extraction, and the semantic web.


  SHARED TASK: COMPARING SEMANTIC REPRESENTATIONS
=================================================
STEP 2008 will also feature a "shared task" to compare semantic
representations as output by state-of-the-art NLP systems.
Participating systems will be given a number of (small) texts, before
the workshop.  The output of these systems will be judged on a number
of aspects by a panel of experts in the field, during the workshop.
The aim of the shared task is to discuss the feasibility of a gold
standard for deep semantic representations. The aim of the panel is to
identify a set of problematic and relevant issues for semantic
evaluation.  The panel will reward the system with the most complete
and accurate semantic representation with a special prize. Important
dates for the Step Shared Task are:

         Intention of participation:       June  1, 2008
         Shared Task paper submission:     June  6, 2008
         Notification of acceptance:       June 23, 2008
         Release of test data:             June 25, 2008
         System's results due:             July  4, 2008
         Final version paper due:          July 25, 2008
         Workshop:                         Sept 22-24, 2008

To participate in the shared task, submit a paper containing (1) a
system description, (2) a description of the semantic formalism used
by the system, and (3) an authentic small text and the way it is
analysed by the system.  This text should be in English (but see
below) and not exceed five sentences and 120 tokens. The test data for
the shared task will be composed of all the texts submitted by the
participants. Please email Johan Bos (bos_AT_di.uniroma1.it) by June 1 if
you intend to participate in the shared task.

Shared task submissions should follow the workshop format for regular
papers and submission guidelines (see below), and will be published in
the STEP 2008 proceedings. Please choose the category "shared task"
when submitting a paper using the EasyChair system. The final paper
must include a discussion of the system's performance on the shared
task data.  Please contact Johan Bos (bos_AT_di.uniroma1.it) for further
questions on the shared task, or if you would like to participate with
a language other than English.


  SUBMISSIONS
=============
Authors are invited to submit original research papers. Papers should
indicate the state of completion of the reported results. Overlap with
previously published work should be clearly indicated.  Submissions
will be judged on correctness, novelty, technical strength, clarity of
presentation, significance, and relevance to the workshop.

Submissions should be in Adobe PDF format, not exceed eight A4-sized
pages, be written in English and be typeset in an 11-point font.  Detailed
guidelines and a LaTeX style file package are available at the STEP
2008 web page. Paper submission will be electronic using the EasyChair
system: http://www.easychair.org/conferences/?conf=step2008

Each submission will be reviewed by at least two members of the
programme committee. Accepted papers will be published in the workshop
proceedings. The publication of selected and revised papers is under
consideration for a special issue in a journal.


  INVITED SPEAKER
=================
         Harry Bunt (University of Tilburg)


  IMPORTANT DATES
=================
         Regular Paper submission deadline: May 18, 2008
         Shared Task paper submission:      June 6, 2008
         Notification of acceptance:        June 23, 2008
         Camera-ready version due:          July 25, 2008
         Workshop:                          Sept 22-24, 2008


  ORGANISING COMMITTEE
======================
         Rodolfo Delmonte (Università Ca' Foscari, Venice)
         Johan Bos (Università La Sapienza, Rome)


  PROGRAMME COMMITTEE
=====================
         Roberto Basili (University Tor Vergata, Rome, Italy)
         Amedeo Cappelli (CELCT, Trento, Italy)
         Ann Copestake (University of Cambridge, UK)
         Nicola Guarino (ISTC-CNR, Trento, Italy)
         Sanda Harabagiu (HLT, University of Texas, USA)
         Alexander Koller (University of Edinburgh, UK)
         Leonardo Lesmo (DI, University of Turin, Italy)
         Katja Markert (University of Leeds, UK)
         Dan Moldovan (HLT, University of Texas, USA)
         Srini Narayanan (ICSI, Berkeley, USA)
         Sergei Nirenburg (University of Maryland, USA)
         Malvina Nissim (University of Bologna, Italy)
         Vincenzo Pallotta (Universitaet Freiburg, Switzerland)
         Emanuele Pianta (ITC, Trento, Italy)
         Massimo Poesio (University of Trento, Italy)
         Stephen Pulman (Oxford University, UK)
         Michael Schiehlen (IMS Stuttgart, Germany)
         Bonnie Webber (University of Edinburgh, UK)




----------------------------------------------------------------------
Send Corpora mailing list submissions to
	corpora at uib.no

To subscribe or unsubscribe via the World Wide Web, visit
	http://mailman.uib.no/listinfo/corpora
or, via email, send a message with subject or body 'help' to
	corpora-request at uib.no

You can reach the person managing the list at
	corpora-owner at uib.no

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Corpora digest..."


_______________________________________________
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora


End of Corpora Digest, Vol 11, Issue 9
**************************************

