[Corpora-List] spanish lemmatiser
René Venegas
rene.venegas at ucv.cl
Fri May 9 18:35:53 UTC 2008
Dear Lev,
You can use El Grial to lemmatize your corpus. If you like, we can
upload the corpus and you can then run the relevant queries.
See www.elgrial.cl and the attached paper (in Spanish) for more information.
Over the next few weeks the system will be under revision and some of the
machines are being replaced, so you may experience some delays or other problems.
Please don't hesitate to contact me if necessary.
Regards,
Dr. René Venegas
Professor
Programa de Postgrado en Lingüística
www.postgradolinguistica.ucv.cl/rene
www.linguistica.cl
www.elgrial.cl
Instituto de Literatura y Ciencias del Lenguaje
www.ilcl.ucv.cl
Pontificia Universidad Católica de Valparaíso
www.ucv.cl
Assistant, Revista Signos. Estudios de Lingüística
www.scielo.cl/signos.htm
www.revistasignos.cl
-----Original Message-----
From: corpora-bounces at uib.no [mailto:corpora-bounces at uib.no] On behalf of
corpora-request at uib.no
Sent: Friday, May 9, 2008 9:00
To: corpora at uib.no
Subject: Corpora Digest, Vol 11, Issue 9
Today's Topics:
1. Spanish lemmatiser (Lev Kundin)
2. Spanish lemmatiser (Lluis Padro)
3. Spanish lemmatiser (JLDLME)
4. phonetic corpora as another means to measure language
distances (Tambovtsev: Yuri, Alina and Yuliana)
5. phonetic corpora as another means to measure language
distances (J Washtell)
6. extended deadline STEP2008 (rodolfo delmonte)
----------------------------------------------------------------------
Message: 1
Date: Thu, 08 May 2008 14:32:52 +0100
From: Lev Kundin <lev.kundin_AT_kellogg.ox.ac.uk>
Subject: [Corpora-List] Spanish lemmatiser
To: corpora_AT_uib.no
Dear corpora list members,
I need a Spanish lemmatiser for my project, and so far I haven't been able
to find anything more decent than the Spanish Porter stemmer implemented in
Perl (if anyone is interested =>
http://search.cpan.org/~jfraire/Lingua-Stem-Es-0.03/lib/Lingua/Stem/Es.pm).
Does anyone know of any publicly available Spanish lemmatisers?
Cheers,
Lev.
/Oxford Uni MSc CompSci Student/.
------------------------------
Message: 2
Date: Thu, 08 May 2008 15:52:20 +0200
From: Lluis Padro <padro_AT_lsi.upc.edu>
Subject: [Corpora-List] Spanish lemmatiser
To: Lev Kundin <lev.kundin_AT_kellogg.ox.ac.uk>
Cc: corpora_AT_uib.no
Lev Kundin wrote:
> Dear corpora list members,
>
> I need a Spanish lemmatiser for my project and I haven't been able to
> find anything more decent than the Spanish Porter stemmer implemented in
> perl so far (if anyone is interested =>
> http://search.cpan.org/~jfraire/Lingua-Stem-Es-0.03/lib/Lingua/Stem/Es.pm).
>
> Does anyone have a clue about some publicly available Spanish lemmatisers
You can use the FreeLing open-source suite to lemmatize, and for many other
tasks, in Spanish and a variety of other languages:
http://www.lsi.upc.edu/~nlp/freeling
Best,
--
------------------------------------------------------------------------
*Lluís Padró*
Office ?-S112
Campus Nord UPC
C/ Jordi Girona 1-3
08034 Barcelona, Spain
Tel: +34 934 134 015
Fax: +34 934 137 833
padro_AT_lsi.upc.edu
www.lsi.upc.edu/~padro
------------------------------------------------------------------------
UNIVERSITAT POLITÈCNICA DE CATALUNYA
Dept. Llenguatges i Sistemes Informàtics
TALP Research Center
------------------------------------------------------------------------
------------------------------
Message: 3
Date: Thu, 8 May 2008 09:27:34 -0700 (PDT)
From: JLDLME <jldlme_AT_yahoo.com>
Subject: [Corpora-List] Spanish lemmatiser
To: Lluis Padro <padro_AT_lsi.upc.edu>, Lev Kundin
<lev.kundin_AT_kellogg.ox.ac.uk>
Cc: corpora_AT_uib.no
Dear Lluis Padro,
Building one!
Best
J.L.
------------------------------
Message: 4
Date: Fri, 9 May 2008 00:25:12 +0700
From: "Tambovtsev: Yuri, Alina and Yuliana" <yutamb_AT_mail.ru>
Subject: [Corpora-List] phonetic corpora as another means to measure
language distances
To: <CORPORA_AT_uib.no>
Dear Corpora colleagues,
Some linguists ask why we should collect phonetic corpora. Surely one can
find many ways to use them. Our task is to use phonetic corpora as another
means of comparing languages and measuring language distances, and we would
welcome linguists who would like to join our group. This may be one more
method that adds information for classifying the world's languages into
different taxa: subgroups, groups, families, units, unions, etc.
Looking forward to hearing from you at yutamb_AT_mail.ru.
Yours sincerely,
Yuri Tambovtsev
------------------------------
Message: 5
Date: Thu, 08 May 2008 20:13:52 +0100
From: J Washtell <lec3jrw_AT_leeds.ac.uk>
Subject: [Corpora-List] phonetic corpora as another means to measure
language distances
To: corpora_AT_uib.no
Dear Yuri,
I may be showing my ignorance here, but isn't it likely that phonetic
distance varies as much with the origin of the speaker as it does
between languages themselves? English spoken in various regions of Great
Britain (which barely differ when written) is a case in point.
On occasion, listening to people from Newcastle and not understanding
the words they were speaking (a common enough occurrence for me, coming
from Cambridge), I have thought that they sounded rather like native
speakers of Dutch, or perhaps of one of the Scandinavian languages. As I
don't understand these languages, I assume that a large part of this
association had to do with the phonetic content. Similarly, on the
rarer occasions when I have heard people from my own part of the
country speaking and for a moment not understood what they were saying,
I have registered it as something more like German.
I don't think it is even an issue of regional dialects: it seems to me
that the phonetic signature of any language spoken by a non-native
speaker, or spoken as a second language, is generally skewed heavily
towards the phonetic content of their native region and/or first
language.
Can I ask how these issues factor in your research? For example, are
you taking spoken corpora for each language from a very broad
cross-section of speaker-regions, or are you also measuring phonetic
distance between speaker-regions for a broad cross-section of
languages and using this as some kind of contrast?
Justin Washtell
University of Leeds
------------------------------
Message: 6
Date: Fri, 9 May 2008 10:09:36 +0200
From: rodolfo delmonte <delmont_AT_unive.it>
Subject: [Corpora-List] extended deadline STEP2008
To: <corpora_AT_uib.no>
Apologies for multiple postings
----------------------------------------------------------------
3rd CALL FOR PAPERS: STEP 2008
*** EXTENDED PAPER SUBMISSION DEADLINE: MAY 18, 2008 ***
Symposium on Semantics in Text Processing
http://project.cgm.unive.it/html/STEP2008/index.htm
September 22-24, 2008
Auditorium Santa Margherita
Venice (Italy)
Endorsed by SIGSEM, the ACL special interest group
on computational semantics
MOTIVATION
============
Thanks to both statistical approaches and finite state methods,
natural language processing (NLP), particularly in the area of robust,
open-domain text processing, has made considerable progress in the
last couple of decades. It is probably fair to say that NLP tools have
reached satisfactory performance at the level of syntactic processing,
be the output structures chunks, phrase structures, or dependency
graphs. Therefore, the time seems ripe to extend the state-of-the-art
and consider deep semantic processing as a serious task in
wide-coverage NLP. This step normally requires syntactic parsing
together with named entity recognition, anaphora resolution, thematic
role labelling and word sense disambiguation, as well as other lower
levels of processing for which reasonably good methods have already
been developed. Accurate automatic semantic interpretation of text is
expected to benefit newly emerging areas targeting semantic and
pragmatic issues, such as affectivity and sentiment analysis of texts,
textual entailment, and consistency checking.
WORKSHOP SCOPE
================
The goal of the STEP workshop is to provide a forum for anyone active
in semantic processing of text to discuss innovative technologies,
representation issues, inference techniques, prototype
implementations, and real applications. The preferred processing
targets are large quantities of text, in either specialised domains
or open domains such as newswire text, blogs, and Wikipedia-like text.
Implemented rather than theoretical work is emphasised in STEP.
In particular, relevant topics are:
- wide-coverage semantic/logical analysis of text
- computation and use of discourse relations
- use of lexical-conceptual and semantically related resources
- thematic role labelling in semantic representations
- word sense disambiguation in semantic representations
- implementations of specific semantic phenomena
- anaphora or ellipsis resolution in semantic representations
- implementations of sentiment analysis
- automatic detection of subjective and non-literal language
- acquisition of lexical knowledge and paraphrase from raw corpora
- background knowledge acquisition, representation, and selection
- semantic lexicons and ontologies for text interpretation
- learning semantic representations from raw text
- automated reasoning in the service of semantic analysis of text
- creation of gold standard meaning representations
- evaluation of semantic representations
- textual entailment and consistency checking
- systems that extract, represent or manipulate text meaning
- applications of semantic analysis in text processing
Applications include, but are not limited to, machine translation, text
understanding, question answering, summarisation, information
extraction, and the semantic web.
SHARED TASK: COMPARING SEMANTIC REPRESENTATIONS
=================================================
STEP 2008 will also feature a "shared task" to compare semantic
representations as output by state-of-the-art NLP systems.
Participating systems will be given a number of (small) texts before
the workshop. During the workshop, the output of these systems will be
judged on a number of aspects by a panel of experts in the field.
The aim of the shared task is to discuss the feasibility of a gold
standard for deep semantic representations. The aim of the panel is to
identify a set of problematic and relevant issues for semantic
evaluation. The panel will reward the system with the most complete
and accurate semantic representation with a special prize. Important
dates for the STEP Shared Task are:
Intention of participation: June 1, 2008
Shared Task paper submission: June 6, 2008
Notification of acceptance: June 23, 2008
Release of test data: June 25, 2008
System results due: July 4, 2008
Final version paper due: July 25, 2008
Workshop: Sept 22-24, 2008
To participate in the shared task, submit a paper containing (1) a
system description, (2) a description of the semantic formalism used
by the system, and (3) an authentic small text and the way it is
analysed by the system. This text should be in English (but see
below) and not exceed five sentences and 120 tokens. The test data for
the shared task will be composed of all the texts submitted by the
participants. Please email Johan Bos (bos_AT_di.uniroma1.it) by June 1 if
you intend to participate in the shared task.
Shared task submissions should follow the workshop format for regular
papers and submission guidelines (see below), and will be published in
the STEP 2008 proceedings. Please choose the category "shared task"
when submitting a paper using the EasyChair system. The final paper
must include a discussion of the system's performance on the shared
task data. Please contact Johan Bos (bos_AT_di.uniroma1.it) for further
questions on the shared task, or if you would like to participate with
a language other than English.
SUBMISSIONS
=============
Authors are invited to submit original research papers. Papers should
indicate the state of completion of the reported results. Overlap with
previously published work should be clearly indicated. Submissions
will be judged on correctness, novelty, technical strength, clarity of
presentation, significance, and relevance to the workshop.
Submissions should be in Adobe PDF format, should not exceed eight
A4-sized pages, and should be written in English and typeset in an
11-point font. Detailed guidelines and a LaTeX style file package are
available at the STEP 2008 web page. Paper submission will be
electronic, using the EasyChair
system: http://www.easychair.org/conferences/?conf=step2008
Each submission will be reviewed by at least two members of the
programme committee. Accepted papers will be published in the workshop
proceedings. Publication of selected, revised papers in a special issue
of a journal is under consideration.
INVITED SPEAKER
=================
Harry Bunt (University of Tilburg)
IMPORTANT DATES
=================
Regular Paper submission deadline: May 18, 2008
Shared Task paper submission: June 6, 2008
Notification of acceptance: June 23, 2008
Camera-ready version due: July 25, 2008
Workshop: Sept 22-24, 2008
ORGANISING COMMITTEE
======================
Rodolfo Delmonte (Università Ca' Foscari, Venice)
Johan Bos (Università La Sapienza, Rome)
PROGRAMME COMMITTEE
=====================
Roberto Basili (University Tor Vergata, Rome, Italy)
Amedeo Cappelli (CELCT, Trento, Italy)
Ann Copestake (University of Cambridge, UK)
Nicola Guarino (ISTC-CNR, Trento, Italy)
Sanda Harabagiu (HLT, University of Texas, USA)
Alexander Koller (University of Edinburgh, UK)
Leonardo Lesmo (DI, University of Turin, Italy)
Katja Markert (University of Leeds, UK)
Dan Moldovan (HLT, University of Texas, USA)
Srini Narayanan (ICSI, Berkeley, USA)
Sergei Nirenburg (University of Maryland, USA)
Malvina Nissim (University of Bologna, Italy)
Vincenzo Pallotta (Universität Freiburg, Switzerland)
Emanuele Pianta (ITC, Trento, Italy)
Massimo Poesio (University of Trento, Italy)
Stephen Pulman (Oxford University, UK)
Michael Schiehlen (IMS Stuttgart, Germany)
Bonnie Webber (University of Edinburgh, UK)
----------------------------------------------------------------------
Send Corpora mailing list submissions to
corpora at uib.no
To subscribe or unsubscribe via the World Wide Web, visit
http://mailman.uib.no/listinfo/corpora
or, via email, send a message with subject or body 'help' to
corpora-request at uib.no
You can reach the person managing the list at
corpora-owner at uib.no
When replying, please edit your Subject line so it is more specific
than "Re: Contents of Corpora digest..."
_______________________________________________
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora
End of Corpora Digest, Vol 11, Issue 9
**************************************