31.1243, Calls: Computational Linguistics/Spain

The LINGUIST List linguist at listserv.linguistlist.org
Thu Apr 2 16:00:07 UTC 2020


LINGUIST List: Vol-31-1243. Thu Apr 02 2020. ISSN: 1069 - 4875.

Subject: 31.1243, Calls: Computational Linguistics/Spain

Moderator: Malgorzata E. Cavar (linguist at linguistlist.org)
Student Moderator: Jeremy Coburn
Managing Editor: Becca Morris
Team: Helen Aristar-Dry, Everett Green, Sarah Robinson, Peace Han, Nils Hjortnaes, Yiwen Zhang, Julian Dietrich
Jobs: jobs at linguistlist.org | Conferences: callconf at linguistlist.org | Pubs: pubs at linguistlist.org

Homepage: http://linguistlist.org

Please support the LL editors and operation with a donation at:
           https://funddrive.linguistlist.org/donate/

Editor for this issue: Lauren Perkins <lauren at linguistlist.org>
================================================================


Date: Thu, 02 Apr 2020 11:58:57
From: Luis Espinosa-Anke [espinosa-ankel at cardiff.ac.uk]
Subject: CAPITEL-EVAL 2020

 
Full Title: CAPITEL-EVAL 2020 
Short Title: CAPITEL 

Date: 22-Sep-2020 - 25-Sep-2020
Location: Málaga, Spain 
Contact Person: Jordi Porta
Meeting Email: porta at rae.es
Web Site: https://sites.google.com/view/capitel2020 

Linguistic Field(s): Computational Linguistics 

Call Deadline: 17-May-2020 

Meeting Description:

Within the framework of the PlanTL, the Royal Spanish Academy (RAE) and the
Secretariat of State for Digital Advancement (SEAD) of the Ministry of Economy
signed an agreement for developing a linguistically annotated corpus of
Spanish news articles, aimed at expanding the language resource infrastructure
for the Spanish language. The corpus is named CAPITEL (Corpus del Plan
de Impulso a las Tecnologías del Lenguaje) and is composed of contemporary
news articles obtained through agreements with a number of news media
providers.
CAPITEL has three levels of linguistic annotation: morphosyntactic (with
lemmas and Universal Dependencies-style POS tags and features), syntactic
(following Universal Dependencies v2), and named entities. 

The linguistic annotation of a subset of the CAPITEL corpus has been revised
using a machine-annotation-followed-by-human-revision procedure. Manual
revision has been carried out by a team of graduate linguists using the
Annotation Guidelines created specifically for CAPITEL. The revised
annotations comprise about 1 million words for the named entity layer and
roughly 250,000 for the syntactic layer. Due to the size of the corpus and
the nature of the annotations, we propose two IberLEF sub-tasks under the more
general, umbrella task of CAPITEL @ IberLEF 2020, where we will use the
revised subset of the CAPITEL corpus in two challenges, namely: 

(1) Named Entity Recognition and Classification and 

(2) Universal Dependency Parsing.

Because of the ever-evolving nature of the NLP field and its associated shared
task competitions, we deem it relevant to propose new challenges for the
Spanish language to determine whether recent developments can push the
boundaries of the current state of the art.


Call for Participation: 

Sub-task 1: Named Entity Recognition and Classification in Spanish News
Articles: 

Information extraction tasks, formalized in the late 1980s, are designed to
evaluate systems which capture pieces of information present in free text,
with the goal of enabling better and faster information and content access.
One important kind of such information is named entities (NEs), which, roughly
speaking, are textual elements corresponding to names of people, places,
organizations and others. Three processes can be applied to NEs: recognition
(or identification), categorization (assigning a type according to a
predefined set of semantic categories), and linking (disambiguating the
reference). 
The aim of this sub-task is to challenge participants to apply their systems
or solutions to the problem of identifying and classifying NEs in Spanish news
articles. This two-stage process is referred to as NERC (Named Entity
Recognition and Classification).
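
As an illustration of the recognition and classification steps described
above, the following is a minimal Python sketch, not the official evaluation
script. It assumes a per-token BIO tagging scheme and the entity types PER,
LOC and ORG; this call does not specify either, so both are assumptions made
for the example. The function names bio_to_spans and span_f1 are hypothetical.

```python
from typing import List, Set, Tuple

# (start token index, end index exclusive, entity type)
Span = Tuple[int, int, str]


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Turn per-token BIO tags into typed entity spans.

    Recognition finds the span boundaries; classification is the type
    carried after the "B-" or "I-" prefix. BIO is an assumed scheme.
    """
    spans: Set[Span] = set()
    start, label = None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel closes a trailing span
        continues = tag.startswith("I-") and label == tag[2:]
        if continues:
            continue
        if label is not None:
            spans.add((start, i, label))
        start, label = None, None
        # A stray "I-" with no open span is leniently treated as a start.
        if tag.startswith("B-") or tag.startswith("I-"):
            start, label = i, tag[2:]
    return spans


def span_f1(gold: Set[Span], pred: Set[Span]) -> float:
    """Exact-match F1: a span counts only if boundaries and type both match."""
    tp = len(gold & pred)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical sentence: "María García visitó Málaga"
gold_tags = ["B-PER", "I-PER", "O", "B-LOC"]
pred_tags = ["B-PER", "I-PER", "O", "B-ORG"]  # right boundaries, wrong type

gold_spans = bio_to_spans(gold_tags)
pred_spans = bio_to_spans(pred_tags)
score = span_f1(gold_spans, pred_spans)
```

In this example the system recognizes both entities but misclassifies the
second one, so only one of the two predicted spans matches the gold standard.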

Sub-task 2: Universal Dependency Parsing of Spanish News Articles

Dependency-based syntactic parsing has become popular in NLP in recent years.
One of the reasons for this popularity is the transparent encoding of
predicate-argument structures, which is useful in many downstream
applications. Another reason is that it is better suited than phrase-structure
grammars for languages with free or flexible word order.
Universal Dependencies (UD) is a framework for consistent annotation of
grammar (parts of speech, morphological features and syntactic dependencies)
across different human languages. Moreover, the UD initiative is an open
community effort with over 200 contributors which has produced more than 100
treebanks in over 70 languages. 
The aim of this sub-task is to challenge participants to apply their systems
or solutions to the problem of Universal Dependency parsing of Spanish news
articles as defined in the Annotation Guidelines for the CAPITEL corpus that
will be shared with the participants.
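
To make the parsing task more concrete, here is a minimal Python sketch that
reads dependency arcs from CoNLL-U, the ten-column tab-separated format used
by Universal Dependencies, and compares a predicted parse against a gold one.
The call does not state which metrics the evaluation script uses; unlabeled
and labeled attachment scores (UAS and LAS) are assumed here because they are
commonly reported for dependency parsing. The function names and the example
sentence are hypothetical.

```python
from typing import List, Tuple

# (token id, head id, dependency relation)
Arc = Tuple[int, int, str]


def read_arcs(conllu: str) -> List[Arc]:
    """Extract (id, head, deprel) from the CoNLL-U lines of one sentence."""
    arcs: List[Arc] = []
    for line in conllu.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and sentence-level comments
        cols = line.split("\t")
        if not cols[0].isdigit():
            continue  # skip multiword ranges (e.g. 1-2) and empty nodes (1.1)
        arcs.append((int(cols[0]), int(cols[6]), cols[7]))
    return arcs


def attachment_scores(gold: List[Arc], pred: List[Arc]) -> Tuple[float, float]:
    """Return (UAS, LAS) for two parses over the same tokens."""
    if not gold or len(gold) != len(pred):
        raise ValueError("gold and predicted parses must cover the same tokens")
    uas_hits = sum(g[1] == p[1] for g, p in zip(gold, pred))
    las_hits = sum(g[1] == p[1] and g[2] == p[2] for g, p in zip(gold, pred))
    return uas_hits / len(gold), las_hits / len(gold)


def row(idx: int, form: str, upos: str, head: int, deprel: str) -> str:
    """Build one CoNLL-U token line; unused columns are left as "_"."""
    return "\t".join(
        [str(idx), form, "_", upos, "_", "_", str(head), deprel, "_", "_"]
    )


# Hypothetical sentence: "El perro ladra"
gold_text = "\n".join([
    "# text = El perro ladra",
    row(1, "El", "DET", 2, "det"),
    row(2, "perro", "NOUN", 3, "nsubj"),
    row(3, "ladra", "VERB", 0, "root"),
])
pred_text = "\n".join([
    row(1, "El", "DET", 2, "det"),
    row(2, "perro", "NOUN", 3, "obj"),  # correct head, wrong relation
    row(3, "ladra", "VERB", 0, "root"),
])

uas, las = attachment_scores(read_arcs(gold_text), read_arcs(pred_text))
```

Here every token is attached to the correct head, but one relation label is
wrong, so the labeled score is lower than the unlabeled one.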

Please fill out the form on Codalab to register and submit results for NERC
(https://competitions.codalab.org/competitions/23011) or UD Parsing
(https://competitions.codalab.org/competitions/23178).

Important Dates: 
March 15: Sample set, Evaluation script and Annotation Guidelines released.
March 17: Training set released.
April 1: Development set released.
April 29: Test set released (includes background set).
May 17: Systems output submissions.
May 21: Results posted and Test set with GS annotations released.
May 31: Working notes paper submission.
June 15: Notification of acceptance (peer review).
June 30: Camera-ready paper submission.
September: IberLEF 2020 Workshop.

Organizing Committee: 
David Pérez Fernández, PlanTL - Ministry of Economy, Spain.
Jordi Porta-Zamorano, Centro de Estudios de la RAE, Spain.
José-Luis Sancho-Sánchez, Centro de Estudios de la RAE, Spain.
Rafael-J. Ureña-Ruiz, Centro de Estudios de la RAE, Spain.
Doaa Samy, Instituto de Ingeniería del Conocimiento (PlanTL-GTO), Spain.
Luis Espinosa-Anke, School of Computer Science and Informatics, Cardiff
University, UK. 

Contact: Jordi Porta-Zamorano (porta at rae.es)

Organizers mailing list: capitel2020org at googlegroups.com 

Task-specific mailing lists:  
capitel2020nerc at googlegroups.com
capitel2020ud at googlegroups.com



