32.3722, FYI: Second Call for Participation: SemEval-2022 Task 11: Multilingual Complex Named Entity Recognition

Tue Nov 30 08:05:29 UTC 2021

LINGUIST List: Vol-32-3722. Tue Nov 30 2021. ISSN: 1069 - 4875.

Subject: 32.3722, FYI: Second Call for Participation: SemEval-2022 Task 11: Multilingual Complex Named Entity Recognition

Moderator: Malgorzata E. Cavar (linguist at linguistlist.org)
Student Moderator: Jeremy Coburn, Lauren Perkins
Managing Editor: Becca Morris
Team: Helen Aristar-Dry, Everett Green, Sarah Robinson, Nils Hjortnaes, Joshua Sims, Billy Dickson
Jobs: jobs at linguistlist.org | Conferences: callconf at linguistlist.org | Pubs: pubs at linguistlist.org

Homepage: http://linguistlist.org

Please support the LL editors and operation with a donation at:
           https://funddrive.linguistlist.org/donate/

Editor for this issue: Everett Green <everett at linguistlist.org>
================================================================

Date: Tue, 30 Nov 2021 03:04:39
From: Sudipta Kar [sudipkar at amazon.com]
Subject: Second Call for Participation: SemEval-2022 Task 11: Multilingual Complex Named Entity Recognition

We invite you to participate in SemEval-2022 Task 11: *Multi*lingual *Co*mplex
*N*amed *E*ntity *R*ecognition (MultiCoNER).

*Task Website:* https://multiconer.github.io/
*Codalab (Data download + Submission):* 
https://competitions.codalab.org/competitions/36044

This task focuses on the detection of complex entities, such as movie, book,
music and product titles, in low context settings (short and uncased text).

The task covers 3 domains (sentences, search queries, and questions) and
provides data in 11 languages: *English, Spanish, Dutch, Russian, Turkish,
Korean, Farsi, German, Chinese, Hindi*, and *Bangla*. Here are some examples
in English, Chinese, Bangla, Hindi, Russian, Korean, and Farsi, where entities
are enclosed inside brackets with their type:

* the original *[ferrari daytona | PRODUCT]* replica driven
by *[don johnson | PERSON]* in *[miami vice | CreativeWork]*
* 它 的 座 位 在 *[圣 布 里 厄 | LOCATION]* .
* স্টেশনটির মালিক *[টাউনস্কেয়ার মিডিয়া | CORPORATION]* ।
* यह *[कनेल विभाग | LOCATION]* की राजधानी है।
* в основе фильма — стихотворение *[г. сапгира | PERSON]* .
* *[블루레이 디스크 | PRODUCT]* : 광 기록 방식 저장매체의 하나
* *[نینتندو | CORPORATION]* / *[باندای نامکو انترتینمنت | CORPORATION]* – *[برادران سوپر ماریو نهایی | CreativeWork]*
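
For participants who want to work with examples written in the bracketed style above, the following is a minimal Python sketch that turns one annotated line into token-level tags. It is only an illustration: the BIO tagging scheme, the whitespace tokenization, and the names `ANNOTATION` and `bracketed_to_bio` are assumptions made here, not part of the task description, and the released data files may use a different layout.

```python
import re

# Matches one bracketed annotation such as "[miami vice | CreativeWork]".
ANNOTATION = re.compile(
    r"\[\s*(?P<text>[^\[\]|]+?)\s*\|\s*(?P<label>[^\[\]|]+?)\s*\]"
)


def bracketed_to_bio(line):
    """Convert a bracket-annotated line into (token, tag) pairs.

    Assumes whitespace tokenization and BIO tags (both are illustrative
    choices, not taken from the task description).
    """
    # Drop the emphasis asterisks used in the announcement text.
    line = line.replace("*", "")
    pairs = []
    cursor = 0
    for match in ANNOTATION.finditer(line):
        # Tokens before the annotation lie outside any entity.
        pairs.extend((tok, "O") for tok in line[cursor:match.start()].split())
        label = match.group("label")
        for i, tok in enumerate(match.group("text").split()):
            pairs.append((tok, ("B-" if i == 0 else "I-") + label))
        cursor = match.end()
    # Tokens after the last annotation are also outside any entity.
    pairs.extend((tok, "O") for tok in line[cursor:].split())
    return pairs


example = (
    "the original [ferrari daytona | PRODUCT] replica driven by "
    "[don johnson | PERSON] in [miami vice | CreativeWork]"
)
for token, tag in bracketed_to_bio(example):
    print(f"{token}\t{tag}")
```

Each token is printed with its tag, one pair per line, so "ferrari" receives `B-PRODUCT`, "daytona" receives `I-PRODUCT`, and tokens outside brackets receive `O`.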

A *multilingual NER track* is also offered for multilingual systems that can
process all languages. A *code-mixed track* allows participants to build
systems that process inputs with tokens coming from two languages. For
example, the following code-mixed examples draw on Turkish, Spanish, Dutch,
German, and English.

* it was produced at the *[soyuzmultfilm | GROUP]* studio
in *[moskova | LOCATION]* .
* *[arturo vidal | PERSON]* ( born 1987 ) , professional footballer playing
for *[fútbol club barcelona | GROUP]*
* daarmee promoveerde hij toen naar de *[premier league | CORPORATION]* .
* piracy has been a part of the *[sultanat von sulu | LOCATION]* culture .

The task focuses on detecting semantically ambiguous and complex entities in
short, low-context settings. Participants are welcome to build NER systems
for any number of languages, and we encourage them to take on the bigger
challenge of building NER systems for multiple languages. The task also tests
the domain adaptation capability of the systems by evaluating them on
additional test sets of questions and short search queries.

We have released training data for 11 languages, along with a baseline system
to start from. Participants can submit a system for a single language but are
encouraged to aim for a bigger challenge and build multilingual NER systems.

*Task Website:* https://multiconer.github.io/
*Codalab Submission site:* 
https://competitions.codalab.org/competitions/36044
*Mailing List:* multiconer-semeval at googlegroups.com
*Slack Workspace:*
https://join.slack.com/t/multiconer/shared_invite/zt-vi3g97cx-MpqTvS07XX22S78nRC2s0Q
*Baseline System:* https://multiconer.github.io/baseline

*Shared task schedule:*

* Training data ready: September 3, 2021
* Evaluation data ready: December 3, 2021
* Evaluation start: January 10, 2022
* Evaluation end: by January 31, 2022 (latest date; task organizers may choose
an earlier date)
* System description paper submissions due: February 23, 2022
* Notification to authors: March 31, 2022

*Task organizers*

* Shervin Malmasi (Amazon)
* Besnik Fetahu (Amazon)
* Anjie Fang (Amazon) 
* Sudipta Kar (Amazon) 
* Oleg Rokhlenko (Amazon)

Please reach out to the organizers at
multiconer-semeval-organizers at googlegroups.com, or join the Slack workspace to
connect with the other participants and organizers.

Linguistic Field(s): Computational Linguistics

Subject Language(s): English (eng)
