36.29, Shared Task on Ancient Chinese Named Entity Recognition (NER)

Wed Jan 8 22:05:04 UTC 2025

LINGUIST List: Vol-36-29. Wed Jan 08 2025. ISSN: 1069 - 4875.

Subject: 36.29, Shared Task on Ancient Chinese Named Entity Recognition (NER)

Moderator: Steven Moran (linguist at linguistlist.org)
Managing Editor: Justin Fuller
Team: Helen Aristar-Dry, Steven Franks, Joel Jenkins, Daniel Swanson, Erin Steitz
Jobs: jobs at linguistlist.org | Conferences: callconf at linguistlist.org | Pubs: pubs at linguistlist.org

Homepage: http://linguistlist.org

Editor for this issue: Erin Steitz <ensteitz at linguistlist.org>

================================================================

Date: 08-Jan-2025
From: Bin Li [libin.njnu at gmail.com]
Subject: Shared Task on Ancient Chinese Named Entity Recognition (NER)

Full Title: Shared Task on Ancient Chinese Named Entity Recognition
(NER)
Short Title: EvaHan2025 @ALP2025

Date: 03-May-2025 - 03-May-2025
Location: Albuquerque, New Mexico, USA
Contact Person: Bin Li
Meeting Email: libin.njnu at gmail.com
Web Site: https://github.com/GoThereGit/EvaHan

Linguistic Field(s): Computational Linguistics
Subject Language(s): Old Chinese (och)
Language Family(ies): Chinese-Tibetan-Mongolian

Call Deadline: 15-Jan-2025

Meeting Description:
Co-located with ALP2025 (https://www.ancientnlp.com/alp2025/) at
NAACL 2025, Albuquerque, New Mexico.
EvaHan 2025, the fourth International Evaluation of Ancient Chinese
Information Processing, focuses on named entity recognition (NER) in
ancient Chinese texts, with particular emphasis on NER with large
language models.
This shared task serves as a platform for global experts to converge
and exchange their latest research findings and technological
advancements in the domain of ancient Chinese information processing.
It is a unique opportunity for scholars and developers to showcase
their innovative approaches to deciphering and understanding the
complexities of ancient Chinese language data.
Data:
The EvaHan 2025 data includes three datasets, encompassing historical
and medical texts, with a total of 500,000 characters. The data
underwent an initial phase of automatic annotation, followed by
meticulous corrections and refinements by experts in Ancient Chinese
language and history, ensuring the highest quality of training
material and gold-standard texts.
Dataset A is derived from Shiji (史记), an ancient Chinese historical
masterpiece by Sima Qian, chronicling China's history from mythical
times to the Han dynasty, blending biographical and annalistic styles.
This dataset contains 6 categories of entities.
Dataset B is derived from the Twenty-Four Histories (二十四史), a
comprehensive compilation of official Chinese historical records
spanning early dynasties through the Ming, documenting governance,
culture, and societal evolution. This dataset contains 3 categories of
entities.
Dataset C consists of texts on Traditional Chinese Medicine Classics
(中医药典籍), covering herbal remedies, acupuncture, and other traditional
medical practices. This dataset contains 6 categories of entities.
Participation:
To participate in EvaHan 2025, you must complete the following steps:

1. Registration:
Submit a registration form to officially register your team for the
task. Registration is open from December 1, 2024, to January 15, 2025.
Only registered participants will gain access to the training dataset.

2. Accessing the Training Data:
After completing the registration process, participants will receive
instructions for downloading the training dataset, which includes
400,000 characters from Ancient Chinese texts annotated for Named
Entity Recognition.

3. Submitting Results and Reports:
Participants must use the provided test data to generate results and
submit their system outputs and a technical report as per the shared
task schedule.

For inquiries or to request the registration form, please contact us
at evahan2025 at gmail.com.
See the shared task webpage for details:
https://github.com/GoThereGit/EvaHan
