16.1534, FYI: International Chinese Word Segmentation Bakeoff

LINGUIST List linguist at linguistlist.org
Fri May 13 13:32:43 UTC 2005


LINGUIST List: Vol-16-1534. Fri May 13 2005. ISSN: 1068 - 4875.

Subject: 16.1534, FYI: International Chinese Word Segmentation Bakeoff

Moderators: Anthony Aristar, Wayne State U <aristar at linguistlist.org>
            Helen Aristar-Dry, Eastern Michigan U <hdry at linguistlist.org>

Reviews (reviews at linguistlist.org)
        Sheila Dooley, U of Arizona
        Terry Langendoen, U of Arizona

Homepage: http://linguistlist.org/

The LINGUIST List is funded by Eastern Michigan University, Wayne
State University, and donations from subscribers and publishers.

Editor for this issue: Ann Sawyer <sawyer at linguistlist.org>
================================================================

To post to LINGUIST, use our convenient web form at
http://linguistlist.org/LL/posttolinguist.html.


===========================Directory==============================

1)
Date: 10-May-2005
From: Tom Emerson < tree at basistech.com >
Subject: 2nd International Chinese Word Segmentation Bakeoff

	
-------------------------Message 1 ----------------------------------
Date: Fri, 13 May 2005 09:28:32
From: Tom Emerson < tree at basistech.com >
Subject: 2nd International Chinese Word Segmentation Bakeoff


The Second International Chinese Word Segmentation Bakeoff
Preliminary Description and Important Dates

1. Introduction

This is the initial announcement for the Second International Chinese Word
Segmentation Bakeoff, sponsored by the Special Interest Group for Chinese
Language Processing (SIGHAN) of the Association for Computational
Linguistics. The bakeoff will take place over the summer of 2005, and the
results will be presented at the 4th SIGHAN Workshop, to be held at the Second
International Joint Conference on Natural Language Processing (IJCNLP'05),
October 14-15, 2005.

The first bakeoff, held in 2003 and presented at the 2nd SIGHAN Workshop at
ACL 2003 in Sapporo, has become the pre-eminent measure for Chinese word
segmentation evaluation and has been cited in numerous papers. As with the
first evaluation, the second bakeoff will concentrate exclusively on word
segmentation. Corpora from the following organizations will be available
for use:

- CKIP, Academia Sinica, Taiwan
- City University of Hong Kong, Hong Kong SAR
- CIS Department, University of Pennsylvania, United States
- Beijing University, China
- Microsoft Research, China

The exact nature of the segmentation tasks is still under discussion; final
details will be made available when registration opens on 1 June 2005.

Participants are required to submit a short paper describing their system
and analyzing its performance, and to present a summary at the workshop. The
reports will be published in the SIGHAN workshop proceedings.

The language of the workshop is English. Papers must be submitted and
presented in English. Note that, unlike papers submitted to the workshop
proper, the bakeoff reports will not be peer reviewed.

2. Important Dates

2005-06-01            Registration Open
2005-06-29            Training data made available
2005-07-27            Testing data made available
2005-07-29            Test results sent back to organizers
2005-08-05            Results privately reported to participants
2005-08-19            Final reports due from participants

3. Contact Information

The workshop is being organized by Tom Emerson of Basis Technology Corp.
and Jianfeng GAO of Microsoft Research China.

The web page for the competition is:

http://www.sighan.org/bakeoff2005/

Questions on the bakeoff should be addressed to Tom Emerson,
tree at basistech.com.



Linguistic Field(s): Cognitive Science
                     Text/Corpus Linguistics
                     Computational Linguistics




