No subject

owner-corpora at lists.uib.no owner-corpora at lists.uib.no
Wed Jul 12 09:47:57 UTC 2000


From: "Noriko Kando" <Noriko.Kando at nii.ac.jp>
To: <ir at mailbase.ac.uk>, <nancy at cni.org>, <asis-l at asis.org>,
   <midas-l at Glue.umd.edu>
Cc: <LINGUIST at LINGUIST.LDC.UPENN.EDU>, <corpora at uib.no>
Subject: Corpora: 2nd CFP; NTCIR; Details of Chinese IR & Summarization + Deadline extension
Date: Sat, 8 Jul 2000 12:47:27 +0900

Apologies if you have received multiple copies of this announcement...

===========================================================
Details of the Chinese IR & Summarization + Deadline extension!
--

                  CALL FOR PARTICIPATION
                      NTCIR Workshop 2
    Evaluation of Chinese & Japanese Text Retrieval
                 and Text Summarization

                    July 2000 - Feb 2001

http://www.rd.nacsis.ac.jp/~ntcadm/workshop/cfp2-en.html
             enquiries: ntcadm at rd.nacsis.ac.jp

                 Online application is available at:
http://www.rd.nacsis.ac.jp/~ntcadm/workshop/application2/app2-en.html
============================================================

An evaluation workshop in Chinese and Japanese text retrieval and
text summarization will be held from July 2000 to February 2001.
Participation is invited from anyone interested in Chinese and/or
Japanese text retrieval and English-Chinese and English-Japanese
cross-lingual information retrieval from large-scale collections and
text summarization of Japanese texts.


WORKSHOP OBJECTIVES

- To encourage research in information retrieval, cross-lingual
  information retrieval and text summarization by providing reusable
  test collections.
- To provide a forum for research groups interested in comparing
  results and exchanging ideas or opinions in an informal atmosphere.
- To improve the quality of the test collections based on the
  feedback from participants.


DESCRIPTION OF THE COLLECTION (DATA)

- CHINESE IR TASKS (Chinese and English-Chinese IR)
  - The training set and the test set for the Chinese Text Retrieval
     Tasks are selected from the Chinese Information Retrieval
     Benchmark 1 (CIRB-1).
  - CIRB-1 consists of three parts: 1) Document Set;
     2) Topic Set; and 3) Relevance Judgment.
  - Currently, the Document Set contains 132,173 news articles from
     5 news agencies in Taiwan; the Topic Set contains 50 topics,
     each describing a user's information need at levels ranging from
     brief to detailed; and the Relevance Judgment lists the documents
     relevant to each topic.

- JAPANESE & ENGLISH IR TASKS (Japanese, English and
   English-Japanese IR)
  - Training set: NTCIR-1 CD, more than 330,000 author abstracts
     of conference papers; more than half are Japanese-English
     paired (document alignments); alignments are known and
     usable for training;
  - Test set: NTCIR-1 and NTCIR-2.
     NTCIR-2 consists of two document subfiles;
    (1) ca. 300,000 extended summaries of the research reports;
        about 25% are Japanese-English paired.
    (2) ca. 100,000 author abstracts of conference papers;
        more than half are Japanese-English paired; the
        alignments are not announced before result submission.
  - Segmented Japanese texts are also available for both
    Japanese documents and topics in the NTCIR-1&2; use of
    this data is NOT mandatory.

- TEXT SUMMARIZATION TASK
  - We use newspaper articles (Mainichi Shinbun, The Mainichi; a paid
     license is required to use them) as the original texts for these
     tasks. They are not limited to the business domain; articles from
     other categories, such as editorials and columns, will be included.


WORKSHOP SCHEDULE

- By July 15, 2000: Submit application. (Chinese IR, Japanese IR)
- By July 20, 2000: Submit application. (Text Summarization)
         - NTCIR-1 is available to Japanese IR participants.

- August 10, 2000: NTCIR-2 CD (new documents and fifty topics)
    will be distributed to the participants of Japanese IR Tasks.
- August 31, 2000: CIRB-1-CH CD (132,172 documents and 50 Chinese
   topics) will be distributed to the participants of Chinese IR Task, and
   CIRB-1-EN CD  (132,172 documents and 50 English topics) will be
   distributed to the participants of English-Chinese IR Task.

- September 18, 2000: Search results submission (Japanese IR)
- September 30, 2000: Search results submission (Chinese IR)

- September 2000: Dry run (Text Summarization)
- November or December 2000: Evaluation (Text Summarization)

- January 10, 2001: Results of Relevance Assessments will be
      distributed to the participants (Chinese IR, Japanese IR)

- February 1, 2001: Submission of papers for the working-note
       proceedings. (All Tasks)
- February 21-23, 2001: Workshop meeting at NII, Tokyo, Japan.
      Day 1: Open to public, Days 2-3: Active participants only

- March 1, 2001: Camera-ready copies for the proceedings.


TASK DESCRIPTION

Below is a brief summary of the tasks envisaged for the Workshop. A
participant will conduct one or more of the tasks or subtasks below.
Participation in only one subtask (for example, Japanese monolingual IR
(J-J task)) is also possible:

- Chinese Information Retrieval Tasks: The Chinese IR Task is to
 assess the capability of participating systems in retrieving Chinese
 documents using Chinese queries.  The English-Chinese IR Task is
 to assess the capability of participating systems in retrieving
 Chinese documents using English queries.  Chinese texts are
 composed of characters without explicit word boundaries, which makes
 the retrieval task more challenging than for English texts.
 Participating systems may employ any approach; both word-based and
 character-based systems are acceptable.  The organizers will not
 provide any segmentation tools or Chinese dictionaries.

- Japanese & English Information Retrieval Task: Japanese and/or
  English monolingual IR; cross-lingual IR over single-language
  documents and over mixed-language (English and Japanese) documents,
  using Japanese and/or English topics. The aim is to investigate the
  search effectiveness of systems that search a static set of documents.

- Text Summarization task (Japanese description only): automatic
  text summarization of Japanese texts. The aims are (1) to collect
  high-quality text data for summarization in Japanese: we will have
  newspaper articles summarized by hand and make them available
  for research purposes; and (2) to evaluate text summarization systems
  through an extrinsic, task-based evaluation. For details please visit
  http://galaga.jaist.ac.jp:8000/tsc/
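
As an illustration of the character-based option mentioned for the
Chinese IR Task above, the following is a minimal sketch of indexing
and searching with overlapping character bigrams, which needs no
segmentation tool or dictionary. It is not part of the workshop's
tools or evaluation procedure; the function names and the scoring
(a simple count of shared bigrams) are assumptions made for this
example only.

```python
def char_ngrams(text, n=2):
    """Return overlapping character n-grams of text, ignoring whitespace."""
    chars = [c for c in text if not c.isspace()]
    if len(chars) < n:
        # Text shorter than n: fall back to the whole string as one term.
        return ["".join(chars)] if chars else []
    return ["".join(chars[i:i + n]) for i in range(len(chars) - n + 1)]


def build_index(docs, n=2):
    """Build an inverted index mapping each n-gram to the set of document IDs."""
    index = {}
    for doc_id, text in docs.items():
        for gram in char_ngrams(text, n):
            index.setdefault(gram, set()).add(doc_id)
    return index


def search(index, query, n=2):
    """Rank documents by the number of distinct query n-grams they contain."""
    scores = {}
    for gram in set(char_ngrams(query, n)):
        for doc_id in index.get(gram, ()):
            scores[doc_id] = scores.get(doc_id, 0) + 1
    # Highest score first; ties broken by document ID for stable output.
    return sorted(scores, key=lambda d: (-scores[d], d))
```

A word-based system would replace char_ngrams with the output of a
word segmenter; the index and search steps would stay the same.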


TYPES OF PARTICIPATION

- A. FULL: Submit retrieval results and describe the system. The
     correspondence between the group name and the group ID will
     be announced.
- B. ANONYMOUS: Submit retrieval results. The details of the system
     need not be reported. The correspondence between the group name
     and the group ID is not announced. This category is mainly for
     participants from companies that have difficulty reporting
     system details.

The list of participating groups is made public, but the evaluation
results will be announced using the group IDs only. Regardless of the
type of participation, every participating group must submit (1) a
paper for the workshop proceedings, (2) a system description form
which describes your system, and (3) bibliographic references and a
copy of all your papers using NTCIR test collections.


APPLICATIONS

Online application is available at:
http://www.rd.nacsis.ac.jp/~ntcadm/workshop/application2/app2-en.html

For the text version of the application form, please complete and
return it via e-mail, fax, or postal mail to:

     ATTN: Noriko Kando
     NTCIR Project
     National Institute of Informatics (NII)
     2-1-2 Hitotsubashi, Chiyoda-ku, Tokyo 101-8430, Japan
     email: ntcadm at rd.nacsis.ac.jp
     fax: +81-3-3556-1916 phone: +81-3-4212-2529


TRAVEL SUPPORT

Financial support to attend the NTCIR Workshop meeting will be
available for a limited number of active overseas participants who
will present material at the workshop meeting in February 2001, and
who are not receiving other funding to attend the NTCIR Workshop
meeting. Priority will be given to younger researchers. Details
will be announced later.


ENQUIRIES

- Please send email to Noriko Kando, project manager, at
  kando at nii.ac.jp, or to the NTCIR Project administrators
  (ntcadm at rd.nacsis.ac.jp).
- For enquiries about the Chinese IR Task, please send email to the
  Task Chairs, Hsin-Hsi Chen (hh_chen at csie.ntu.edu.tw) or
  Kuang-Hua Chen (khchen at ccms.ntu.edu.tw).
- For enquiries about the Text Summarization Task, please send email
  to the Task Chairs, Manabu Okumura (oku at pi.titech.ac.jp) or
  Takahiro Fukushima (fukusima at res.otemon.ac.jp).


NOTES

- The first day of the Workshop meeting will be an open forum for
  researchers who are interested in the topics. The second and third
  days will be open only to the active participating groups that have
  submitted results and to selected people from organizing agencies.
- The proceedings will be published online as well as in printed form.
- Dissemination of the research results using the NTCIR collections
  other than in the Workshop's Proceedings is welcome. However, the
  conditions of participation preclude specific advertising claims
  based on the results using the Collection or the Workshop.
- International participants are welcome. Announcements will be in
  English and Japanese, and in English for Chinese IR Task.
- The official language for the proceedings papers and presentation
  at the Workshop meeting in February, 2001 is English.

- An evaluation of Korean text retrieval is being organized by Prof.
  Sung Hyon Myaeng, Korea (shmyaeng at chungnam.ac.kr). We maintain a
  close relationship with each other.

For more information, please visit:
http://www.rd.nacsis.ac.jp/~ntcadm/workshop/cfp2-en.html
============================================================


