Corpora: CFP NTCIR WS3 (2001/2002): Evaluation of IR, QA, Summarization

Noriko Kando kando at nii.ac.jp
Tue Sep 25 07:25:34 UTC 2001


[Apologies for duplicate postings.]

The application deadline is approaching. Please register now!
 ==========================================================================

                           CALL FOR PARTICIPATION
                    The Third NTCIR Workshop (2001/2002)
        Evaluation of Information Retrieval, Q&A, and Summarization
                       September 2001 - October 2002

               Meeting: October 8-10, 2002, NII, Tokyo, Japan
               URL: http://research.nii.ac.jp/ntcir/workshop/
                        enquiries: ntcadm at nii.ac.jp
 ===========================================================================

An evaluation workshop on Asian-language text retrieval, question
answering (Q&A), and text summarization will be held from September 2001
to October 2002. Participation is invited from anyone interested in
retrieval of various kinds of text, cross-lingual information retrieval
of Asian languages from large-scale collections, or Q&A and text
summarization of Japanese texts.
   This year we have selected five research areas as tasks: Cross
Language Retrieval, Patent Retrieval, Question Answering, Automatic Text
Summarization, and Web Retrieval. Optional tasks are available in the
Patent Retrieval and Web Retrieval Tasks. Any proposal using the
provided data is welcome for the optional tasks, and we hope they will
offer an opportunity to explore new tasks.

WORKSHOP OBJECTIVES
   * To encourage research in information retrieval, Q&A, and text
     summarization by providing reusable test collections.
   * To provide a forum for research groups interested in comparing
     results and exchanging ideas or opinions in an informal atmosphere.
   * To improve the quality of the test collections based on feedback
     from participants.


TASK DESCRIPTION
   Below is a brief summary of the tasks envisaged for the Workshop.
Each participant may conduct one or more of the tasks or subtasks below.
Participation in only one subtask (for example, Japanese monolingual IR
(J-J) in the CLIR Task) is also possible.

1. Cross Language Retrieval Task (clir)
Documents and topics are in four languages (Chinese, Korean, Japanese,
and English).
   * Multilingual CLIR (MLIR): Search a document collection in more than
     one language using topics in any one of the four languages
     (Korean documents are excluded).
   * Bilingual CLIR (BLIR): Search documents in one language using
     topics in a different language (English documents are excluded).
   * Single Language IR (SLIR): Monolingual search of Chinese, Korean,
     or Japanese.
DOCUMENT: newspapers published in Asia:
- Chinese: CIRB010, United Daily News (1998-1999)
- Korean: Korea Economic Daily (1994)
- Japanese: Mainichi Newspaper (1998-1999)*
- English: Taiwan News and China English News (1998-1999),
   Mainichi Daily News (1998-1999)*

2. Patent Retrieval Task (patent)
   * Main Task
        o Cross-language Cross-DB retrieval: retrieve patents in
          response to J/E/C newspaper articles associated with
          technology and commercial products.
        o Monolingual Associative Retrieval: retrieve patents associated
          with an input Japanese patent.
   * Optional task: Research reports on any kind of patent processing
     using the above data are invited, including, but not limited to,
     generating patent maps, paraphrasing claims, aligning claims with
     examples, patent summarization, and patent clustering.
DOCUMENT:
- Japanese patents: 1998-1999 (about 17GB)
- Japio patent abstracts: 1995-1999
- Patent Abstracts of Japan (English translations of
   Japio patent abstracts): 1995-1999
- Patolis test collection (34 topics and relevance assessments)
- Newspaper articles (Japanese/English/Traditional Chinese)

3. Question Answering Task (qac)
   * Task 1: For each of 100 questions, the system extracts five
     answers from the documents as an ordered list. The system is
     required to return supporting information for each answer. We
     assume the supporting information to be a paragraph, a
     100-character passage, or a document that includes the answer.
   * Task 2: For each of 100 questions, the system extracts only one
     answer from the documents. Supporting information is required.
   * Task 3: Evaluation of a series of questions. Related questions
     are given for 30 of the questions in Task 2.
DOCUMENT: Japanese newspaper articles (Mainichi Newspaper 1998-1999)*

4. Automatic Text Summarization Task (tsc2)
   * Task A (single-document summarization): Given the texts to be
     summarized and the summary lengths, participants submit a summary
     of each text in plain text format.
   * Task B (multi-document summarization): Given a set of texts,
     participants produce summaries of the set in plain text format.
     Participants are given the summary lengths as well as the
     information that was used to produce the document set, such as
     queries.
DOCUMENT: Japanese newspaper articles (Mainichi Newspaper 1998-1999)*

5. Web Retrieval Task
   * A. Survey Retrieval: Survey retrieval is similar to traditional
     ad hoc retrieval of scientific documents or newspapers, where the
     system searches a static document set using newly provided
     topics. Recall and precision are weighted equally in the
     evaluation. Two subtasks are provided: retrieval using topics in
     almost the same format as in past NTCIR workshops ('A1. Topic
     Retrieval'), and retrieval using given relevant documents
     ('A2. Similarity Retrieval'). The page is the basic unit of
     evaluation; however, evidential passages can be used for
     complementary evaluation. Here, an evidential passage is the part
     of a relevant document that provides evidence for the relevance
     judgment. Submitting evidential passages is not mandatory.

   * B. Target Retrieval: Target retrieval aims to evaluate retrieval
     effectiveness when the user requires just one answer or a few
     (e.g., fact-type retrieval, or retrieval of a site's top page),
     so precision is emphasized. Runs are submitted as the top 10
     ranked documents retrieved for each topic, optionally accompanied
     by evidential passages. Several evaluation measures will be
     applied.

   * C. Optional Tasks: Participants may freely submit proposals
     using the document set from subtasks A and B, according to their
     own research interests. The results will be presented as papers
     or posters at the NTCIR-3 workshop meeting. If a proposal can
     involve several participants, it may be adopted as a subtask and
     investigated in detail. 'C1. Search Results Classification' and
     'C2. Speech-Driven Retrieval' are examples of optional tasks.
DOCUMENT: Web documents collected mainly from the .jp domain (ca. 100GB
          and ca. 10GB), available at the "Open-Lab" at NII.


WORKSHOP SCHEDULE
2001-09-30      Application Due
2001-10-01      Document release (newspaper)
2001-10/2002-01 Dry Run and Round-Table Discussion
                     (depends on each task)
2001-12         Open Lab start
2001-12/2002-03 Formal Run (depends on each task)
2002-07-01      Evaluation Results Delivery
2002-08-20      Papers for Working Notes Due
2002-10-08/10   NTCIR Workshop 3 Meeting
             Days 1-2: Closed session (task participants only)
             Day 3: Open session
2002-12-01      Paper for Final Proceedings Due


TYPES OF PARTICIPATION
   * A. FULL: Submit results and describe the system. The
     correspondence between group name and group ID will be
     announced.
   * B. ANONYMOUS: Submit results; details of the system need not be
     reported. The correspondence between group name and group ID will
     not be announced. This category is mainly intended for
     participants from companies that have difficulty reporting
     system details.

The list of participating groups will be made public, although
evaluation results will be announced using group IDs only. Regardless
of the type of participation, every participating group must submit
(1) paper(s) for the workshop proceedings, (2) a system description
form describing its system, and (3) bibliographic references and a
copy of any paper it publishes using the NTCIR test collections.


APPLICATIONS
Online application:
http://research.nii.ac.jp/ntcir/workshop/application-en.html


ENQUIRIES
   * Please send email to Noriko Kando, program chair, or to the
     NTCIR Project administrators (ntcadm at nii.ac.jp).
   * For details of a specific task, please contact that task's
     chair and organizers.


NEW FEATURES OF NTCIR WS3 TASKS
     * Two Types of CLIR
       (1) Multilingual CLIR of Asian Languages and English (CLIR)
       (2) CLIR of Technical Information: Search Japanese patent
           documents using English/Chinese/Japanese topics.
           English-Japanese paired abstracts (ca. 1,500,000 docs)
           are included in the test collection used for NTCIR WS3.
     * Optional Tasks (Patent & Web): Any research group interested
        in using the document collections provided in these tasks for
        its research projects is invited! We also expect this venture
        to explore possible new tasks for future NTCIR workshops.
     * Search by Documents (Patent & Web)
     * Passage Retrieval (Patent, QA & Web)
     * Precision-oriented Evaluation (QA & Web) and Multigrade Relevance
        Judgments (CLIR, Patent & Web)


NOTES
   * The proceedings will be published online as well as in print.
   * Dissemination of research results obtained with the NTCIR
     collections outside the Workshop's Proceedings is welcome.
     However, the conditions of participation preclude specific
     advertising claims based on results obtained using the
     Collection or the Workshop.
   * International participants are welcome. Announcements will be in
     English and Japanese.
   * The official language for proceedings papers and presentations
     at the Workshop meeting in October 2002 is English.
   * Documents will be provided to participants who have returned the
     required user agreement forms.
   * DOCUMENT USAGE: The permitted period of use of the Mainichi
     Newspaper and Mainichi Daily News is from 2001-09-01 to
     2003-09-30. Active participants who submit results and who are
     affiliated with organizations outside Japan will be able to
     extend the period up to 2008-09-30. When the permitted period
     ends, participants must delete all document data. Those who want
     to use the data after that period can purchase it from Mainichi
     Newspaper Co. and obtain permission from the company for
     research use. The permitted period may vary by task.
-----------------------------------------------------------------------------

Noriko Kando
NTCIR Project


