Corpora: CFP NTCIR WS3 (2001/2002): Evaluation of IR, QA, Summarization
Noriko Kando
kando at nii.ac.jp
Tue Sep 25 07:25:34 UTC 2001
Apologies for duplicate postings.
The application deadline is approaching. Please register now!
==========================================================================
CALL FOR PARTICIPATION
The Third NTCIR Workshop (2001/2002)
Evaluation of Information Retrieval, Q&A, and Summarization
September 2001 - October 2002
Meeting: October 8-10, 2002, NII, Tokyo, Japan
URL: http://research.nii.ac.jp/ntcir/workshop/
enquiries: ntcadm at nii.ac.jp
===========================================================================
An evaluation workshop on Asian-language text retrieval, Q&A, and text
summarization will be held from September 2001 to October 2002.
Participation is invited from anyone interested in retrieval of various
kinds of text and cross-lingual information retrieval of Asian languages
from large-scale collections, and in Q&A and text summarization of
Japanese texts.
This year we have selected five research areas as tasks: Cross Language
Retrieval, Patent Retrieval, Question Answering, Automatic Text
Summarization, and Web Retrieval. An optional task is available in
the Patent Retrieval and Web Retrieval Tasks. Any proposal using the data
provided is welcome for the optional task, and we hope it will provide
an occasion to explore new tasks.
WORKSHOP OBJECTIVES
* To encourage research in information retrieval, Q&A, and text
summarization by providing reusable test collections.
* To provide a forum for research groups interested in comparing
results and exchanging ideas or opinions in an informal atmosphere.
* To improve the quality of the test collections based on the
feedback from participants.
TASK DESCRIPTION
Below is a brief summary of the tasks envisaged for the Workshop.
A participant will conduct one or more of the tasks or subtasks below.
Participation in only one subtask (for example, Japanese monolingual IR
(J-J) in the CLIR Task) is also possible:
1. Cross Language Retrieval Task (clir)
Documents and topics are in four languages (Chinese, Korean, Japanese
and English)
* Multilingual CLIR (MLIR): Search a document collection in more than
one language using topics in one of the four languages, excluding
Korean documents.
* Bilingual CLIR (BLIR): Search with topics in one language and
documents in a different language, excluding searches of English
documents.
* Single Language IR (SLIR): Monolingual search of Chinese, Korean, or
Japanese.
DOCUMENT: newspapers published in Asia:
- Chinese: CIRB010, United Daily News (1998-1999)
- Korean: Korea Economic Daily (1994)
- Japanese: Mainichi Newspaper (1998-1999)*
- English: Taiwan News and China English News (1998-1999),
Mainichi Daily News (1998-1999)*
2. Patent Retrieval Task (patent)
* Main Task
o Cross-language Cross-DB retrieval: retrieve patents in
response to J/E/C newspaper articles associated with
technology and commercial products.
o Monolingual Associative Retrieval: retrieve patents associated
with an input Japanese patent
* Optional task: Research reports are invited on patent
processing using the above data, including, but not limited to:
generating patent maps, paraphrasing claims, aligning claims
and examples, summarization of patents, clustering patents.
DOCUMENT: - Japanese patents: 1998-1999 (about 17GB)
- Japio patent abstracts: 1995-1999
- Patent Abstracts of Japan (English translations for
Japio patent abstracts): 1995-1999
- Patolis test collection (34 topics and relevance assessment)
- Newspaper articles (Japanese/English/Traditional Chinese)
3. Question Answering Task (qac)
* Task 1: The system extracts five answers from the documents in some
order. 100 questions. The system is required to return support
information for each answer to a question. Support information
is assumed to be a paragraph, a 100-character passage, or a
document that contains the answer.
* Task 2: The system extracts only one answer from the documents. 100
questions. Support information is required.
* Task 3: Evaluation of a series of questions. Related questions
are given for 30 of the questions in Task 2.
DOCUMENT: Japanese newspaper articles (Mainichi Newspaper 1998-1999)*
4. Automatic Text Summarization Task (tsc2)
* Task A (single document summarization): Given the texts to be
summarized and the summarization lengths, the participants submit
a summary of each text in plain text format.
* Task B (multi-document summarization): Given a set of texts, the
participants produce a summary of the set in plain text format. The
information used to produce the document set, such as queries, and
the summarization lengths are given to the participants.
DOCUMENT: Japanese newspaper articles (Mainichi Newspaper 1998-1999)*
5. Web Retrieval Task
* A. Survey Retrieval: Survey retrieval is similar to
traditional ad hoc retrieval for scientific documents or
newspapers, where the system searches a static document set
using newly provided topics. Recall and precision are
weighted equally in the evaluation. Two types of subtasks
are provided: retrieval using topics in almost the same
format as in past NTCIR workshops ('A1. Topic Retrieval')
and retrieval using given relevant documents
('A2. Similarity Retrieval'). The page is the basic
unit for evaluation; however, evidential passages can be
used for complementary evaluation. Here, an evidential
passage means a part of a relevant document that provides
the evidence for the relevance judgment. Submitting
evidential passages is not mandatory.
* B. Target Retrieval: Target retrieval aims to evaluate
retrieval effectiveness when the user requires just one
answer or a few (e.g. fact-type retrieval, or retrieval of
a site's top page), where precision should be emphasized.
Runs will be submitted as the top 10 ranked documents
retrieved for each topic, with evidential passages attached
(not mandatory). Several evaluation measures will be applied.
* C. Optional Tasks: Participants can freely submit
proposals using the document set used in subtasks A and B,
according to their own research interests. The results will
be presented as a paper or poster at the NTCIR-3 workshop
meeting. If a proposal can involve several participants,
it can be adopted as a subtask and investigated in
detail. 'C1. Search results classification' and 'C2.
Speech-Driven Retrieval' are examples of the optional
tasks.
DOCUMENT: Web documents collected mainly from the jp domain (ca. 100GB
and ca. 10GB), available at the "Open-Lab" at NII
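For orientation only, the two evaluation emphases described for the Web
Retrieval Task can be sketched as follows. This is a minimal illustration,
not the workshop's official scoring: the announcement does not name its
measures, so the function names, the sample document IDs, and the choice of
the balanced F-measure as one way to weight recall and precision equally
are all assumptions.

```python
# Illustrative sketch only; not the official NTCIR evaluation code.
# All names and sample data below are hypothetical.

def precision_at_k(ranked_ids, relevant_ids, k=10):
    """Fraction of the top-k ranked documents that are relevant.

    Mirrors the precision emphasis of Target Retrieval, where runs are
    submitted as the top 10 ranked documents per topic.
    """
    top = ranked_ids[:k]
    hits = sum(1 for doc_id in top if doc_id in relevant_ids)
    return hits / k


def precision_recall(retrieved_ids, relevant_ids):
    """Set-based precision and recall for one topic."""
    retrieved = set(retrieved_ids)
    hits = len(retrieved & set(relevant_ids))
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant_ids) if relevant_ids else 0.0
    return precision, recall


def balanced_f(precision, recall):
    """Harmonic mean of precision and recall (equal weighting).

    Assumed here as one possible reading of "evenly weighted" in
    Survey Retrieval.
    """
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical run and relevance judgments for a single topic.
ranked = ["d3", "d7", "d1", "d9", "d2"]
relevant = {"d1", "d2", "d4", "d5"}

p, r = precision_recall(ranked, relevant)
print("precision:", p, "recall:", r, "F:", balanced_f(p, r))
print("P@10:", precision_at_k(ranked, relevant, k=10))
```

Participants should rely on the measures announced by each task's
organizers; the sketch only shows how a precision-oriented score differs
from one that balances recall and precision.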
WORKSHOP SCHEDULE
2001-09-30 Application Due
2001-10-01 Document release (newspaper)
2001-10/2002-01 Dry Run and Round-Table Discussion
(depends on each task)
2001-12 Open Lab start
2001-12/2002-03 Formal Run (depends on each task)
2002-07-01 Evaluation Results Delivery
2002-08-20 Paper for Working Note Due
2002-10-08/10 NTCIR Workshop 3 Meeting
Days 1-2: Closed session (task participants only)
Day 3: Open session
2002-12-01 Paper for Final Proceedings Due
TYPES OF PARTICIPATION
* A. FULL: Submit results and describe the system. The
correspondence between the group name and the group ID will
be announced.
* B. ANONYMOUS: Submit results. The details of the system need not be
reported. The correspondence between the group name and the group
ID is not announced. This category is mainly for participants
from companies that have difficulty disclosing system details.
The list of participating groups will be made public, although the
evaluation results will be announced using the group IDs only. Regardless
of the type of participation, every participating group must submit
(1) paper(s) for the workshop proceedings, (2) a system description
form which describes your system, and (3) bibliographic references and
a copy of each paper you publish using the NTCIR test collections.
APPLICATIONS
Online application:
http://research.nii.ac.jp/ntcir/workshop/application-en.html
ENQUIRIES
* Please send email to Noriko Kando, program chair, or to the
NTCIR Project administrators (ntcadm at nii.ac.jp).
* For the details of a specific task, please contact that task's
chair and organizers.
NEW FEATURES of NTCIR WS3 TASKS
* Two Types of CLIR
(1) Multilingual CLIR of Asian Languages and English (CLIR)
(2) CLIR of Technical Information: Search Japanese Patent
documents by English/Chinese/Japanese topics.
English-Japanese paired abstracts (ca. 1,500,000 docs)
are included in the test collection used for NTCIR WS3.
* Optional Tasks (Patent & Web): Any research group interested in
using the document collections provided in these tasks for its
own research projects is invited!
We also expect this venture to explore possible new
tasks for future NTCIR workshops.
* Search by Documents (Patent & Web)
* Passage Retrieval (Patent, QA & Web)
* Precision-oriented Evaluation (QA & Web) and Multigrade Relevance
Judgments (CLIR, Patent & Web)
NOTES
* The proceedings will be published online as well as in printed form.
* Dissemination of the research results using the NTCIR collections
other than in the Workshop's Proceedings is welcome. However, the
conditions of participation preclude specific advertising claims
based on the results using the Collection or the Workshop.
* International participants are welcome. Announcements will be in
English and Japanese.
* The official language for the proceedings papers and presentation
at the Workshop meeting in October, 2002 is English.
* Documents will be provided to participants who have returned the
required user agreement forms.
* DOCUMENT USAGE: The period of permitted use of Mainichi Newspapers
and Mainichi Daily News is from 2001-09-01 to 2003-09-30. Active
participants who submit results and are affiliated with an
organization outside Japan will be able to extend the period
up to 2008-09-30. After the permitted period ends, participants
must delete all the document data. Those who want to use the data
after the period can purchase the data from Mainichi Newspaper Co.
and obtain permission for research use from the company.
The permitted period may vary according to each task.
-----------------------------------------------------------------------------
Noriko Kando.
ntcir project