14.872, Calls: Document Design/Patent Corpus Processing

LINGUIST List linguist at linguistlist.org
Wed Mar 26 03:31:33 UTC 2003

LINGUIST List:  Vol-14-872. Tue Mar 25 2003. ISSN: 1068-4875.

Subject: 14.872, Calls: Document Design/Patent Corpus Processing

Moderators: Anthony Aristar, Wayne State U. <aristar at linguistlist.org>
            Helen Dry, Eastern Michigan U. <hdry at linguistlist.org>

Reviews (reviews at linguistlist.org):
	Simin Karimi, U. of Arizona
	Terence Langendoen, U. of Arizona

Home Page:  http://linguistlist.org/

The LINGUIST List is funded by Eastern Michigan University, Wayne
State University, and donations from subscribers and publishers.

Editor for this issue: Marie Klopfenstein <marie at linguistlist.org>

To give you an incentive to donate, many of our Supporting Publishers
have generously donated some amazing linguistic prizes. As a donor you
are automatically entered into this prize draw.

We still have a long way to go, however, to reach our target of
$50,000.

The LINGUIST List depends on the generous contributions from
subscribers like you; we would not be able to operate without your
support.

The moderators, staff, and student editors at LINGUIST would like to
take this opportunity to thank you for your continuous support.

As a matter of policy, LINGUIST discourages the use of abbreviations
or acronyms in conference announcements unless they are explained in
the text.

Date:  Tue, 25 Mar 2003 08:49:12 +0000
From:  c.e.a.dewaele at uvt.nl
Subject:  Document Design

Date:  Tue, 25 Mar 2003 18:03:18 EST
From:  Priscilla Rasmussen <rasmusse at cs.rutgers.edu>
Subject:  Workshop on Patent Corpus Processing

-------------------------------- Message 1 -------------------------------

Date:  Tue, 25 Mar 2003 08:49:12 +0000
From:  c.e.a.dewaele at uvt.nl
Subject:  Document Design

Document Design
Short Title: Document Design

Date: 22-Jan-2004 - 24-Jan-2004
Location: Tilburg, Noord-Brabant, Netherlands
Contact: Cathy de Waele
Contact Email: document.design at uvt.nl
Meeting URL: http://let.uvt.nl/docdes

Linguistic Sub-field: General Linguistics
Call Deadline: 01-Sep-2003

Meeting Description:

The goal of the conference is to bring together researchers and
professionals in the broad field of document design who work in
discourse studies, (cognitive) linguistics, educational psychology,
speech communication, communication science, technical documentation,
social psychology, cognitive psychology, and marketing communication.

The focus will be on the way in which document design has been the
subject of debate, research, and information supply, as evident in
Document Design and similar international journals.

Keynote speakers are:

Saul Carliner
Konrad Ehlich
James Hartley
Theo van Leeuwen
David Sless
Kazuo Terakado

Call for Papers

The organizers invite contributions to the conference in which aspects
of (electronic) discourse - written, spoken or visual - are combined
with aspects of text quality (function, institutional setting,

Methodologies used may range from experimental and (corpus) analytical
to case studies. Message variables may concern content, structure,
layout, audience, style, and so on.

Contributions should report on original and recent work that has not
been published previously. Only electronically submitted abstracts
will be considered.

Send an electronic abstract in English (max. 400 words) to
document.design at uvt.nl

Deadline for submission is September 1, 2003.

A committee consisting of the organizers and external referees will
evaluate the proposals. Notification of acceptance will be given by
September 1, 2003.

Afterwards, a selection of the papers will be published in the journal
Document Design.


It is also possible to organize a workshop or to act as a discussion
leader. If you are interested, please send an e-mail to
c.e.a.dewaele at uvt.nl

-------------------------------- Message 2 -------------------------------

Date:  Tue, 25 Mar 2003 18:03:18 EST
From:  Priscilla Rasmussen <rasmusse at cs.rutgers.edu>
Subject:  Workshop on Patent Corpus Processing

ACL 2003 Workshop on Patent Corpus Processing
12 July 2003, Sapporo, Japan


Workshop Description

The goal of this workshop is to foster research and development of the
technology for patent corpus processing, by providing a forum in which
researchers and practitioners can exchange and share their ideas,
approaches, perspectives, and experiences from their work in progress.

The processing of intellectual property (IP) documents, including
patents, is important in the scientific, business, and law
communities. Much of the focus for patent and IP processing has been
in the database and information retrieval communities, but not in the
computational linguistics (CL) and natural language processing (NLP)
communities.

In 2000, the ACM SIGIR 2000 Workshop on Patent Retrieval, the first
workshop of its kind, was held. At this workshop, patent retrieval
systems in use at the EPO (European Patent Office) and JAPIO (Japanese
Patent Information Organization) were introduced, and a number of
issues related to patent retrieval (e.g., producing ontologies,
cross-language retrieval, and evaluation methods) were proposed and
discussed.

In 2001-2002, the NTCIR workshop (organized by the National Institute
of Informatics, Japan), a TREC-style evaluation forum for research and
development in IR/NLP, ran a patent retrieval task for the first time.
Two years of Japanese patents (approximately 0.7M documents published
in 1998-1999; 18GB) were used to evaluate monolingual and
cross-lingual patent retrieval systems. In addition, approximately
1.7M Japanese/English parallel patent abstracts were used to evaluate
the effectiveness of extracting translation lexicons.

Areas of Interest

Patent corpora have a number of interesting characteristics, for which
various CL/NLP techniques show promise in improving the quality of
patent processing.

* multilinguality: the same or similar content (i.e., the same
invention) is filed in different languages, so machine translation,
cross/multi-lingual retrieval, and translation extraction can
alleviate problems in accessing information in foreign languages.

* scalability: huge amounts of corpus data are available and are
periodically produced, so text summarization and natural language
generation can help produce understandable, coherent, condensed texts.

* complexity: since patents contain extremely long sentences,
parsing/chunking techniques can help produce shorter, readable
fragments.

* classification: patents are manually categorized based on a specific
classification system, such as the IPC (International Patent
Classification), which can be used for statistical classification.

* novelty/temporality/dynamism: new terms and concepts associated with
inventions are periodically created, for which term extraction and
ontology construction techniques help update lexical resources for
patent processing.

* document structures: unlike newspaper articles, patents are
structured into a number of specific fields (e.g., titles, abstracts,
and claims). While conventional text segmentation techniques rely
mainly on linguistic content (e.g., lexical chains), structure
analysis techniques (e.g., those related to XML) are also crucial in
the context of CL/NLP.

* applications: the above techniques can directly contribute to a
number of applications, such as patent retrieval systems.

We invite both research papers and project papers associated with, but
not limited to, the aspects of patent corpus processing listed above.
We also invite papers addressing applications and user studies.

Important Dates

Submission deadline: 10 April 2003
Acceptance notification: 12 May 2003
Final version deadline: 30 May 2003
Workshop date: 12 July 2003

Workshop Chairs

Makoto Iwayama, Tokyo Institute of Technology / Hitachi Ltd., Japan
Atsushi Fujii, University of Tsukuba, Japan

Contact Information

Atsushi Fujii, fujii at slis.tsukuba.ac.jp
University of Tsukuba, Japan
