<html>
<head>
<meta http-equiv="content-type" content="text/html; charset=ISO-8859-1">
</head>
<body bgcolor="#FFFFFF" text="#000000">
<span style="">[Apologies for multiple postings]</span><br style="">
<b style=""><br>
***EXTENDED DEADLINE: FRI 2 MARCH 2012***<br>
<br>
FINAL CALL FOR PAPERS</b><u style=""><i><br>
</i></u><i style="">Workshop
on Language Technology for Patent Data: Language Resources and
Evaluation</i><br style="">
<br style="">
<span style="">To be held in conjunction with the 8th International </span><span
style="">Language</span><span style=""> </span><span style="">Resources</span><span
style=""> and Evaluation Conference (LREC 2012)</span><br style="">
<br style="">
<span style="">27 May 2012 (afternoon)</span><br style="">
<br style="">
<span style="">Lütfi Kirdar Istanbul Exhibition and Congress Centre,
Istanbul, Turkey</span><br style="">
<br style="">
<a href="http://workshops.elda.org/ltpd2012/" target="_blank"
style="">http://workshops.elda.org/ltpd2012/</a><br style="">
<br style="">
<b style="">Workshop Description</b><br style="">
<span style="">In the last few years, the use of</span><b style=""> patents </b><span
style="">in automatic processing has attracted growing interest in
the</span><br style="">
<span style="">NLP community. This has been particularly the case in
the context of </span><b style="">Machine Translation (MT)</b><span
style=""> or</span><br style="">
<b style="">Cross-Lingual Information Retrieval (CLIR)</b><span
style="">. This has now become a major topic and, besides</span><br
style="">
<span style="">the development of the technology itself, key
questions remain regarding the </span><span style="">resources</span><span
style=""> available</span><br style="">
<span style="">and how to evaluate the quality of the
technology.</span><br style="">
<br style="">
<span style="">A large number of </span><span style="">language</span><span
style=""> </span><span style="">resources</span><span style=""> are
already available </span><span style="">to</span><span style=""> the
community, but the development</span><br style="">
<span style="">of systems, in particular the statistical ones,
always requires more and more data. As there is a</span><br
style="">
<span style="">growing interest </span><span style="">in</span><span
style=""> patents and their processing, a workshop on the topic
which gathers all those</span><br style="">
<span style="">involved in the different aspects concerned is a good
opportunity to move forward.</span><br style="">
<span style="">The patent domain itself is expanding and the
amount of potential material keeps</span><br style="">
<span style="">growing. It is this potential material that gives
hope to the community </span><span style="">for</span><span
style=""> improving the systems.</span><br style="">
<span style="">For</span><span style=""> instance, in China, the
number of patents has tripled in 5 years and now
exceeds</span><br style="">
<span style="">1 million published documents per year. The EPO
(the European </span><span style="">Patent</span><span style=""> Office)
uses more than</span><br style="">
<span style="">150 translation pairs per day. Every </span><span
style="">patent</span><span style=""> office receives more and
more patents every day and needs</span><br style="">
<span style="">automatic tools daily to translate the
documents, search </span><span style="">for</span><span style=""> existing
patents and their</span><br style="">
<span style="">translations, manage complex content, etc. As we can
see, this is a domain in considerable demand,</span><br style="">
<span style="">and since the content of patents is technical and
requires expertise in a specific domain, providing</span><br
style="">
<span style="">documents that are sufficiently understandable to
end users is very complex. This is a real</span><br style="">
<span style="">challenge </span><span style="">for</span><span
style=""> all NLP developers.</span><br style="">
<br style="">
<span style="">Above all, this challenge is about corpora and their
management. The main topic concerns their</span><br style="">
<span style="">acquisition and how to collect useful data. </span><span
style="">For</span><span style=""> most researchers, this
consists of harvesting</span><br style="">
<span style="">web pages, cleaning them, extracting the useful content
according to a specific task, aligning the</span><br style="">
<span style="">sentences, etc. The acquisition task may also be done
using </span><b style="">OCR tools on PDF files</b><span style="">.
Monolingual</span><br style="">
<span style="">corpora are easier to retrieve (e.g. from databases)
than parallel corpora. However, parallel</span><br style="">
<span style="">translations do exist, as do aligned corpora and
corpora that could easily be aligned. Following the</span><br
style="">
<span style="">question of the acquisition of such documents, there
is that of database management. One could say</span><br style="">
<span style="">that these questions are not specific to </span><span
style="">patent</span><span style=""> data; however, this workshop
would like to focus</span><br style="">
<span style="">on this particular domain and work towards
improving the situation.</span><br style="">
<br style="">
<span style="">Currently, the corpora are mainly used </span><span
style="">for</span><span style=""> MT. </span><span style="">For</span><span
style=""> a technical end-user in a </span><span style="">patent</span><span
style=""> office, the end</span><br style="">
<span style="">goal is to understand the content of a
document. This may not require a very high quality</span><br
style="">
<span style="">translation since this person only needs to grasp the
relevance of the document. However, in MT,</span><br style="">
<span style="">we still need to measure the performance of the
systems quantitatively. This is generally done using</span><br
style="">
<span style="">automatic and/or human measures, and most
system developers use typical automatic</span><br style="">
<span style="">metrics such as BLEU to obtain their results. Even if
the drawbacks of such metrics are well known, they</span><br
style="">
<span style="">can still be relevant, </span><span style="">for</span><span
style=""> instance, to compare different versions of a system.
However, even when</span><br style="">
<span style="">using BLEU, the content of </span><span style="">patent</span><span
style=""> documents is very particular, which implies that
different kinds</span><br style="">
<span style="">of linguistic specificity need to be tackled: these
include the already expected terminological level,</span><br
style="">
<span style="">but also a syntactic level, a semantic one, and even
the structure of the documents may be different</span><br style="">
<span style="">from that of other documents (</span><span style="">for</span><span
style=""> instance, patents typically comprise a title, an
abstract, a</span><br style="">
<span style="">technical description of the invention, and a list of
novel claims). Human measures may also be</span><br style="">
<span style="">difficult to apply as </span><span style="">patent</span><span
style=""> documents are written in a way which makes them
difficult to read </span><span style="">for</span><br style="">
<span style="">the layman. Furthermore, both automatic and human
evaluations should allow for an</span><br style="">
<span style="">in-depth analysis of the results, which is not trivial
when working with patents. However, given the often</span><br style="">
<span style="">formulaic nature of the text found in patents – which
is imposed on the author by legal</span><br style="">
<span style="">constraints – there may be opportunities to exploit
this </span><span style="">for</span><span style=""> evaluation. </span><span
style="">For</span><span style=""> instance, claims are</span><br
style="">
<span style="">constructed as a single sentence with an introductory
phrase and a body linked by frequently</span><br style="">
<span style="">occurring terms such as “in a certain embodiment”,
“consisting essentially of”, and clauses and lists</span><br
style="">
<span style="">introduced using colons, e.g. “comprising: …”.</span><br
style="">
<br style="">
<span style="">The use of patents in CLIR suffers from the same kind
of issues, either </span><span style="">for</span><span style=""> the
evaluation of systems</span><br style="">
<span style="">or </span><span style="">for</span><span style=""> the
collection of corpora. Sentence alignment may also have specific
issues related to the</span><br style="">
<span style="">content of the documents, and many other types of
tools may face their own difficulties when applied to patents.</span><br
style="">
<span style="">Across all these technologies, one can see that their
use raises several challenges, such as the</span><br style="">
<span style="">integration of tools into </span><span style="">patent</span><span
style=""> information applications. The different tools should
help end-users to</span><br style="">
<span style="">search, examine or classify </span><span style="">patent</span><span
style=""> documents, most of the time through translations when they are not
available</span><br style="">
<span style="">in English. Web services should also extend
the tools and should be</span><br style="">
<span style="">connected through workflows, helping end-users in
their daily work.</span><br style="">
<span style="">Across all the topics mentioned above, we would
like to contribute to the improvement of the</span><br style="">
<span style="">challenging </span><span style="">patent</span><span
style=""> field by sharing knowledge from the whole
community.</span><br style="">
<br style="">
<span style="">Topics addressed during the workshop
will include (but are not limited to):</span><br style="">
<span style="">- Corpora aspects: collecting data, cleaning,
alignment, parallel corpora, etc.;</span><br style="">
<span style="">- Evaluation of technologies: definition of metrics, </span><span
style="">patent</span><span style=""> specificity;</span><br
style="">
<span style="">- Integration of </span><span style="">patent</span><span
style=""> applications: web services, end-user applications;</span><br
style="">
<span style="">- IPR issues and licensing.</span><br style="">
<br style="">
<b style="">Organising committee</b><br style="">
<span style="">Heidi Depraetere (Crosslang, Belgium)</span><br
style="">
<span style="">Olivier Hamon (ELDA – Evaluations and </span><span
style="">Language</span><span style=""> </span><span style="">resources</span><span
style=""> Distribution Agency, France)</span><br style="">
<span style="">John </span><span class="il" style="">Tinsley</span><span
style=""> (PLUTO – </span><span style="">Patent</span><span
style=""> </span><span style="">Language</span><span style=""> Translations
Online, Ireland)</span><br style="">
<br style="">
<b style="">Programme committee</b><br style="">
<span style="">Victoria Arranz (ELDA – Evaluations and </span><span
style="">Language</span><span style=""> </span><span style="">resources</span><span
style=""> Distribution Agency, France)</span><br style="">
<span style="">Alexandru Ceausu (PLUTO – </span><span style="">Patent</span><span
style=""> </span><span style="">Language</span><span style=""> Translations
Online, Ireland)</span><br style="">
<span style="">Khalid Choukri (ELDA, France)</span><br style="">
<span style="">Terumasa Ehara (Yamanashi Eiwa College, Japan)</span><br
style="">
<span style="">Cristina España-Bonet (UPC, Spain)</span><br style="">
<span style="">Mihai Lupu (IRF and ESTeam, Austria)</span><br
style="">
<span style="">Bertrand Le Chapelain (EPO, Netherlands)</span><br
style="">
<span style="">Bente Maegaard (University of Copenhagen, Denmark)</span><br
style="">
<span style="">Walid Magdy (Dublin City University, Ireland)</span><br
style="">
<span style="">Bruno Pouliquen (World Intellectual Property
Organization, Switzerland)</span><br style="">
<span style="">Lucia Specia (University of Sheffield, United
Kingdom)</span><br style="">
<span style="">Gregor Thurmair (Linguatec, Germany)</span><br
style="">
<span style="">Dan Wang (China </span><span style="">Patent</span><span
style=""> Information Center, China)</span><br style="">
<span style="">Shoichi Yokoyama (Yamagata University, Japan)</span><br
style="">
<br style="">
<span style="">More TBC...</span><br style="">
<br style="">
<b style="">Important dates</b><br style="">
<span style="">Deadline </span><span style="">for</span><span
style=""> submission: Friday 2 March 2012</span><br style="">
<span style="">Notification of acceptance: Friday 23 March 2012</span><br
style="">
<span style="">Final version due: Friday 30 March 2012</span><br
style="">
<span style="">Workshop: 27 May 2012 (afternoon)</span><br style="">
<br style="">
<b style="">Submission Format</b><br style="">
<span style="">Full papers of up to 8 pages should be formatted
according to LREC 2012 guidelines and be submitted</span><br
style="">
<span style="">through the online submission form (</span><a
href="https://www.softconf.com/lrec2012/PATENT2012/"
target="_blank" style="">https://www.softconf.com/lrec2012/PATENT2012/</a><span
style="">) on</span><br style="">
<span style="">START. </span><span style="">For</span><span style=""> further
queries, please contact Olivier Hamon at hamon_at_elda_dot_org.</span><br
style="">
<span style="">When submitting a paper from the START page, authors
will be asked to provide essential</span><br style="">
<span style="">information about </span><span style="">resources</span><span
style=""> (in a broad sense, i.e. also technologies, standards,
evaluation kits, etc.)</span><br style="">
<span style="">that have been used </span><span style="">for</span><span
style=""> the work described in the paper or are a new result of
their research. </span><span style="">For</span><br style="">
<span style="">further information on this new initiative, please
refer to </span><a
href="http://www.lrec-conf.org/lrec2012/?LREMap-2012" target="_blank"
style="">http://www.lrec-conf.org/lrec2012/?LREMap-2012</a><span
style="">.</span>
<pre class="moz-signature" cols="72">--
---------------------------------------------------------------------------------------------------
Dr. Olivier HAMON <a class="moz-txt-link-abbreviated" href="mailto:hamon@elda.org">hamon@elda.org</a>
Project Manager - ELDA
55-57, rue Brillat Savarin Tel : +33 1 43 13 33 43
75013 Paris - France Fax : +33 1 43 13 33 30
<a class="moz-txt-link-freetext" href="http://www.elda.org">http://www.elda.org</a> <a class="moz-txt-link-freetext" href="http://www.lrec-conf.org">http://www.lrec-conf.org</a>
<a class="moz-txt-link-freetext" href="http://catalog.elra.info">http://catalog.elra.info</a> <a class="moz-txt-link-freetext" href="http://www.hlt-evaluation.org">http://www.hlt-evaluation.org</a>
---------------------------------------------------------------------------------------------------
</pre>
</body>
</html>