<html>

  <head>

    <meta http-equiv="content-type" content="text/html; charset=ISO-8859-1">

  </head>

  <body bgcolor="#FFFFFF" text="#000000">

    <span style="">[Apologies for multiple postings]</span><br style="">

    <b style=""><br>

      ***EXTENDED DEADLINE: FRI 2 MARCH 2012***<br>

      <br>

      FINAL CALL FOR PAPERS</b><u style=""><i><br>

      </i></u><i style="">Workshop

      on Language Technology for Patent Data: Language Resources and

      Evaluation</i><br style="">

    <br style="">

    <span style="">To be held in conjunction with the 8th International </span><span

      style="">Language</span><span style=""> </span><span style="">Resources</span><span

      style=""> and Evaluation Conference (LREC 2012)</span><br style="">

    <br style="">

    <span style="">27 May 2012 (afternoon)</span><br style="">

    <br style="">

    <span style="">Lütfi Kirdar Istanbul Exhibition and Congress Centre,

      Istanbul, Turkey</span><br style="">

    <br style="">

    <a href="http://workshops.elda.org/ltpd2012/" target="_blank"

      style="">http://workshops.elda.org/ltpd2012/</a><br style="">

    <br style="">

    <b style="">Workshop Description</b><br style="">

    <span style="">In the last few years, the use of</span><b style=""> patents </b><span

      style="">in automatic processing has attracted growing interest in

      the</span><br style="">

    <span style="">NLP community. This has been particularly the case in

      the context of </span><b style="">Machine Translation (MT)</b><span

      style=""> or</span><br style="">

    <b style="">Cross-Lingual Information Retrieval (CLIR)</b><span

      style="">. This has now become a major topic and, besides</span><br

      style="">

    <span style="">the development of the technology itself, some key

      questions remain open regarding the </span><span style="">resources</span><span

      style=""> available</span><br style="">

    <span style="">and how to evaluate the quality of the

      technology.</span><br style="">

    <br style="">

    <span style="">A large number of </span><span style="">language</span><span

      style=""> </span><span style="">resources</span><span style=""> are

      already available </span><span style="">to</span><span style=""> the

      community, but the development</span><br style="">

    <span style="">of systems, in particular the statistical ones,

      always requires more and more data. As there is a</span><br

      style="">

    <span style="">growing interest </span><span style="">in</span><span

      style=""> patents and their processing, a workshop on the topic

      which gathers all those</span><br style="">

    <span style="">involved in the different aspects concerned is a good

      opportunity to move forward.</span><br style="">

    <span style="">The patent domain itself is growing, and the

      amount of potential material keeps</span><br style="">

    <span style="">increasing. It is this potential material that gives

      hope to the community </span><span style="">for</span><span

      style=""> improving the systems.</span><br style="">

    <span style="">For</span><span style=""> instance, in China, the

      number of patents has tripled in 5 years and now

      exceeds</span><br style="">

    <span style="">1 million published documents per year. The EPO

      (the European </span><span style="">Patent</span><span style=""> Office)

      uses more than</span><br style="">

    <span style="">150 translation pairs per day. Every </span><span

      style="">patent</span><span style=""> office receives more and

      more patents every day, relies</span><br style="">

    <span style="">daily on automatic tools to translate the

      documents, looks </span><span style="">for</span><span style=""> existing

      patents and their</span><br style="">

    <span style="">translations, manages complex content, etc. As we can

      see, this is a domain in considerable demand,</span><br style="">

    <span style="">and since the content of patents is technical and

      requires strong expertise in a specific domain, providing</span><br

      style="">

    <span style="">documents that are sufficiently understandable to the

      end users is very complex. This is a real</span><br style="">

    <span style="">challenge </span><span style="">for</span><span

      style=""> all NLP developers.</span><br style="">

    <br style="">

    <span style="">Above all, this challenge is about corpora and their

      management. The main topic concerns their</span><br style="">

    <span style="">acquisition and how to collect useful data. </span><span

      style="">For</span><span style=""> most researchers, this

      consists of harvesting</span><br style="">

    <span style="">web pages, cleaning them, getting the useful content

      according to a specific task, aligning the</span><br style="">

    <span style="">sentences, etc. The acquisition task may also be done

      using </span><b style="">OCR tools on PDF files</b><span style="">.

      Monolingual</span><br style="">

    <span style="">corpora are easier to retrieve (e.g. from databases)

      than parallel corpora. However, parallel</span><br style="">

    <span style="">translations do exist, as do aligned corpora and

      corpora that could be easily aligned. Following the</span><br

      style="">

    <span style="">question of the acquisition of such documents, there

      is that of database management. One could say</span><br style="">

    <span style="">that these questions are not specific to </span><span

      style="">patent</span><span style=""> data; however, this workshop

      would like to focus</span><br style="">

    <span style="">on this particular domain and make some effort to

      improve things.</span><br style="">

    <br style="">

    <span style="">Currently, the corpora are mainly used </span><span

      style="">for</span><span style=""> MT. </span><span style="">For</span><span

      style=""> a technical end-user in a </span><span style="">patent</span><span

      style=""> office, the end</span><br style="">

    <span style="">goal is to manage to understand the content of a

      document. This may not require a very high quality</span><br

      style="">

    <span style="">translation since this person only needs to grasp the

      relevance of the document. However, in MT,</span><br style="">

    <span style="">we still need to measure the performance of the

      systems quantitatively. This is usually done using</span><br

      style="">

    <span style="">automatic and/or human measures, and most

      system developers rely on standard automatic</span><br style="">

    <span style="">metrics such as BLEU to get their results. Even if

      the drawbacks of such metrics are well known, they</span><br

      style="">

    <span style="">can still be useful, </span><span style="">for</span><span

      style=""> instance, to compare different versions of a system.

      However, even when</span><br style="">

    <span style="">using BLEU, the content of </span><span style="">patent</span><span

      style=""> documents is very particular, which implies that

      different kinds</span><br style="">

    <span style="">of linguistic specificity need to be tackled: these

      include the already expected terminological level,</span><br

      style="">

    <span style="">but also a syntactic level, a semantic one, and even

      the structure of the documents may be different</span><br style="">

    <span style="">from that of other documents (</span><span style="">for</span><span

      style=""> instance, patents typically consist of a title, an

      abstract, a</span><br style="">

    <span style="">technical description of the invention, and a list of

      novel claims). Human measures may also be</span><br style="">

    <span style="">difficult to apply as </span><span style="">patent</span><span

      style=""> documents are written in a way which makes them

      difficult to read </span><span style="">for</span><br style="">

    <span style="">the layman. Furthermore, both automatic and human

      evaluations should make it possible to carry out an</span><br style="">

    <span style="">in-depth analysis of the results, which is not trivial

      when working with patents. However, given the often</span><br style="">

    <span style="">formulaic nature of the text found in patents – which

      is imposed on the author by legal</span><br style="">

    <span style="">constraints – there may be opportunities to exploit

      this </span><span style="">for</span><span style=""> evaluation. </span><span

      style="">For</span><span style=""> instance, claims are</span><br

      style="">

    <span style="">constructed as a single sentence with an introductory

      phrase and a body linked by frequently</span><br style="">

    <span style="">occurring terms such as “in a certain embodiment”,

      “consisting essentially of”, and clauses and lists</span><br

      style="">

    <span style="">introduced using colons, e.g. “comprising: …”</span><br

      style="">

    <br style="">

    <span style="">The use of patents in CLIR suffers from the same kind

      of issues, either </span><span style="">for</span><span style=""> the

      evaluation of systems</span><br style="">

    <span style="">or </span><span style="">for</span><span style=""> the

      collection of corpora. Sentence alignment may also have specific

      issues related to the</span><br style="">

    <span style="">content of the documents, and many other types of

      tools may face their own difficulties when applied to patents.</span><br

      style="">

    <span style="">Across all these technologies, one can see that their

      use raises several challenges, such as the</span><br style="">

    <span style="">integration of tools into </span><span style="">patent</span><span

      style=""> information applications. The different tools should

      help end-users to</span><br style="">

    <span style="">search, examine or classify </span><span style="">patent</span><span

      style=""> documents, most of the time working from translations because the originals are not

      available</span><br style="">

    <span style="">in English. Web services should also extend

      the tools and should be</span><br style="">

    <span style="">connected through workflows, helping end-users in

      their daily work.</span><br style="">

    <span style="">Among all the topics previously mentioned, we would

      like to contribute to the improvement of the</span><br style="">

    <span style="">challenging </span><span style="">patent</span><span

      style=""> field by sharing knowledge across the whole

      community.</span><br style="">

    <br style="">

    <span style="">Topics addressed during the workshop

      will include (but are not limited to):</span><br style="">

    <span style="">- Corpora aspects: collecting data, cleaning,

      alignment, parallel corpora, etc.;</span><br style="">

    <span style="">- Evaluation of technologies: definition of metrics, </span><span

      style="">patent</span><span style=""> specificity;</span><br

      style="">

    <span style="">- Integration of </span><span style="">patent</span><span

      style=""> applications: web services, end-user applications;</span><br

      style="">

    <span style="">- IPR issues and licensing.</span><br style="">

    <br style="">

    <b style="">Organising committee</b><br style="">

    <span style="">Heidi Depraetere (Crosslang, Belgium)</span><br

      style="">

    <span style="">Olivier Hamon (ELDA – Evaluations and </span><span

      style="">Language</span><span style=""> </span><span style="">resources</span><span

      style=""> Distribution Agency, France)</span><br style="">

    <span style="">John </span><span class="il" style="">Tinsley</span><span

      style=""> (PLUTO – </span><span style="">Patent</span><span

      style=""> </span><span style="">Language</span><span style=""> Translations

      Online, Ireland)</span><br style="">

    <br style="">

    <b style="">Programme committee</b><br style="">

    <span style="">Victoria Arranz (ELDA – Evaluations and </span><span

      style="">Language</span><span style=""> </span><span style="">resources</span><span

      style=""> Distribution Agency, France)</span><br style="">

    <span style="">Alexandru Ceausu (PLUTO – </span><span style="">Patent</span><span

      style=""> </span><span style="">Language</span><span style=""> Translations

      Online, Ireland)</span><br style="">

    <span style="">Khalid Choukri (ELDA, France)</span><br style="">

    <span style="">Terumasa Ehara (Yamanashi Eiwa College, Japan)</span><br

      style="">

    <span style="">Cristina España-Bonet (UPC, Spain)</span><br style="">

    <span style="">Mihai Lupu (IRF and ESTeam, Austria)</span><br

      style="">

    <span style="">Bertrand Le Chapelain (EPO, Netherlands)</span><br

      style="">

    <span style="">Bente Maegaard (University of Copenhagen, Denmark)</span><br

      style="">

    <span style="">Walid Magdy (Dublin City University, Ireland)</span><br

      style="">

    <span style="">Bruno Pouliquen (World Intellectual Property

      Organization, Switzerland)</span><br style="">

    <span style="">Lucia Specia (University of Sheffield, United

      Kingdom)</span><br style="">

    <span style="">Gregor Thurmair (Linguatec, Germany)</span><br

      style="">

    <span style="">Dan Wang (China </span><span style="">Patent</span><span

      style=""> Information Center, China)</span><br style="">

    <span style="">Shoichi Yokoyama (Yamagata University, Japan)</span><br

      style="">

    <br style="">

    <span style="">More TBC...</span><br style="">

    <br style="">

    <b style="">Important dates</b><br style="">

    <span style="">Deadline </span><span style="">for</span><span

      style=""> submission: Friday 2 March 2012</span><br style="">

    <span style="">Notification of acceptance: Friday 23 March 2012</span><br

      style="">

    <span style="">Final version due: Friday 30 March 2012</span><br

      style="">

    <span style="">Workshop: 27 May 2012 (afternoon)</span><br style="">

    <br style="">

    <b style="">Submission Format</b><br style="">

    <span style="">Full papers up to 8 pages should be formatted

      according to LREC 2012 guidelines and be submitted</span><br

      style="">

    <span style="">through the online submission form (</span><a

      href="https://www.softconf.com/lrec2012/PATENT2012/"

      target="_blank" style="">https://www.softconf.com/lrec2012/PATENT2012/</a><span

      style="">) on</span><br style="">

    <span style="">START. </span><span style="">For</span><span style=""> further

      queries, please contact Olivier Hamon at hamon_at_elda_dot_org.</span><br

      style="">

    <span style="">When submitting a paper from the START page, authors

      will be asked to provide essential</span><br style="">

    <span style="">information about </span><span style="">resources</span><span

      style=""> (in a broad sense, i.e. also technologies, standards,

      evaluation kits, etc.)</span><br style="">

    <span style="">that have been used </span><span style="">for</span><span

      style=""> the work described in the paper or are a new result of

      their research. </span><span style="">For</span><br style="">

    <span style="">further information on this new initiative, please

      refer to </span><a

      href="http://www.lrec-conf.org/lrec2012/?LREMap-2012" target="_blank"

      style="">http://www.lrec-conf.org/lrec2012/?LREMap-2012</a><span

      style="">.</span>

    <pre class="moz-signature" cols="72">-- 

---------------------------------------------------------------------------------------------------

Dr. Olivier HAMON                          <a class="moz-txt-link-abbreviated" href="mailto:hamon@elda.org">hamon@elda.org</a>

Project Manager - ELDA

55-57, rue Brillat Savarin             Tel : +33 1 43 13 33 43

75013 Paris - France                   Fax : +33 1 43 13 33 30

<a class="moz-txt-link-freetext" href="http://www.elda.org">http://www.elda.org</a>                    <a class="moz-txt-link-freetext" href="http://www.lrec-conf.org">http://www.lrec-conf.org</a>

<a class="moz-txt-link-freetext" href="http://catalog.elra.info">http://catalog.elra.info</a>               <a class="moz-txt-link-freetext" href="http://www.hlt-evaluation.org">http://www.hlt-evaluation.org</a>

---------------------------------------------------------------------------------------------------

</pre>

  </body>

</html>