[Corpora-List] Language used in Patents

Kevin B. Cohen kevin.cohen at gmail.com
Thu Feb 26 16:11:55 UTC 2009


Eva,

Here's a paper on information retrieval of patents based on named entity
recognition of chemicals & converting from a textual to a structural
representation:

http://psb.stanford.edu/psb-online/proceedings/psb07/rhodes.pdf

Text analytics is becoming an increasingly important tool used in
biomedical research. While advances continue to be made in the core
algorithms for entity identification and relation extraction, a need for
practical applications of these technologies arises. We developed a system
that allows users to explore the US Patent corpus using molecular
information. The core of our system contains three main technologies: A
high performing chemical annotator which identifies chemical terms and
converts them to structures, a similarity search engine based on the
emerging IUPAC International Chemical Identifier (InChI) standard, and a
set of on demand data mining tools. By leveraging this technology we were
able to rapidly identify and index 3,623,248 unique chemical structures
from 4,375,036 US Patents and Patent Applications. Using this system a
user may go to a web page, draw a molecule, search for related
Intellectual Property (IP) and analyze the results. Our results prove that
this is a far more effective way for identifying IP than traditional
keyword based approaches.

Kev

On Thu, Feb 26, 2009 at 4:02 AM, Eva D'hondt <e.dhondt at let.ru.nl> wrote:

> Hello,
>
> We have just started a project here at the Radboud University of Nijmegen
> that deals with Passage Retrieval and Text Mining in patent texts. I was
> wondering if anyone could point me to some literature/research/interesting
> facts on the linguistic and statistical characteristics of the language used
> in patent texts (e.g. frequency and hierarchical organisation of
> PP-attachments, use of gerund clauses vs. the relative clause with an
> inflected verb, average sentence length in the different sections, ... ).
>
> I will of course post a summary of your replies on this list.
>
> Thank you ever so much!
>
>  Eva
>
>
> Eva D'hondt, PhD student
> Centre for Language and Speech Technology
> University of Nijmegen
> Email: e.dhondt at let.ru.nl
>
>
> _______________________________________________
> Corpora mailing list
> Corpora at uib.no
> http://mailman.uib.no/listinfo/corpora
>
>


-- 
K. B. Cohen
Biomedical Text Mining Group Lead, Center for Computational Pharmacology
and Lead Artificial Intelligence Engineer, The MITRE Corporation,
Human Language Technology Division
303-916-2417 (cell) 303-377-9194 (home)
http://compbio.uchsc.edu/Hunter_lab/Cohen