[Corpora-List] What support should a corpus provide to ontologists?

Sun Aug 10 00:44:29 UTC 2014

In developing ontologies to match corpus samples,
as in learning algorithms, what kinds of analysis
of each document would be useful for comparing one
patent claim against that patent's description,
and against an arbitrary candidate for prior art?

Entity recognition, with and without names or
descriptions or anaphora; 

Objects and activities mentioned in the claims, as
compared to those mentioned in each patent;
Mereological relationships among the identified
objects and activities; 

Common verb signature database with identified
variables and constants;

Modus ponens interpreter of signature phrases with
respect to the identified objects and activities:

               Logic language at the FOL level: Horn
clauses, lexical scopes, question answering;

               Heuristic search through AND/OR
graphs with FOL parameterization, simple algebra.
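
As a rough illustration of the modus ponens item
above, here is a minimal Python sketch of forward
chaining over Horn clauses. The tuple encoding of
facts, the "?x" convention for variables, and the
part_of example (a transitive mereological
relation) are all assumptions made for
illustration; none of them comes from an existing
tool.

```python
def is_var(term):
    """Variables are strings starting with '?' (an assumed convention)."""
    return isinstance(term, str) and term.startswith("?")


def match(pattern, fact, bindings):
    """Match one premise pattern against one ground fact.

    Returns the extended bindings, or None if they do not match.
    """
    if len(pattern) != len(fact):
        return None
    bindings = dict(bindings)
    for p, f in zip(pattern, fact):
        if is_var(p):
            # Bind the variable, or check it against its earlier binding.
            if bindings.setdefault(p, f) != f:
                return None
        elif p != f:
            return None
    return bindings


def match_all(premises, facts, bindings):
    """Yield every set of bindings that satisfies all premises."""
    if not premises:
        yield bindings
        return
    first, rest = premises[0], premises[1:]
    for fact in facts:
        extended = match(first, fact, bindings)
        if extended is not None:
            yield from match_all(rest, facts, extended)


def substitute(pattern, bindings):
    """Replace bound variables in a pattern with their values."""
    return tuple(bindings.get(t, t) for t in pattern)


def forward_chain(facts, rules):
    """Apply modus ponens repeatedly until no new facts appear."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            # Collect all matches first so the fact set is not
            # modified while it is being iterated.
            for b in list(match_all(premises, known, {})):
                new_fact = substitute(conclusion, b)
                if new_fact not in known:
                    known.add(new_fact)
                    changed = True
    return known


# Hypothetical example: part_of is transitive.
facts = {
    ("part_of", "blade", "rotor"),
    ("part_of", "rotor", "turbine"),
}
rules = [
    (
        [("part_of", "?x", "?y"), ("part_of", "?y", "?z")],
        ("part_of", "?x", "?z"),
    ),
]
derived = forward_chain(facts, rules)
for fact in sorted(derived):
    print(fact)
```

Each rule is a list of premises plus one
conclusion, which is the Horn clause shape. The
sketch assumes every variable in a conclusion also
appears in the premises, so derived facts are
always ground.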

What have I missed?

The idea, or long-term goal, is to build an
ontology of patent claims as encountered in
published patents.  If that turns out to be
helpful, other document analysis tasks might
benefit from the ontology so developed.  

-Rich

Sincerely,

Rich Cooper

EnglishLogicKernel.com

Rich AT EnglishLogicKernel DOT com

9 4 9 \ 5 2 5 - 5 7 1 2

From: corpora-bounces at uib.no
[mailto:corpora-bounces at uib.no] On Behalf Of Rich
Cooper
Sent: Friday, August 08, 2014 11:12 AM
To: 'John F Sowa'; corpora at uib.no
Cc: '[ontolog-forum] '
Subject: [Corpora-List] What support should a
corpus provide?

Dear Corpus Analysts and Ontologists,

I have just made available to corpus analysts a
corpus of documents from the US Patent and
Trademark Office.  The tools available now are
sufficient for supporting attorneys, inventors,
scientists, and people in similar legal and
technology roles.  

What additional support should the software
provide for corpus analysis of selected patent
document subsets?  I have a web site with
extensive help and tutorial materials; I suggest
starting at:

www.EnglishLogicKernel.com/Help/help.htm

to see an index of capability descriptions.  I can
make available the "frequent words" and the "rare
words" lists as text files, along with the patent
documents in whole or in sections for data,
abstract, description and claims, which are
already extracted from the selected document set.
The claim tree is parsed, and the claims are
separated into claim elements, all of which can be
provided.  
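
To make concrete what a claim tree and claim
elements might look like as data, here is a small
Python sketch. It is not the format the software
actually uses; the function names, the "claim N"
dependency pattern, the colon-and-semicolon
splitting rule, and the sample claims are all
hypothetical.

```python
import re
from collections import defaultdict

# Assumed convention: a dependent claim names its parent as "claim N".
CLAIM_REF = re.compile(r"\bclaim\s+(\d+)", re.IGNORECASE)


def parent_of(body):
    """Return the number of the claim this one depends on, or None."""
    m = CLAIM_REF.search(body)
    return int(m.group(1)) if m else None


def build_tree(claims):
    """Map each parent claim number (None for independent claims)
    to the list of claims that depend on it."""
    children = defaultdict(list)
    for number in sorted(claims):
        children[parent_of(claims[number])].append(number)
    return dict(children)


def split_elements(body):
    """Split a claim into its preamble and its elements.

    Assumes the preamble ends at the first colon and that elements
    are separated by semicolons.
    """
    preamble, sep, rest = body.partition(":")
    if not sep:
        return preamble.strip(), []
    parts = [p.strip() for p in rest.split(";")]
    elements = [
        re.sub(r"^and\s+", "", p).rstrip(".").strip()
        for p in parts
        if p
    ]
    return preamble.strip(), elements


# Made-up claims, for illustration only.
claims = {
    1: "A fastener comprising: a threaded shaft; "
       "a head joined to the shaft; and a washer.",
    2: "The fastener of claim 1, wherein the head is hexagonal.",
    3: "The fastener of claim 2, wherein the washer is steel.",
}

tree = build_tree(claims)
preamble, elements = split_elements(claims[1])
print(tree)
print(preamble)
print(elements)
```

Real claims are messier than this (multiple
dependencies, nested "wherein" clauses, colons
inside elements), so the splitting rule here
should be read as a starting point only.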

Is there anything else that corpus analysts would
like to see in the software?

Suggestions highly appreciated,

-Rich

