<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:st1="urn:schemas-microsoft-com:office:smarttags" xmlns="http://www.w3.org/TR/REC-html40">
<head>
<meta http-equiv=Content-Type content="text/html; charset=us-ascii">
<meta name=Generator content="Microsoft Word 11 (filtered medium)">
<!--[if !mso]>
<style>
v\:* {behavior:url(#default#VML);}
o\:* {behavior:url(#default#VML);}
w\:* {behavior:url(#default#VML);}
.shape {behavior:url(#default#VML);}
</style>
<![endif]--><o:SmartTagType
namespaceuri="urn:schemas-microsoft-com:office:smarttags" name="place"/>
<!--[if !mso]>
<style>
st1\:*{behavior:url(#default#ieooui) }
</style>
<![endif]-->
<style>
<!--
/* Font Definitions */
@font-face
{font-family:Tahoma;
panose-1:2 11 6 4 3 5 4 4 2 4;}
@font-face
{font-family:Verdana;
panose-1:2 11 6 4 3 5 4 4 2 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
{margin:0in;
margin-bottom:.0001pt;
font-size:12.0pt;
font-family:"Times New Roman";}
a:link, span.MsoHyperlink
{color:blue;
text-decoration:underline;}
a:visited, span.MsoHyperlinkFollowed
{color:blue;
text-decoration:underline;}
span.EmailStyle17
{mso-style-type:personal-reply;
font-family:Arial;
color:blue;
font-weight:normal;
font-style:normal;
text-decoration:none none;}
@page Section1
{size:8.5in 11.0in;
margin:1.0in 1.25in 1.0in 1.25in;}
div.Section1
{page:Section1;}
-->
</style>
</head>
<body lang=EN-US link=blue vlink=blue>
<div class=Section1>
<p class=MsoNormal><font size=2 color=blue face=Arial><span style='font-size:
10.0pt;font-family:Arial;color:blue'>Dear Siddhartha,<o:p></o:p></span></font></p>
<p class=MsoNormal><font size=2 color=blue face=Arial><span style='font-size:
10.0pt;font-family:Arial;color:blue'><o:p> </o:p></span></font></p>
<p class=MsoNormal><font size=2 color=blue face=Arial><span style='font-size:
10.0pt;font-family:Arial;color:blue'>As you may already know, metadata in an
RDBMS is usually stored in tables that can be queried or updated.  My patent
7,209,923 shows how a program can represent metadata in tables and work with
it that way:<o:p></o:p></span></font></p>
<p class=MsoNormal><font size=2 color=blue face=Arial><span style='font-size:
10.0pt;font-family:Arial;color:blue'><o:p> </o:p></span></font></p>
<p class=MsoNormal><font size=2 color=blue face=Arial><span style='font-size:
10.0pt;font-family:Arial;color:blue'><a
href="http://www.englishlogickernel.com/Patent-7-209-923-B1.pdf">http://www.englishlogickernel.com/Patent-7-209-923-B1.pdf</a><o:p></o:p></span></font></p>
<p class=MsoNormal><font size=2 color=blue face=Arial><span style='font-size:
10.0pt;font-family:Arial;color:blue'><o:p> </o:p></span></font></p>
<p class=MsoNormal><font size=2 color=blue face=Arial><span style='font-size:
10.0pt;font-family:Arial;color:blue'>That view of the database tables, columns
and rows is ideal for reasoning tasks and still takes full advantage of the
RDBMS features, as shown in the text and figures of that patent. Figure 2
shows a generic view of metadata arranged in tables for representing symbols
and text strings, including tokens and phrases. A copy is below, if it
gets through the email:<o:p></o:p></span></font></p>
<p class=MsoNormal><font size=2 color=blue face=Arial><span style='font-size:
10.0pt;font-family:Arial;color:blue'><o:p> </o:p></span></font></p>
<p class=MsoNormal><font size=2 color=blue face=Arial><span style='font-size:
10.0pt;font-family:Arial;color:blue'><img border=0 width=363 height=244
id="_x0000_i1027" src="cid:image001.jpg@01CCB66A.722E7E60"><o:p></o:p></span></font></p>
<p class=MsoNormal><font size=2 color=blue face=Arial><span style='font-size:
10.0pt;font-family:Arial;color:blue'><o:p> </o:p></span></font></p>
<p class=MsoNormal><font size=2 color=blue face=Arial><span style='font-size:
10.0pt;font-family:Arial;color:blue'>Relationships can be modeled most
obviously by creating tables named for the relationship, with rows that contain
the constants and variables you want to place into the relationship. I
use text names for constants and variables, with variables distinctively
starting with underscores (“_”), much as Prolog does. <o:p></o:p></span></font></p>
<p class=MsoNormal><font size=2 color=blue face=Arial><span style='font-size:
10.0pt;font-family:Arial;color:blue'><o:p> </o:p></span></font></p>
<p class=MsoNormal><font size=2 color=blue face=Arial><span style='font-size:
10.0pt;font-family:Arial;color:blue'>I keep one table locked in memory for the
symbol table, which binds a unique arrival ID (an integer that grows with each
new symbol definition) to a unique text string. <o:p></o:p></span></font></p>
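<p class=MsoNormal><font size=2 color=blue face=Arial><span style='font-size:
10.0pt;font-family:Arial;color:blue'>A minimal sketch of such a symbol table,
assuming Python and a hypothetical SymbolTable class; neither comes from the
program described here, and keeping the table in a plain in-process dictionary
is only a stand-in for a table locked in memory:</span></font></p>
<pre>
```python
class SymbolTable:
    """Binds each unique text string to a unique, ever-growing arrival ID."""

    def __init__(self):
        self._id_by_text = {}   # text string -> arrival ID
        self._text_by_id = []   # arrival ID -> text string

    def intern(self, text):
        """Return the arrival ID for text, assigning the next ID if it is new."""
        existing = self._id_by_text.get(text)
        if existing is not None:
            return existing
        arrival_id = len(self._text_by_id)  # IDs grow with each new symbol
        self._id_by_text[text] = arrival_id
        self._text_by_id.append(text)
        return arrival_id

    def text(self, arrival_id):
        """Return the text string bound to an arrival ID."""
        return self._text_by_id[arrival_id]
```
</pre>
<p class=MsoNormal><font size=2 color=blue face=Arial><span style='font-size:
10.0pt;font-family:Arial;color:blue'>Interning the same string twice returns the
same ID, so equal strings can later be compared as integers.</span></font></p>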
<p class=MsoNormal><font size=2 color=blue face=Arial><span style='font-size:
10.0pt;font-family:Arial;color:blue'><o:p> </o:p></span></font></p>
<p class=MsoNormal><font size=2 color=blue face=Arial><span style='font-size:
10.0pt;font-family:Arial;color:blue'>The metadata tables relate to the symbol
table by storing just the indexed arrival ID for that string, whether the
string is a symbol or a phrase extracted from a text source.  Unification
is very fast with that representation because comparing the integer IDs is
sufficient for computing unifications. <o:p></o:p></span></font></p>
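<p class=MsoNormal><font size=2 color=blue face=Arial><span style='font-size:
10.0pt;font-family:Arial;color:blue'>To illustrate why integer IDs are enough,
here is a sketch in Python of one-sided matching of a pattern row against a
stored row, which is a simplified form of unification and not necessarily the
algorithm the program uses.  The function name unify and the variable_ids
argument (the set of IDs whose strings start with an underscore) are
assumptions made for the example:</span></font></p>
<pre>
```python
def unify(pattern, row, variable_ids):
    """Match a pattern row against a stored row of arrival IDs.

    IDs listed in variable_ids stand for variables and are expected only
    in pattern.  Returns a dict of variable ID -> bound ID, or None if
    the rows do not match.
    """
    if len(pattern) != len(row):
        return None
    bindings = {}
    for p, r in zip(pattern, row):
        if p in variable_ids:
            # A variable must bind to the same ID everywhere it appears.
            if bindings.setdefault(p, r) != r:
                return None
        elif p != r:
            # Two constants match only when their integer IDs are equal.
            return None
    return bindings
```
</pre>
<p class=MsoNormal><font size=2 color=blue face=Arial><span style='font-size:
10.0pt;font-family:Arial;color:blue'>Every step is an integer comparison or a
dictionary lookup; no strings are compared.</span></font></p>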
<p class=MsoNormal><font size=2 color=blue face=Arial><span style='font-size:
10.0pt;font-family:Arial;color:blue'><o:p> </o:p></span></font></p>
<p class=MsoNormal><font size=2 color=blue face=Arial><span style='font-size:
10.0pt;font-family:Arial;color:blue'>My particular NLP interest of the moment
is in examining patent specifications, which contain unstructured text fields
within a formulaic overall outline that can be dissected algorithmically.
Patent claims are phrases that bind the sentence “I claim X” so
that each claim phrase can be substituted for X. <o:p></o:p></span></font></p>
<p class=MsoNormal><font size=2 color=blue face=Arial><span style='font-size:
10.0pt;font-family:Arial;color:blue'><o:p> </o:p></span></font></p>
<p class=MsoNormal><font size=2 color=blue face=Arial><span style='font-size:
10.0pt;font-family:Arial;color:blue'>I also use an inverted text method for
separating out phrases (mostly sentences) from texts.  Each patent
document is read in as text and then inverted to enumerate its phrases.
Each indexed phrase from the inverted document is then tokenized, and each
token is interned uniquely into the symbol table. <o:p></o:p></span></font></p>
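<p class=MsoNormal><font size=2 color=blue face=Arial><span style='font-size:
10.0pt;font-family:Arial;color:blue'>The pipeline can be sketched roughly as
follows in Python.  The splitting rule (break on sentence-ending punctuation)
and the tokenizer (lowercased word characters) are naive assumptions for
illustration, not the inverted text method itself, and index_document is a
hypothetical name:</span></font></p>
<pre>
```python
import re

def index_document(text, symbol_ids):
    """Split text into phrases, tokenize each, and intern tokens as integers.

    symbol_ids maps token text -> arrival ID and is extended in place.
    Returns one list of arrival IDs per phrase, in document order.
    """
    # Naive phrase split; real patent text (decimals, abbreviations) needs more care.
    phrases = [p.strip() for p in re.split(r"[.;!?]+", text) if p.strip()]
    indexed = []
    for phrase in phrases:
        tokens = re.findall(r"\w+", phrase.lower())
        # setdefault keeps an existing ID or assigns the next one to a new token.
        indexed.append([symbol_ids.setdefault(tok, len(symbol_ids)) for tok in tokens])
    return indexed
```
</pre>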
<p class=MsoNormal><font size=2 color=blue face=Arial><span style='font-size:
10.0pt;font-family:Arial;color:blue'><o:p> </o:p></span></font></p>
<p class=MsoNormal><font size=2 color=blue face=Arial><span style='font-size:
10.0pt;font-family:Arial;color:blue'>Quantifiers are transformed into sequences
of symbol table arrival IDs (integers), and the sequences are stored as rows in
the relationship modeling table. Since quantifiers can be either
constants or variables, rules can be generalized from instance data in the
claim phrase or the specification phrases. That means all stored
relations, other than metadata tables, have rows containing cells populated by
integers. That is why unification and search are so fast with this
representation. <o:p></o:p></span></font></p>
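<p class=MsoNormal><font size=2 color=blue face=Arial><span style='font-size:
10.0pt;font-family:Arial;color:blue'>As a rough illustration of relations whose
cells hold only integers, the sketch below uses Python with an in-memory SQLite
database.  The table names symbol and part_of, their columns, and the sample
strings are all hypothetical and are not taken from the patent or the
program:</span></font></p>
<pre>
```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE symbol (id INTEGER PRIMARY KEY, text TEXT UNIQUE NOT NULL)")
# Hypothetical relationship table, named for the relationship it models.
conn.execute("CREATE TABLE part_of (part INTEGER REFERENCES symbol(id), "
             "whole INTEGER REFERENCES symbol(id))")

def intern(text):
    """Return the symbol ID for text, inserting it first if it is new."""
    conn.execute("INSERT OR IGNORE INTO symbol (text) VALUES (?)", (text,))
    return conn.execute("SELECT id FROM symbol WHERE text = ?", (text,)).fetchone()[0]

# One row of constants and one row whose first cell is a variable (leading underscore).
conn.execute("INSERT INTO part_of VALUES (?, ?)", (intern("handle"), intern("widget")))
conn.execute("INSERT INTO part_of VALUES (?, ?)", (intern("_X"), intern("widget")))

# Join back to the symbol table only when the text is needed for display.
rows = conn.execute(
    "SELECT p.text, w.text FROM part_of "
    "JOIN symbol p ON p.id = part JOIN symbol w ON w.id = whole "
    "ORDER BY p.text").fetchall()
```
</pre>
<p class=MsoNormal><font size=2 color=blue face=Arial><span style='font-size:
10.0pt;font-family:Arial;color:blue'>The part_of table itself never stores a
string; the text appears only after the join against the symbol table.</span></font></p>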
<p class=MsoNormal><font size=2 color=blue face=Arial><span style='font-size:
10.0pt;font-family:Arial;color:blue'><o:p> </o:p></span></font></p>
<p class=MsoNormal><font size=2 color=blue face=Arial><span style='font-size:
10.0pt;font-family:Arial;color:blue'>There is an example program you can
download, though it doesn’t work with Windows 7 yet. You can run it
on a <st1:place w:st="on">Vista</st1:place> or an XP box though. It can
be downloaded from:<o:p></o:p></span></font></p>
<p class=MsoNormal><font size=2 color=blue face=Arial><span style='font-size:
10.0pt;font-family:Arial;color:blue'><o:p> </o:p></span></font></p>
<p class=MsoNormal><font size=2 color=blue face=Arial><span style='font-size:
10.0pt;font-family:Arial;color:blue'><a
href="http://www.englishlogickernel.com/setup.exe">http://www.englishlogickernel.com/setup.exe</a><o:p></o:p></span></font></p>
<p class=MsoNormal><font size=2 color=blue face=Arial><span style='font-size:
10.0pt;font-family:Arial;color:blue'><o:p> </o:p></span></font></p>
<p class=MsoNormal><font size=2 color=blue face=Arial><span style='font-size:
10.0pt;font-family:Arial;color:blue'>I promise it won’t screw up your
computer; I use the program on a daily basis and it helps enormously in my
business of patent analysis. <o:p></o:p></span></font></p>
<p class=MsoNormal><font size=2 color=blue face=Arial><span style='font-size:
10.0pt;font-family:Arial;color:blue'><o:p> </o:p></span></font></p>
<p class=MsoNormal><font size=2 color=blue face=Arial><span style='font-size:
10.0pt;font-family:Arial;color:blue'>It isn’t a general tool, but an
application of NLP analysis. I am planning a more general analysis tool,
but that won’t be ready for quite a while yet. Once I have solved
all operational problems for the patent analysis task, I will reorganize the
software components to provide the general capability. For now, this is
as much as I can handle with the available resources. <o:p></o:p></span></font></p>
<p class=MsoNormal><font size=2 color=blue face=Arial><span style='font-size:
10.0pt;font-family:Arial;color:blue'><o:p> </o:p></span></font></p>
<p class=MsoNormal><font size=2 color=blue face=Arial><span style='font-size:
10.0pt;font-family:Arial;color:blue'>Please feel free to ask questions if any
of the above isn’t clear. <o:p></o:p></span></font></p>
<p class=MsoNormal><font size=2 color=blue face=Arial><span style='font-size:
10.0pt;font-family:Arial;color:blue'><o:p> </o:p></span></font></p>
<p class=MsoNormal><font size=2 color=blue face=Arial><span style='font-size:
10.0pt;font-family:Arial;color:blue'>-Rich<o:p></o:p></span></font></p>
<p class=MsoNormal><font size=2 color=blue face=Arial><span style='font-size:
10.0pt;font-family:Arial;color:blue'><o:p> </o:p></span></font></p>
<div>
<p class=MsoNormal><font size=3 color=black face="Times New Roman"><span
style='font-size:12.0pt;color:black'>Sincerely,<o:p></o:p></span></font></p>
<p class=MsoNormal><font size=3 color=black face="Times New Roman"><span
style='font-size:12.0pt;color:black'>Rich Cooper<o:p></o:p></span></font></p>
<p class=MsoNormal><font size=3 color=black face="Times New Roman"><span
style='font-size:12.0pt;color:black'>EnglishLogicKernel.com</span></font><font
color=blue><span style='color:blue'><o:p></o:p></span></font></p>
<p class=MsoNormal><font size=3 color=black face="Times New Roman"><span
style='font-size:12.0pt;color:black'>Rich AT EnglishLogicKernel DOT com</span></font><font
color=blue><span style='color:blue'><o:p></o:p></span></font></p>
<p class=MsoNormal><font size=3 color=black face="Times New Roman"><span
style='font-size:12.0pt;color:black'>9 4 9 \ 5 2 5 - 5 7 1 2</span></font><o:p></o:p></p>
</div>
<div>
<div class=MsoNormal align=center style='text-align:center'><font size=3
face="Times New Roman"><span style='font-size:12.0pt'>
<hr size=3 width="100%" align=center tabindex=-1>
</span></font></div>
<p class=MsoNormal><b><font size=2 face=Tahoma><span style='font-size:10.0pt;
font-family:Tahoma;font-weight:bold'>From:</span></font></b><font size=2
face=Tahoma><span style='font-size:10.0pt;font-family:Tahoma'> Siddhartha
Jonnalagadda [mailto:sid.kgp@gmail.com] <br>
<b><span style='font-weight:bold'>Sent:</span></b> Friday, December 09, 2011
11:03 AM<br>
<b><span style='font-weight:bold'>To:</span></b> Rich Cooper<br>
<b><span style='font-weight:bold'>Cc:</span></b>
nlp2rdf@lists.informatik.uni-leipzig.de; CORPORA List; Jens Lehmann<br>
<b><span style='font-weight:bold'>Subject:</span></b> Re: [Corpora-List]
[NLP2RDF] Announcement: NLP Interchange Format(NIF)</span></font><o:p></o:p></p>
</div>
<p class=MsoNormal><font size=3 face="Times New Roman"><span style='font-size:
12.0pt'><o:p> </o:p></span></font></p>
<p class=MsoNormal style='margin-bottom:12.0pt'><font size=3 face=Verdana><span
style='font-size:12.0pt;font-family:Verdana'>Hey Rich,<br>
<br>
RDBMS is an industry standard that works well for some things such as storing
the extracted metadata, but might not be optimal for performing reasoning over
it. That might be one reason some people use other representations such as
RDF/SPARQL for higher-level tasks. In general, storing everything in the Common
Analysis Structure defined by UIMA's type system works for me, and where needed I
could write it into a database. What is the optimal way to represent the
metadata for reasoning tasks? How could I transfer my UIMA CAS into that
"thing"?<br>
<br clear=all>
Sincerely,<br>
Siddhartha Jonnalagadda, </span></font>Ph.D.<font face=Verdana><span
style='font-family:Verdana'><br>
</span></font><a href="http://sjonnalagadda.wordpress.com" target="_blank"><font
face=Verdana><span style='font-family:Verdana'>sjonnalagadda.wordpress.com</span></font></a><font
face=Verdana><span style='font-family:Verdana'><br>
<br>
</span></font><br>
<br>
<o:p></o:p></p>
<div>
<p class=MsoNormal><font size=3 face="Times New Roman"><span style='font-size:
12.0pt'>On Fri, Dec 9, 2011 at 11:56 AM, Rich Cooper &lt;<a
href="mailto:rich@englishlogickernel.com">rich@englishlogickernel.com</a>&gt;
wrote:<o:p></o:p></span></font></p>
<div link=blue vlink=blue>
<div>
<p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto'><font
size=2 color=blue face=Arial><span style='font-size:10.0pt;font-family:Arial;
color:blue'>Dear Siddhartha,</span></font><o:p></o:p></p>
<p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto'><font
size=2 color=blue face=Arial><span style='font-size:10.0pt;font-family:Arial;
color:blue'> </span></font><o:p></o:p></p>
<p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto'><font
size=2 color=blue face=Arial><span style='font-size:10.0pt;font-family:Arial;
color:blue'>Could you please provide more detail about what you need in the way
of “more computer-interpretable than RDBMS”? I use the RDBMS
columns with unstructured text, analyze the text in software, and populate new
columns to store the analyzed NLP information. By iteratively aggregating
RDBMS columns, I am able to process NLP quite well using the RDBMS capabilities
plus software functionality for interpretation. </span></font><o:p></o:p></p>
<p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto'><font
size=2 color=blue face=Arial><span style='font-size:10.0pt;font-family:Arial;
color:blue'> </span></font><o:p></o:p></p>
<p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto'><font
size=2 color=blue face=Arial><span style='font-size:10.0pt;font-family:Arial;
color:blue'>More information would be useful,</span></font><o:p></o:p></p>
<p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto'><font
size=2 color=blue face=Arial><span style='font-size:10.0pt;font-family:Arial;
color:blue'>-Rich</span></font><o:p></o:p></p>
<p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto'><font
size=2 color=blue face=Arial><span style='font-size:10.0pt;font-family:Arial;
color:blue'> </span></font><o:p></o:p></p>
<div>
<p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto'><font
size=3 color=black face="Times New Roman"><span style='font-size:12.0pt;
color:black'>Sincerely,</span></font><o:p></o:p></p>
<p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto'><font
size=3 color=black face="Times New Roman"><span style='font-size:12.0pt;
color:black'>Rich Cooper</span></font><o:p></o:p></p>
<p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto'><font
size=3 color=black face="Times New Roman"><span style='font-size:12.0pt;
color:black'>EnglishLogicKernel.com</span></font><o:p></o:p></p>
<p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto'><font
size=3 color=black face="Times New Roman"><span style='font-size:12.0pt;
color:black'>Rich AT EnglishLogicKernel DOT com</span></font><o:p></o:p></p>
<p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto'><font
size=3 color=black face="Times New Roman"><span style='font-size:12.0pt;
color:black'>9 4 9 \ 5 2 5 - 5 7 1 2</span></font><o:p></o:p></p>
</div>
<div>
<div class=MsoNormal align=center style='text-align:center'><font size=3
face="Times New Roman"><span style='font-size:12.0pt'>
<hr size=3 width="100%" align=center>
</span></font></div>
<p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto'><b><font
size=2 face=Tahoma><span style='font-size:10.0pt;font-family:Tahoma;font-weight:
bold'>From:</span></font></b><font size=2 face=Tahoma><span style='font-size:
10.0pt;font-family:Tahoma'> <a href="mailto:corpora-bounces@uib.no"
target="_blank">corpora-bounces@uib.no</a> [mailto:<a
href="mailto:corpora-bounces@uib.no" target="_blank">corpora-bounces@uib.no</a>]
<b><span style='font-weight:bold'>On Behalf Of </span></b>Siddhartha
Jonnalagadda<br>
<b><span style='font-weight:bold'>Sent:</span></b> Friday, December 09, 2011
9:07 AM<br>
<b><span style='font-weight:bold'>To:</span></b> <a
href="mailto:nlp2rdf@lists.informatik.uni-leipzig.de" target="_blank">nlp2rdf@lists.informatik.uni-leipzig.de</a>;
CORPORA List<br>
<b><span style='font-weight:bold'>Cc:</span></b> Jens Lehmann<br>
<b><span style='font-weight:bold'>Subject:</span></b> Re: [Corpora-List]
[NLP2RDF] Announcement: NLP Interchange Format(NIF)</span></font><o:p></o:p></p>
</div>
<div>
<div>
<p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto'><font
size=3 face="Times New Roman"><span style='font-size:12.0pt'> <o:p></o:p></span></font></p>
<p class=MsoNormal style='mso-margin-top-alt:auto;margin-bottom:12.0pt'><font
size=3 face=Verdana><span style='font-size:12.0pt;font-family:Verdana'>Somewhat
related issue:<br>
Since UIMA is seeing increasing use within the NLP community (both Information
Extraction and others such as Question/Answering), I wonder why another
standard is needed, as opposed to an interface between the UIMA type system and one of the
many existing standards. In other words, is there some work on representing the
information we extract in a way more computer-interpretable than RDBMS?<br>
<br clear=all>
Sincerely,<br>
Siddhartha Jonnalagadda, </span></font>Ph.D.<font face=Verdana><span
style='font-family:Verdana'><br>
</span></font><a href="http://sjonnalagadda.wordpress.com" target="_blank"><font
face=Verdana><span style='font-family:Verdana'>sjonnalagadda.wordpress.com</span></font></a><font
face=Verdana><span style='font-family:Verdana'><br>
<br>
<br>
</span></font><o:p></o:p></p>
<div>
<p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto'><font
size=3 face="Times New Roman"><span style='font-size:12.0pt'>On Fri, Dec 9,
2011 at 10:39 AM, John F. Sowa &lt;<a href="mailto:sowa@bestweb.net"
target="_blank">sowa@bestweb.net</a>&gt; wrote:<o:p></o:p></span></font></p>
<div>
<p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto'><font
size=3 face="Times New Roman"><span style='font-size:12.0pt'>Before making a
firm commitment to any notation as a standard for NLP,<br>
I suggest that you poll computational linguists and ask them what they<br>
would prefer for their work. Among the questions you could ask is to<br>
look at those five serializations and check which one(s) they prefer.<br>
<br>
Corpora List is a good place to start such a poll.<o:p></o:p></span></font></p>
</div>
</div>
<p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto'><font
size=3 face="Times New Roman"><span style='font-size:12.0pt'> <o:p></o:p></span></font></p>
</div>
</div>
</div>
</div>
</div>
<p class=MsoNormal><font size=3 face="Times New Roman"><span style='font-size:
12.0pt'><o:p> </o:p></span></font></p>
</div>
</body>
</html>