Very well. In my view, reasoning is sub-graph matching plus graph extending,
and a logic rule is a graph projection. A fast sub-graph matching algorithm
is therefore a strong basis for reasoning, and reasoning is the basis of
natural language understanding.
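
To make this concrete, here is a minimal sketch of what I mean, a toy
illustration under my own simplifying assumptions rather than a practical
algorithm. A conceptual graph is reduced to a set of (subject, relation,
object) triples, a logic rule is a premise pattern with ?variables plus a
conclusion pattern, matching the premise is the sub-graph matching step,
and asserting the instantiated conclusion is the graph extending step.
The concept names and the rule are invented for the example.

# Reasoning as sub-graph matching plus graph extending (toy sketch).

def match(pattern, graph, binding=None):
    """Yield every variable binding under which `pattern` occurs in `graph`."""
    binding = binding or {}
    if not pattern:
        yield dict(binding)
        return
    (s, r, o), rest = pattern[0], pattern[1:]
    for (gs, gr, go) in graph:
        if gr != r:
            continue
        trial = dict(binding)
        if unify(s, gs, trial) and unify(o, go, trial):
            yield from match(rest, graph, trial)

def unify(term, value, binding):
    """Bind a ?variable or test a constant; return False on a clash."""
    if term.startswith("?"):
        if term in binding and binding[term] != value:
            return False
        binding[term] = value
        return True
    return term == value

def apply_rule(rule, graph):
    """Project the rule's premise onto the graph; extend it with the conclusions."""
    premise, conclusion = rule
    new = set()
    for b in match(premise, sorted(graph)):
        for (s, r, o) in conclusion:
            new.add((b.get(s, s), r, b.get(o, o)))
    return graph | new

graph = {("Tom", "type", "Cat"), ("Cat", "subtype_of", "Animal")}
rule = ([("?x", "type", "?c"), ("?c", "subtype_of", "?d")],   # premise
        [("?x", "type", "?d")])                               # conclusion

print(apply_rule(rule, graph))   # adds ("Tom", "type", "Animal")

The brute-force search inside match() is exactly the part that a fast,
indexed sub-graph matching algorithm would have to replace in a real system.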

Actually, I don't think we need to convert all informal texts into formal
models. People with different backgrounds may get different understandings
from the same text; there is no gold standard of natural language
understanding for humans. However, each reader will read and think about a
text until he believes he already understands it, or until he gives up.
That decision is based on whether the reader has built up a reasonable
vision from the text. What is a reasonable vision? A reasonable vision is
one in which the concepts and their relations satisfy all the existing
logic rules in the system. There will be reasoning chains, and the chains
will compose a reasoning circle (the circle may contain multiple branches).
The chains and the circle may pass through the existing knowledge network;
that is why different people with different backgrounds get different
understandings. Only when the reasoning circle from a text does not pass
through the existing knowledge network can the text be converted into a
formal model, because only then is the information in the text complete.
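
As a rough sketch of one possible reading of that criterion (the reading is
mine, not a standard definition, and it reuses apply_rule from the sketch
above): merge the text's graph with the background knowledge, extend it
with the rules until nothing new is added, and then treat "the chains
compose a circle" as the requirement that all of the text's concepts end
up linked by reasoning chains, possibly through background-knowledge nodes.

def reasonable_vision(text_graph, background, rules, max_rounds=100):
    """Extend the merged graph to a fixpoint, then check that the text's
    concepts are all linked by chains, possibly through background nodes."""
    graph = set(text_graph) | set(background)
    for _ in range(max_rounds):
        extended = graph
        for rule in rules:
            extended = apply_rule(rule, extended)
        if extended == graph:        # fixpoint: no rule extends it further
            break
        graph = extended
    # Undirected connectivity over the extended graph, as a stand-in for
    # "the reasoning chains close into a circle".
    adjacency = {}
    for (s, _, o) in graph:
        adjacency.setdefault(s, set()).add(o)
        adjacency.setdefault(o, set()).add(s)
    text_concepts = {t for (s, _, o) in text_graph for t in (s, o)}
    if not text_concepts:
        return True
    seen, stack = set(), [next(iter(text_concepts))]
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(adjacency.get(node, ()))
    return text_concepts <= seen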

Natural language texts are not complete. The understanding of a text
depends heavily on knowledge that lies outside the text. So if we hope an
AI system can learn from raw texts, the learning order is very important.
That's why I'm looking for a corpus of English for children: I don't
expect the AI system to handle complex texts in practice before it can
build a reasonable vision from a very simple text.

To be honest, I believe the NLP problem is becoming an education problem.

On Mon, Mar 14, 2011 at 6:09 AM, John F. Sowa <sowa@bestweb.net> wrote:
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;"><div class="im">On 3/12/2011 4:15 PM, Nathan Hu wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
The bottleneck currently is how can we get the high-precision results of<br>
coreference resolution to build completed Conceptual Graphs from texts.<br>
</blockquote>
<br></div>
> That is the bottleneck that has plagued every version of formal
> semantics from Montague to the present. Logicians publish papers
> with toy sentences like "John seeks a unicorn". But the fatal flaw
> is that they usually assume Frege's principle:
>
>     The meaning of a sentence is completely determined
>     by the meaning of its symbols and the syntax for
>     combining those symbols.
>
> Natural languages violate Frege's constraint in multiple ways.
> In general, the meaning of a sentence depends critically on context-
> dependent factors, such as the time and place of utterance, the
> speaker (or writer), the listener (or reader), their background
> knowledge, their intentions, the speaker's guess about the listener's
> knowledge and intentions, the listener's guess about the speaker, etc.
>
> Linguists and logicians have published many excellent analyses
> of each of those issues. But NL texts that occur "in the wild"
> violate Frege's principle in many different and highly creative
> ways -- sometimes several different ways in a single sentence.
>
> One of the famous epigrams of programming by Alan Perlis:
>
>     "One can't proceed from the informal to the formal by formal means."
>
> There are some well-written texts that are sufficiently precise that
> they can be translated to a formal logic. For example, Naproche
> (NAtural-language PROof CHEcker) maps a mathematical proof stated
> in English to logic and checks the proof ( http://naproche.net/ ).
>
> What makes that English precise is that the author (a mathematician)
> (1) has a precise formal semantics in mind and (2) makes an effort
> to describe it clearly. Very few texts meet both of those criteria.
>
> Programming languages are just as formal as mathematics, but programmers
> are notoriously lazy about documenting what they do in any language.
> When they do, the results look like Slide 27 in the talk I mentioned:
>
> http://www.jfsowa.com/talks/pursue.pdf
>
> That language cannot be translated to the formal language of the
> original program (COBOL, in this case). However, if you *start*
> with the COBOL and map it to a formal notation (in this case to
> conceptual graphs), you can generate precise, formal graphs.
>
> With the usual methods of NLP, you can map informal English to graphs
> that are just as informal as the English. Then with suitable graph-
> matching algorithms, you can find an approximate match of the informal
> graphs to the formal graphs (assuming, of course, that you have some
> independent source for the formal graphs -- and that's a very big
> assumption that is often difficult or impossible to satisfy).
>
> Note that this method does not violate the epigram by Perlis:
> the precision does *not* come from English, but from COBOL
> (or some other source for the formal graphs).
>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="im">
As my understanding, you build an index for conceptual graphs.<br></div>
Does this method work for sub-graph matching?<br>
</blockquote>
>
> Yes. There are many different, but related algorithms for doing
> such indexing and searching. I cited some based on methods for
> chemical graphs. Those applications *require* searching and
> finding subgraphs.
>
> For pharmaceutical applications, they want to find chemicals that
> have the same active subgraph (the critical part for some drug) but
> may have different molecular structures attached to that subgraph.
> That's very similar to the requirements for NLP.
>
> John