[Corpora-List] Bootcamp: 'Quantitative Corpus Linguistics with R' -- re Louw's endorsement

Sat Aug 16 15:06:35 UTC 2008

Wolfgang,

The fact that some approach has been inspired by cognitive theories
does not disqualify it from being applied to corpora.  And there's
no reason why you can't mix and match multiple methods of various
kinds -- logical, analogical, statistical, heuristic, or whatever.

 > A number of responses I have received via the list or in private
 > suggest that the future will see the integration of corpus
 > linguistics with cognitive approaches.  I disagree.

I have no idea what you mean by "integration" or why you assume that
a cognitive approach must be based on introspection:

 > The problem is that the mind does not allow introspection. No one
 > has ever presented evidence for a single mental concept.

I have been working with some colleagues who have been using
conceptual graphs to represent data from multiple sources, either
unstructured, untagged documents or structured data from any source,
such as relational DBs or tags of any kind on any sources.  As an
example of a query stated in several English sentences, which was
answered from a collection of 79 untagged English documents, see
slides 26 to 37 of the following talk:

    http://www.jfsowa.com/talks/pursuing.pdf
    Pursuing the Goal of Language Understanding

The approach uses multiple heterogeneous agents, which can use
different techniques to interpret a text.  If an ontology is
available, some agent will use it to interpret a sentence as it
is being parsed.  If multiple ontologies are available, multiple
agents, each one using a different ontology, will attempt to
interpret a sentence or part of a sentence.  If no ontology is
available, some agents will use statistical methods.  It's even
possible for different agents to use different techniques with
different ontologies on the *same* sentence.  Some agents use
logic, but most don't.

In case of conflicts (which are the norm, not the exception),
higher-level agents or a committee of higher-level agents
will choose what they consider the best interpretation for
each phrase.  Individually, the agents don't have to be very
intelligent. (Imagine them as judges at the Olympic Games.)
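
The committee idea can be illustrated with a toy sketch. This is not
the actual system: the agent names, the phrases, the scoring functions,
and the rule of taking the highest mean score are all assumptions made
up for the example.

```python
from collections import defaultdict
from statistics import mean

def choose_interpretations(proposals, judges):
    """For each phrase, pick the candidate with the highest mean judge score.

    proposals: iterable of (agent_name, phrase, interpretation) tuples.
    judges:    list of functions (phrase, interpretation) -> numeric score.
    All names here are hypothetical; the averaging rule is an assumption.
    """
    by_phrase = defaultdict(list)
    for _agent, phrase, interpretation in proposals:
        by_phrase[phrase].append(interpretation)

    chosen = {}
    for phrase, candidates in by_phrase.items():
        # Each judge scores every candidate independently, like a panel.
        chosen[phrase] = max(
            candidates,
            key=lambda c: mean(judge(phrase, c) for judge in judges),
        )
    return chosen

# Invented example data: two agents disagree about "the bank".
proposals = [
    ("ontology_agent", "the bank", "financial_institution"),
    ("statistical_agent", "the bank", "river_edge"),
    ("statistical_agent", "raised rates", "increase_interest_rate"),
]

def judge_a(phrase, interpretation):
    return 0.9 if interpretation == "financial_institution" else 0.5

def judge_b(phrase, interpretation):
    return 0.6 if interpretation == "river_edge" else 0.7

result = choose_interpretations(proposals, [judge_a, judge_b])
print(result)
```

Neither judge needs to be very smart on its own; the panel only has to
rank the competing interpretations of each phrase.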

If a sentence happens to be about a single unified topic, it is
likely that all the phrases will be interpreted by agents working
with the same ontology.  But if it mixes or relates different
topics, different parts might be interpreted by different agents
working with radically different methods.

Then the CGs are indexed (with pointers back to the original
documents), and the analogy engine is used to find the best
match (or matches) to a given query (which may be one sentence,
multiple sentences, or an arbitrary document).  The time to
index the graphs grows as (N log N), and the time to find
a graph that is similar to a given graph grows as (log N).
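
To make those growth rates concrete, here is a toy sketch of an index
that costs O(N log N) to build and O(log N) to query.  It is not the
analogy engine: reducing each graph to a single numeric key (here,
just its size) and the function names are assumptions for illustration,
since nothing above says how the graphs are encoded or compared.

```python
import bisect

def build_index(graphs, key):
    """Sort (key, document_pointer) pairs once: O(N log N) comparisons.

    graphs: iterable of (graph, document_pointer) pairs.
    key:    hypothetical function mapping a graph to a number.
    """
    return sorted((key(graph), doc) for graph, doc in graphs)

def nearest(index, query_key):
    """Binary-search a non-empty index for the closest key: O(log N)."""
    i = bisect.bisect_left(index, (query_key,))
    # The closest entry is either just before or at the insertion point.
    candidates = index[max(i - 1, 0): i + 1]
    return min(candidates, key=lambda entry: abs(entry[0] - query_key))

# Invented data: each "graph" is a list of nodes, keyed by its length.
graphs = [
    (["a", "b", "c"], "doc3"),
    (["a"], "doc1"),
    (["a", "b", "c", "d", "e", "f"], "doc6"),
]
index = build_index(graphs, key=len)
print(nearest(index, 4))
```

Each index entry keeps its pointer back to the source document, so a
match on the key leads directly to the original text.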

John Sowa
