[Corpora-List] Is a complete grammar possible (beyond the corpus itself)?

Rich Cooper Rich at EnglishLogicKernel.com
Thu Sep 6 16:25:30 UTC 2007


There are still Chomskyans on the prowl though:

http://www.amazon.com/Atoms-Language-Minds-Hidden-Grammar/dp/019860632X

 

-Rich

 

  _____  

From: corpora-bounces at uib.no [mailto:corpora-bounces at uib.no] On Behalf Of
Rob Freeman
Sent: Wednesday, September 05, 2007 8:01 AM
To: David Brooks; CORPORA at UIB.NO
Subject: Re: [Corpora-List] Is a complete grammar possible (beyond the corpus itself)?

 

Hi David,

Thanks for looking at this so closely.

Yes, you are exactly right. What I am suggesting is slightly different from
Oliver's interpretation, though Oliver's issue (dialectal differences?) can
be integrated easily.

On 9/5/07, David Brooks <d.j.brooks at cs.bham.ac.uk> wrote:


First, I think that storing a corpus verbatim and attempting to recover
different information according to context is a great idea for
computational linguistics, and particularly in combining machine
learning approaches into language models. However, I'm not sure how well 
it stands up (or whether it is even intended) as an account of human
language learning. Is there evidence from psycholinguistics that
supports or contradicts the claim that humans store all their linguistic
experience?


I think there is evidence. The importance of collocation and the detail of
phraseology could be interpreted as such evidence.


People will say there is clear evidence we recall only the gist. I agree,
but think this is because we "recall" based on meaning, and meaning is
defined by _sets_ of examples (cf. exemplar theory). So we remember all
the individual examples in a set of exemplars, but can only "recall" the set
as a whole.

For example: I can't "recall" everything I read verbatim, but, anecdotally,
I may hear a sentence from a book I read years ago and "remember" it
instantly (including maybe what I was doing at the time I read it).

Since "context" often includes the state of the world or other beings, is
the totality of human experience stored? 


I don't think everything is stored, but what is stored I think is stored
verbatim. (We may lose bits of it, but what we lose is not systematic.)

 

Second, some of the most controversial (in terms of generating debate)
aspects of Chomsky's approach are those that suggest that language faculties
are innate and specialised to deal only with language. These still pertain
(as issues to address) in a "combine several models according to context"
approach:
- which models will you use in your combination? Are they innate? Do they
represent "intelligence" that is specific to dealing with language (as
opposed to more general forms of intelligent behaviours)? 
- how do you define context? I assume context is defined in relation to a
model, so again, is this innate?


Actually I'm not so much suggesting that we integrate existing models. I'm
suggesting we focus instead on ways of finding models, or grammars: in short,
grammatical induction, especially distributional analysis.

As far as contexts go, I think grammatical induction typically gets good
results even just by clustering words on their immediate contexts (e.g. one
word).
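
As a very rough illustration of that kind of clustering (the toy corpus, the
use of scikit-learn, and the number of clusters below are all illustrative
assumptions, not anything from an actual experiment):

    # Sketch: cluster words by counts of their immediate left/right neighbours.
    from collections import Counter, defaultdict
    from sklearn.cluster import KMeans
    from sklearn.feature_extraction import DictVectorizer

    corpus = ("the black coffee was hot . the black cloud was dark . "
              "a white cloud was high").split()

    contexts = defaultdict(Counter)
    for i, w in enumerate(corpus):
        if i > 0:
            contexts[w]["L=" + corpus[i - 1]] += 1
        if i + 1 < len(corpus):
            contexts[w]["R=" + corpus[i + 1]] += 1

    words = sorted(contexts)
    X = DictVectorizer().fit_transform([contexts[w] for w in words])

    # Each cluster of context vectors is a rough distributional "class".
    labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
    for w, c in sorted(zip(words, labels), key=lambda p: p[1]):
        print(c, w)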

I only have one issue. As far as I am concerned, grammatical induction has
been held up only by the (almost?) universal assumption that it should be
possible to generalize grammar globally.

Change that one assumption and I think we will immediately start to produce
very useful results.

As a corollary I don't think the generalization mechanism is specific to
language at all. I am sure it is general to all perceptual (intelligent?)
behaviour. 

 

How do you use context to trigger events?

 

Crudely put, I filter the possible grammar of each word on its context.
Other than that it is done in much the same way as grammatical induction is
done now. 

In grammatical induction you cluster the contexts of a word to "learn" a
grammatical class for it. I do the same. It is just that now I "learn" a
different class for each word, depending on what word is adjoining. So if
"black" adjoins "coffee", I "cluster" a different class for "black" than I
would if "cloud" were adjoining. 

 

I'd also like to return to one of Rob's much earlier points: that there is
little previous work on ideas akin to his. I can see a few parallels between
Rob's suggestion and the work of Rodney Brooks (no relation) in the field of
behaviour-based robotics. Brooks claimed that robots with internal
representations of the world suffered because their models were perpetually
out-of-sync with the world. He suggested a "world as its own best model"
theory, where the robot operates on percepts obtained from the world, and
avoids internal representation. I see this as similar to Rob's suggestion of
keeping the corpus, which acts as our "world", and avoiding a single-grammar
abstraction that might not fully account for the corpus. (I would agree with
Diana Santos' claim that a corpus is only a sample -- and an impoverished
sample in terms of contextual information -- of the world at a given time.)


That robot work sounds good. I agree, this sounds like the kind of thing I
mean.

Language is a much better test bed for such ideas, though, because it is so
accessible. It is difficult to model the "world" of a robot.

The "world" of language is just the corpus.

But you are looking for precedent. 

There is of course Paul Hopper's "Emergent Grammar". I think this is
essentially right, but hesitate to mention it because somehow it is always
misinterpreted. The idea of something which cannot be described in terms of
rules just seems to be too subtle, and perhaps Paul has not had the maths to
formalize it. For whatever reason, people always seem to identify his
"emergence" with "evolution" of grammar (which he specifically denies
below.) Read correctly I think the ideas are all there. Only an
implementation is missing: 

Here is his famous paper, from 1987:

http://eserver.org/home/hopper/emergence.html 

>>>
I am concerned in this paper with ... the assumption of an abstract,
mentally represented rule system which is somehow implemented when we speak.


...

The notion of emergence is a pregnant one. It is not intended to be a
standard sense of origins or genealogy, not a historical question of 'how'
the grammar came to be the way it 'is', but instead it takes the adjective
emergent seriously as a continual movement towards structure, a postponement
or 'deferral' of structure, a view of structure as always provisional,
always negotiable, and in fact as epiphenomenal, that is at least as much an
effect as a cause.

...

Structure, then, in this view is not an overarching set of abstract
principles, but more a question of a spreading of systematicity from
individual words, phrases, and small sets. 
>>>

In engineering terms, all I have been able to find is an approach called
"similarity modeling". This had some success improving speech recognition
scores using crude bigrams (generalized ad hoc) some years ago:

e.g. http://citeseer.ist.psu.edu/dagan99similaritybased.html

There is an earlier paper with the nicest quote. I think it is this one:

Dagan, Ido, Shaul Marcus and Shaul Markovitch. Contextual word similarity
and estimation from sparse data. Computer Speech and Language, 1995, Vol. 9,
pp. 123-152.

p. 4:

"In a general perspective, the similarity-based approach promotes an
"unstructured" point of view on the way linguistic information should be
represented. While traditional approaches, especially for semantic
classification, have the view that information should be captured by the
maximal possible generalizations, our method assumes that generalizations
should be minimized.  Information is thus kept at a maximal level of detail,
and missing information is deduced by the most specific analogies, which are
carried out whenever needed.  Though the latter view seems hopeless for
approaches relying on manual knowledge acquisition, it may turn very useful
for automatic corpus-based approaches, and better reflect the nature of
unrestricted language." 


I don't think they put this "similarity modeling" in a theoretical context
with Hopper at all. And as I say, they only applied this to the estimation
of bigrams. But as a crude example of ad-hoc estimation of grammatical
parameters it goes in the right direction. 
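
For the bigram case, the flavour of similarity-based estimation can be
sketched roughly as follows (the cosine weighting and the toy counts are
stand-ins of mine, not the exact scheme in the paper):

    import math
    from collections import Counter

    def cosine(a, b):
        """Cosine similarity of two context-count dictionaries."""
        dot = sum(v * b[k] for k, v in a.items() if k in b)
        na = math.sqrt(sum(v * v for v in a.values()))
        nb = math.sqrt(sum(v * v for v in b.values()))
        return dot / (na * nb) if na and nb else 0.0

    def sim_prob(w1, w2, bigrams, contexts):
        """Estimate P(w2 | w1) by backing off through words similar to w1."""
        num = den = 0.0
        for w_alt, follow in bigrams.items():
            s = cosine(contexts.get(w1, {}), contexts.get(w_alt, {}))
            total = sum(follow.values())
            if s and total:
                num += s * follow[w2] / total
                den += s
        return num / den if den else 0.0

    # Toy counts: "coffee leaf" is unseen, but "tea" is distributionally
    # similar to "coffee" and lends the bigram some probability mass.
    contexts = {"coffee": Counter(black=3, hot=2, drink=2),
                "tea":    Counter(black=2, hot=3, drink=2)}
    bigrams = {"coffee": Counter(cup=4, bean=2),
               "tea":    Counter(cup=3, leaf=2)}
    print(sim_prob("coffee", "leaf", bigrams, contexts))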

To go further you need to generalize the representation of non-terminal
elements so they too can be vectors of examples. I don't think that is
difficult. I've used a kind of "cross-product".

I had a parser on-line for a while which worked quite well doing this.
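
One way such a "cross-product" might look (a guess at the construction, with
made-up example sets, not the parser I had on-line):

    from itertools import islice, product

    # Terminals carry sets of attested substitutes from shared contexts
    # (toy data, purely illustrative).
    examples = {
        "black":  {"black", "strong", "dark"},
        "coffee": {"coffee", "tea"},
    }

    def combine(left, right, limit=50):
        """Non-terminal = truncated cross-product of constituent examples."""
        return {" ".join(pair) for pair in islice(product(left, right), limit)}

    # The phrase "black coffee" is itself represented by a set of examples:
    print(combine(examples["black"], examples["coffee"]))
    # e.g. {'black coffee', 'strong tea', 'dark coffee', ...} (order varies)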

As I say, the principles are very much like what has been done already with
machine learning/grammatical induction. We can use a lot of that.

I'm sure the main thing we need to change is only the assumed goal of a
single complete grammar. 

-Rob
