Rules vs. Lists

Thu Jul 10 02:34:38 UTC 2008

Aya,

Thanks for asking.

The basic complexity ideas need not be limited to any formalism, but I
have a formalism. It is closest conceptually to grammatical induction
by distributional analysis. The main difference is that while
classical distributional analysis seeks to abstract classes to fit an
entire corpus, I only attempt to fit one sentence at a time.

It turns out that different orders of associating words to fit a new
sentence give very different results. A parse structure falls
naturally out of the process of selecting the best order. I used this
principle to implement a kind of parser.
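
To make the idea concrete, here is a minimal sketch in Python. It is not
the actual implementation. The toy corpus, the function names, and
especially the scoring rule (raw counts of contiguous word sequences,
standing in for real distributional associations) are all assumptions
made for illustration. It only shows that different orders of combining
the words of one sentence score differently, and that choosing the best
order yields a bracketing.

```python
from collections import Counter

# Hypothetical toy corpus (an assumption, not the real data).
CORPUS = [
    "the cat sat on the mat",
    "the dog sat on the rug",
    "the cat saw the dog",
]

def ngram_counts(sentences, max_n=4):
    """Count every contiguous word sequence of length 2..max_n."""
    counts = Counter()
    for sentence in sentences:
        words = sentence.split()
        for n in range(2, max_n + 1):
            for i in range(len(words) - n + 1):
                counts[tuple(words[i:i + n])] += 1
    return counts

def trees(words):
    """Yield every binary bracketing of a tuple of words as nested tuples."""
    if len(words) == 1:
        yield words[0]
        return
    for k in range(1, len(words)):
        for left in trees(words[:k]):
            for right in trees(words[k:]):
                yield (left, right)

def leaves(tree):
    """Return the words covered by a (sub)tree, in order."""
    if isinstance(tree, str):
        return (tree,)
    return leaves(tree[0]) + leaves(tree[1])

def score(tree, counts):
    """Assumed scoring rule: sum the corpus counts of every bracketed span."""
    if isinstance(tree, str):
        return 0
    return counts[leaves(tree)] + score(tree[0], counts) + score(tree[1], counts)

def best_parse(sentence, counts):
    """Pick the order of combination (bracketing) with the highest score."""
    return max(trees(tuple(sentence.split())), key=lambda t: score(t, counts))

counts = ngram_counts(CORPUS)
print(best_parse("the dog sat", counts))
```

With this toy corpus, combining "the dog" first scores higher than
combining "dog sat" first, because "the dog" is attested more often, so
the chosen structure groups "the dog" before attaching "sat".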

There is a Web-based demo. If you have server space, I could set it up
for you. Failing that, you can see some examples of the kind of output
it produces at http://www.chaoticlanguage.com/flat_site/index.html.

Currently it has only been implemented for English, Chinese, and Danish.

Because I think this power to combine in different ways only becomes
crucial above the "word" level (defining that level by contrast), I
generally "list" only associations of words, though I have done some
experiments for Chinese on recording associations at the character
level (at which point the "parser" becomes a word segmentation
algorithm). So generally "examples" in my implementation are words and
lists of their associations. It is impossible to count the number of
"rules" or different orderings you might project out. In theory the
number is very high, as David Tuggy noted.
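
As a rough illustration of how fast this grows, consider just one narrow
reading of "different orderings": the number of ways to bracket a single
sentence into binary groupings. That reading is an assumption made for
the sake of the example, and the function name is hypothetical. Under
it, a sentence of n words has Catalan(n - 1) possible bracketings:

```python
from math import comb

def binary_bracketings(n_words):
    """Number of binary bracketings of n_words items: Catalan(n_words - 1)."""
    if n_words < 1:
        raise ValueError("need at least one word")
    k = n_words - 1
    return comb(2 * k, k) // (k + 1)

for n in (3, 5, 10, 20):
    print(n, binary_bracketings(n))
```

A 10-word sentence already has 4862 bracketings and a 20-word sentence
has 1767263190, before multiplying by the different associations that
could fill each grouping.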

The basic insights are quite general to any language, though for
morphologically rich languages an implementation based on traditional
word boundaries would become less useful. There is no reason why you
could not search for structure in terms of groups of letters, but the
advantage of searching for patterns anew each time would decrease as
the morphology/phonotactics became less productive.

Chinese is a particularly interesting case to study because you can go
beneath "word" boundaries and find productive morphological structure
while still dealing with a relatively small number of "letters".

Listing "all the examples", at any time, in my implementation
corresponds to listing a corpus. I would never attempt to "harvest ...
all the rules". In this model that would correspond broadly to listing
all the sentences you could possibly say in a language.

I don't work in academia, so there has been little incentive to
publish, but I did present a paper at a North American ACL workshop
some years ago:

Freeman R. J., Example-based Complexity--Syntax and Semantics as the
Production of Ad-hoc Arrangements of Examples, Proceedings of the ANLP/NAACL
2000 Workshop on Syntactic and Semantic Complexity in Natural Language
Processing Systems, pp. 47-50. (http://acl.ldc.upenn.edu/W/W00/W00-0108.pdf)

This paper was deliberately vague on the details of the technical
implementation, but it presented the core complexity ideas.

Your book on "Cycles in Language" sounds interesting. How many
formalisms have you counted?

-Rob

On Wed, Jul 9, 2008 at 10:02 PM, A. Katz <amnfn at well.com> wrote:
> Okay, Rob. So you would like to stick with your topic.
>
> Do you have a formalism to deal with the more-rules-than-examples
> scenario?
>
> How do we count the examples and the rules? What are the more specific
> implications to any particular language? Have you already (or are you in
> the process of) applying this outlook to a single natural language in
> order to harvest all the examples and all the rules?
>
> If you have written any papers on this topic, would you care to share them
> with us?
>
> I am currently in the process of writing a book entitled CYCLES IN
> LANGUAGE. The topic is language change/evolution, and the main observation
> is that as much as language changes, it stays remarkably the same.
>
> In some of the beginning chapters, I and my co-author June Sun,
> discuss different formalisms for accounting for grammar, and we
> specifically discuss the concept of functional equivalence. We would be
> happy to include your outlook on more-examples-than-rules, if there are
> papers to cite.
>
>
> Best,
>
>    --Aya