12.449, Disc: Parallelism Between Lang and Genome

The LINGUIST Network linguist at linguistlist.org
Mon Feb 19 02:02:42 UTC 2001


LINGUIST List:  Vol-12-449. Sun Feb 18 2001. ISSN: 1068-4875.

Subject: 12.449, Disc: Parallelism Between Lang and Genome

Moderators: Anthony Aristar, Wayne State U.<aristar at linguistlist.org>
            Helen Dry, Eastern Michigan U. <hdry at linguistlist.org>
            Andrew Carnie, U. of Arizona <carnie at linguistlist.org>

Reviews (reviews at linguistlist.org):
	Simin Karimi, U. of Arizona
	Terence Langendoen, U. of Arizona

Editors (linguist at linguistlist.org):
	Karen Milligan, WSU 		Naomi Ogasawara, EMU
	Lydia Grebenyova, EMU		Jody Huellmantel, WSU
	James Yuells, WSU		Michael Appleby, EMU
	Marie Klopfenstein, WSU		Ljuba Veselinova, Stockholm U.

Software: John Remmers, E. Michigan U. <remmers at emunix.emich.edu>
          Gayathri Sriram, E. Michigan U. <gayatri at linguistlist.org>

Home Page:  http://linguistlist.org/

The LINGUIST List is funded by Eastern Michigan University, Wayne
State University, and donations from subscribers and publishers.

Editor for this issue: Karen Milligan <karen at linguistlist.org>

=================================Directory=================================

1)
Date:  Sat, 17 Feb 2001 11:33:14 -0600
From:  Jerry Packard <j-packard at uiuc.edu>
Subject:  Re: 12.379, Disc: New: Parallelism Between Lang & Genome

2)
Date:  Sun, 18 Feb 2001 13:03:25 -0800 (PST)
From:  Gabriele Scheler <scheler at ICSI.Berkeley.EDU>
Subject:  RE: Linguistics and Genomics

-------------------------------- Message 1 -------------------------------

Date:  Sat, 17 Feb 2001 11:33:14 -0600
From:  Jerry Packard <j-packard at uiuc.edu>
Subject:  Re: 12.379, Disc: New: Parallelism Between Lang & Genome

It is very difficult for me to believe that this sort of double
articulation ('sequential arrangement of discrete subunits...which by
themselves are devoid of inherent meaning but serve to build minimal units
endowed with their own, intrinsic meaning') occurs only in the genetic and
linguistic codes. A glance at nature reveals that double articulation
occurs everywhere we look---so for example, in many reductionist systems it
would appear to be the case that the intrinsic properties of lower-order
building blocks of higher-order elements do not end up being directly
represented at the higher-order level. So for example (and I am not a
physicist, so please forgive my ham-handedness) the weight or the spin
(properties) of the smaller particles that make up, e.g., protons, do not
end up being properties of the proton (i.e., the proton does not have a
corollary weight or spin), except indirectly via the compositional code.

By implication, the parallel double articulation in the genetic and
linguistic codes would seem to be coincidental rather than deterministic.
It takes a great leap of faith (though it makes for great science fiction)
for me to believe that this sort of double articulation parallelism between
genetic and linguistic codes is more than just a general property of all
hierarchical reductionist systems.

More intuitively speaking, double articulation at the genetic code level
seems too distant from double articulation at the linguistic level (even if
metaphorically analogous) to be able to matter. To make the claim that it
does matter would seem to be in a sense like saying that the properties of
Verb Phrases have analogies in the chemical actions that take place at the
synapses in neurons in Broca's or Wernicke's area---it might be possible to
demonstrate analogous, parallel properties, but it doesn't seem possible
that such properties would be deterministically related.

Jerry Packard, University of Illinois

>Somehow there seems to be a kind of all-or-none prejudice when there is at
>least interest in the topic. It never occurs to most that the hierarchical
>layering itself may be partial explanation for the emergence of
>"arbitrariness" in either domain- the shifting of part/whole ranking which
>allows internal structures to be less than slavishly preserved so long as the
>higher level interactions still work. Once you're bootstrapped, you're in. But
>you still need to get there in the first place. Think of the construction of
>an arch. Lots easier to build if you first emplace a form beneath it.
>
>As for the ultimate origins of both codes, it seems reasonable to ask whether
>we might want to look at "social maintenance" at both levels. The "RNA
>World" scenarios just don't make sense- the whole arch thing again. Some
>dynamic, loosely integrated system of polymers, membranes, etc. must already
>have been in existence, and the actual chemical makeup of some of them would
>help assort them in the rough and tumble of the mix. Link things tightly
>enough and you have the beginnings of a code with all the other trimmings.
>Similarly, the social maintenance managed by vertebrate call systems seems
>like the likely place to look for the origins of language- I made an
>introductory case for "signal inversion" from such call systems a couple of
>weeks ago on LINGUIST.


-------------------------------- Message 2 -------------------------------

Date:  Sun, 18 Feb 2001 13:03:25 -0800 (PST)
From:  Gabriele Scheler <scheler at ICSI.Berkeley.EDU>
Subject:  RE: Linguistics and Genomics

Re: 12.438: Parallelism between language and genome

Jess Tauber writes:
Zylogy at aol.com

> overlapping genes in certain viruses versus polysynthetic structure,
> gene apposition versus agglutination/serialization,

This is also an indicator of evolution, where we find that simpler organisms
have more tightly packed structures, and more complex organisms - all the
way to humans, which do emerge as the most complex species analyzed yet
on the parameter of gene density (plants however, may be even more complex) -
have spread-out protein codes in space, including introns.

There may be a parallel here with respect to language evolution, if we put
English and Chinese at the tip of the pinnacle vs. polysynthetic
(portmanteau-like, L. Carroll) languages (but see below, typological cycles).

Just which information-processing strategy is at work here with respect to
both genetic and language evolution?


> split genes versus syntactic combination.

The usual idea about split genes is much simpler than syntactic combination.
The idea is that a single sweep with a left-to-right parser would do the
trick, and introns signal inclusion or exclusion of the next coding string.
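
As a rough illustration of the single-sweep idea, here is a toy sketch in
Python. It rests on assumptions made purely for the example: the square
brackets marking an intron and the function name `splice` are hypothetical,
not a real annotation format, and real splicing depends on splice-site
signals rather than explicit delimiters. The point is only that one
left-to-right pass with a single bit of state is enough to include or
exclude each stretch.

```python
def splice(annotated: str) -> str:
    """Single left-to-right pass over a gene string.

    Hypothetical convention for this sketch: '[' opens an intron and
    ']' closes it. Everything outside brackets is treated as exon.
    """
    out = []
    in_intron = False  # the only state the sweep needs
    for ch in annotated:
        if ch == "[":
            in_intron = True       # start skipping
        elif ch == "]":
            in_intron = False      # resume copying
        elif not in_intron:
            out.append(ch)         # exon character: keep it
    return "".join(out)


if __name__ == "__main__":
    # Two exons separated by one bracketed intron (made-up sequence).
    print(splice("ATGGCC[GTAAGTTTTCAG]GAATAA"))
```

No lookahead or nesting is needed here, which is the sense in which this is
much simpler than syntactic combination.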


Jess Tauber (zylogy at aol.com):

>Critical comment on Jakobson:  Jakobson seems to feel that at root both codes
>are "arbitrary", yet evidence has been accumulating in both fields for
>motivation behind the codes. In a paper I have buried somewhere (published in
>Science or Nature @20 years ago during a time when I still had dreams of
>becoming a molecular biologist) the authors noted the hydrophobic/hydrophilic
>(water-hating/loving) qualities of isolated analogues of the molecular side
>chains (the business ends) of coded amino acids, and showed that on this
>basis the position of each parent amino acid within the 64-cubie code
>representation was far from randomly sorted, even after accounting for the
>degeneracy of the code leading to multiple cells representing the same amino
>acid. And several years later playing with the organization of the axes of
>the representation I was able to show that the size, shape, and charge of the
>amino acid side chains, as well as on/off signals, were symmetrically
>distributed.
>
>On the linguistic side, phonosemantic coding takes advantage of symmetries
>hidden within the phonological system of the language.
>
>Jakobson himself was certainly a defender of phonosemantics- a major section
>of the same book is given over to it- but he was writing at a time when there
>was still no sense of coherent structural motivation underlying the iconicity
>present in either the biological or linguistic codes. Similarly within the
>molecular biology community (even the genomicists) there has been little
>evidence for any drive to find motivation in the ultimate constituents.

There is an issue of motivation from the "hardware", or implementation, and
of examining properties of a code by virtue of known material properties
(amino acids, acoustic and articulatory phonetics) of its realization.
Note however that with the issue of phonosemantic coding, we have a case of
motivation of a code by virtue of its implementation, or embedding within
another abstract code - which is considerably less messy than having to
access properties of "raw reality".

>Somehow there seems to be a kind of all-or-none prejudice when there is at
>least interest in the topic. It never occurs to most that the hierarchical
>layering itself may be partial explanation for the emergence of
>"arbitrariness" in either domain- the shifting of part/whole ranking which
>allows internal structures to be less than slavishly preserved so long as the
>higher level interactions still work. Once you're bootstrapped, you're in. But
>you still need to get there in the first place. Think of the construction of
>an arch. Lots easier to build if you first emplace a form beneath it.
>
This is the idea that once you have a multilayered embedded system established,
aka a "symbolic system", then the system itself becomes a factor in
restructuring the material basis, e.g. sign languages. For genomics this
may apply to the idea of a 'coding system' for proteins, additional
'instruction sets', and backup material for the 'repair system'. Simple
organisms like viruses may have these different symbolic functions all rolled
up in one thing "build and edit proteins".

Jess Tauber:

>One of the unfortunate miscorrelations in identification of parallelisms has
>been the coining of the idea of the DNA code "word". In actuality, this is
>better looked at as a "phoneme" analogue. You therefore get 64 possible
>"genemes". There are languages with this many phonemes (though most have
>fewer). 20 or so actually encoded amino acids would correspond to 20
>"genemes". Lots of languages with @20 phonemes.
>
>All the shape, charge, size, solubility properties of the amino acid side
>chains, should one utilize the maximally expanded idealized code (64
>different members) could be handled by extra "distinctive features". The
>internal structure of the representational cube diagram would then represent
>instantiation of combinatorics of these features, within the constraints of
>the 3 "letter" code "word".
>
>Given the expense of maintenance and copying of nucleic acids, one wonders
>whether the code, far from being a sloppy expansion of some two-letter
>original (as has been proposed, where the third letter is "wobbly", leading
>to redundancy and degeneracy of the code), might be better thought of as a
>more resource-economically efficient reduction of a proto-code with a higher
>number of letters. Such a code would be able to capture finer details, but
>its physical carrier would be much bulkier- I think that there is a hint here
>about language structure and evolution too, if the signal inversion
>hypothesis is correct.


This may link up to the idea of an information processing evolutionary trend
as chunking, "digitalization", or creation of simpler units with less meaning,
and more possibilities of combination.
This may have happened in the development of languages, and it certainly
happened in the evolution of alternate splices of proteins.
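
The counts in the quoted passage (64 possible triplets, about 20 encoded
amino acids) can be checked with a short Python sketch. The amino-acid
string below is the standard genetic code in the usual T, C, A, G ordering;
the variable names (`codon_table`, `degeneracy`) are just labels chosen for
this example, and nothing here models the "distinctive feature" analysis
itself.

```python
from collections import Counter
from itertools import product

BASES = "TCAG"  # conventional ordering for the standard code table
# One letter per codon, third base varying fastest; '*' marks a stop signal.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Pair each of the 4**3 triplets with the amino acid (or stop) it encodes.
codon_table = {
    "".join(triplet): aa
    for triplet, aa in zip(product(BASES, repeat=3), AMINO)
}

# Degeneracy: how many triplets map to each amino acid or to stop.
degeneracy = Counter(codon_table.values())
amino_acids = set(codon_table.values()) - {"*"}

if __name__ == "__main__":
    print(len(codon_table), "triplets")
    print(len(amino_acids), "amino acids,", degeneracy["*"], "stop triplets")
    for aa, n in sorted(degeneracy.items(), key=lambda kv: -kv[1]):
        print(aa, n)
```

The per-amino-acid counts make the degeneracy visible: some amino acids are
reached by six different triplets, others by only one.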

>The next major structural level within actual genes coding for proteins is
>that corresponding to the "domain"- various geometrically defined structural
>motifs found in protein structure- sheets, rods, folds, etc. This reminds me
>a great deal of the Bolingerian "phonaestheme"- recurrent partials which
>often have vaguely or strongly sensed meaning but which are smaller than the
>synchronic "root" of a word. Diachronically, such partials, at least in
>language, correspond to ancestral roots and root/affix combinations. If the
>genetic analogy is ok, then perhaps such domain structure also represented
>standard compositional elements utilized to build up larger structures (and
>in fact some leading geneticists have speculated that this is indeed the
>case, and eucaryotic intervening sequences exist to aid such recombination).
>
>Higher up, full genes would represent complete stems by analogy, with various
>secondary modifications and transcription instructions tacked alongside
>and/or within as "inflections".  Rhythmic/metrical structure is also evident
>in both DNA and their protein products, both at the level of gross mass
>distribution and actual vibratory behavior. Repeated "junk" sequences, skews
>even within genes away from even averaged mixes of the four code letters, all
>sorts of things. Rates of transcription, tensions on the actual molecular
>chains.
>
>We don't know yet definitively whether there is any overarching ground plan
>within the genome structurally definable as "the book of life" we hear so
>much about. At a lower level, though, there are definite hints.  In bacteria,
>genes are often in serial order (just as the case with serial verbs in
>language)- not only are the protein products physically produced one after
>the other (keeping their relative numbers in a one to one to one, etc.
>ratio), but activity-wise the chemical product of one is the raw material of
>the next in line etc.  And there is often physical chaining, keeping each
>"reactor" attached to the next one. Major efficiency.
>
>In eucaryotes, such as us, gene families, such as the hemoglobins, or
>immunoglobulins, are found "together". In the former, order of genes in the
>string may reflect not only the order of phylogenetic copying, but also the
>order of activation ontogenetically.  And the genes which control the
>development of the underlying segmental body plan in all higher organisms
>(from worms to humans), keep the head-to-tail arrangement tightly linearized.
>So we have mapping on the genome to both mostly temporal (hemoglobins) and
>mostly spatial (body plan) effects.
>
>Linguistic texts also have such ordering within them, at various hierarchical
>levels- luckily I don't have to expound upon that here.

Yes, but the whole issue of gene expression and gene networks is that genes
can be activated one at a time and independent of each other. This may be
another product of evolution, an increase in independence of protein production.

>
>So the question is, just how far does it go, this parallelism? At a deeper
>level, what can the structural dynamicity of both genetic and linguistic
>systems tell us about the origins and histories of both. Can understanding of
>the one enrich that of the other?
>
>It seems clear, for instance, that the rise of complex morphological
>structure makes a really large base lexicon less necessary-
>indeed the most polysynthetic structures contain the fewest numbers of base
>morphemes in any language. Could analogues to such hierarchy within the
>genome have been the impetus leading to virus structure, with overlapping
>genes? Could there be, over vastly longer stretches of time, something like a
>"typological cycle" for living organisms at the level of the genome? There
>are already known higher organisms and even unicellular species, all living
>parasitically, moving towards large scale jettisoning of their own genes. And
>higher viruses seem to be accumulating genes. Where does it end?

The idea of a typological cycle instead of a linearly evolving system is
certainly meritorious, and strangely absent from Darwinistic biological
theory. I think this suggestion should be carefully evaluated, since we
are not clear about this in language development either, or are we?

Is it possible that certain types of bottlenecks in evolution eliminate
alternatives in development, such that the impression of a linear
evolution arises (albeit with branching, as in the famous "tree" image),
while less bottleneck would allow a system to oscillate freely between
a number of optimal states?

If we are not obsessed with genetic dependants (offspring), we may want
to create a map of known species, living and extinct, and derive a number
of different concentric clusters as a source of classification. In terms
of DNA, humans may be more similar to their favorite bacteria than worms,
or than worms are to their viruses.

>As to origins, I've speculated that a nonsyntactic precursor capable of
>producing very large numbers of temporally short, maximally featured signals
>gave way, through inversion of signal structure, to language as we know it.
>Could a similar process have occurred in the origins of the genome? In this
>scenario, instead of genes-as-we-know-them, all nicely strung together in
>ever larger functional units, we would have the analogue of
>high-feature-number lexical matrix- all possible combinations. So we would
>have an ideophone-like continuum. Such a multidimensional continuum possibly
>would have to be constructed multidimensionally, so instead of a nice long
>linear sequence we have a literal matrix of short strings.

As above, this seems to tally with the differences in gene density from
viruses, through bacteria, worms, vertebrates, humans.


Jess Tauber:


>The appositional structure of bacteria poses a similar problem of
>bottom-up/top-down developmental perspective: some scientists believe it
>represents the original form, with eucaryotic split genes (us) representing
>an innovative advance, others believing that it is a streamlining of form
>starting with split genes and seeing the editing of intervening noncoding
>sequences out.

In terms of the immense increase of "junk DNA" as a hallmark of evolution,
the "split as innovation" thesis seems more likely to me.

I have a set of suggestions concerning "junk DNA" which I would like to
post separately.

Gabriele Scheler


scheler at icsi.berkeley.edu

---------------------------------------------------------------------------
LINGUIST List: Vol-12-449


