12.438, Disc: Parallelism Between Lang & Genome

The LINGUIST Network linguist at linguistlist.org
Fri Feb 16 23:13:23 UTC 2001


LINGUIST List:  Vol-12-438. Fri Feb 16 2001. ISSN: 1068-4875.

Subject: 12.438, Disc: Parallelism Between Lang & Genome

Moderators: Anthony Aristar, Wayne State U.<aristar at linguistlist.org>
            Helen Dry, Eastern Michigan U. <hdry at linguistlist.org>
            Andrew Carnie, U. of Arizona <carnie at linguistlist.org>

Reviews (reviews at linguistlist.org):
	Simin Karimi, U. of Arizona
	Terence Langendoen, U. of Arizona

Editors (linguist at linguistlist.org):
	Karen Milligan, WSU 		Naomi Ogasawara, EMU
	Lydia Grebenyova, EMU		Jody Huellmantel, WSU
	James Yuells, WSU		Michael Appleby, EMU
	Marie Klopfenstein, WSU		Ljuba Veselinova, Stockholm U.

Software: John Remmers, E. Michigan U. <remmers at emunix.emich.edu>
          Gayathri Sriram, E. Michigan U. <gayatri at linguistlist.org>

Home Page:  http://linguistlist.org/

The LINGUIST List is funded by Eastern Michigan University, Wayne
State University, and donations from subscribers and publishers.

Editor for this issue: Karen Milligan <karen at linguistlist.org>

=================================Directory=================================

1)
Date:  Thu, 15 Feb 2001 17:41:12 EST
From:  Zylogy at aol.com
Subject:  Re: 12.396, Disc: Parallelism Between Lang & Genome

-------------------------------- Message 1 -------------------------------

Date:  Thu, 15 Feb 2001 17:41:12 EST
From:  Zylogy at aol.com
Subject:  Re: 12.396, Disc: Parallelism Between Lang & Genome





One of the unfortunate miscorrelations in the identification of parallelisms has
been the coining of the idea of the DNA code "word". In actuality, the triplet is
better looked at as a "phoneme" analogue. You therefore get 64 possible
"genemes". There are languages with this many phonemes (though most have
fewer). The 20 or so amino acids actually encoded would correspond to 20
"genemes", and lots of languages have about 20 phonemes.

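The figure of 64 comes from filling each of three positions with one of four
bases (4 x 4 x 4). A minimal Python sketch of that count follows; the names
BASES and all_triplets are illustrative only and do not come from the
discussion itself.

```python
from itertools import product

# The four RNA bases; each triplet fills three positions with one base each.
BASES = "ACGU"

def all_triplets(bases=BASES, length=3):
    """Return every possible string of `length` bases, in lexicographic order."""
    return ["".join(p) for p in product(bases, repeat=length)]

triplets = all_triplets()
print(len(triplets))   # 64
print(triplets[:4])    # ['AAA', 'AAC', 'AAG', 'AAU']
```

Only the size of the inventory is shown here; which triplet maps to which
amino acid is deliberately left out.
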
All the shape, charge, size, solubility properties of the amino acid side
chains, should one utilize the maximally expanded idealized code (64
different members) could be handled by extra "distinctive features". The
internal structure of the representational cube diagram would then represent
instantiation of combinatorics of these features, within the constraints of
the 3 "letter" code "word".

Given the expense of maintenance and copying of nucleic acids, one wonders
whether the code, far from being a sloppy expansion of some two-letter
original (as has been proposed, where the third letter is "wobbly", leading
to redundancy and degeneracy of the code), might be better thought of as a
more resource-economical reduction of a proto-code with a higher
number of letters. Such a code would be able to capture finer details, but
its physical carrier would be much bulkier. I think that there is a hint here
about language structure and evolution too, if the signal inversion
hypothesis is correct.
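
The trade-off between code size and carrier bulk can be put in rough numbers.
The sketch below assumes that "a higher number of letters" means longer code
words over the same four-letter alphabet, which is only one possible reading;
the names ALPHABET_SIZE, AMINO_ACIDS and capacity are illustrative.

```python
ALPHABET_SIZE = 4   # four code letters
AMINO_ACIDS = 20    # the "20 or so" encoded amino acids mentioned above

def capacity(word_length, alphabet_size=ALPHABET_SIZE):
    """Number of distinct code words of the given length."""
    return alphabet_size ** word_length

for n in (2, 3, 4):
    # word length, distinct words, whether 20 amino acids fit
    print(n, capacity(n), capacity(n) >= AMINO_ACIDS)
# 2 16 False
# 3 64 True
# 4 256 True
```

On this reading a two-letter code falls short of 20 distinctions, a
three-letter code covers them with room to spare, and a four-letter code
quadruples the inventory at the cost of one more letter to copy per unit.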

The next major structural level within actual genes coding for proteins is
that corresponding to the "domain"- various geometrically defined structural
motifs found in protein structure- sheets, rods, folds, etc. This reminds me
a great deal of the Bolingerian "phonaestheme"- recurrent partials which
often have vaguely or strongly sensed meaning but which are smaller than the
synchronic "root" of a word. Diachronically, such partials, at least in
language, correspond to ancestral roots and root/affix combinations. If the
genetic analogy is ok, then perhaps such domain structure also represented
standard compositional elements utilized to build up larger structures (and
in fact some leading geneticists have speculated that this is indeed the
case, and eucaryotic intervening sequences exist to aid such recombination).

Higher up, full genes would represent complete stems by analogy, with various
secondary modifications and transcription instructions tacked alongside
and/or within as "inflections".  Rhythmic/metrical structure is also evident
in both DNA and its protein products, both at the level of gross mass
distribution and actual vibratory behavior: repeated "junk" sequences, skews
even within genes away from an even mix of the four code letters, rates of
transcription, tensions on the actual molecular chains, all sorts of things.

We don't know yet definitively whether there is any overarching ground plan
within the genome structurally definable as "the book of life" we hear so
much about. At a lower level, though, there are definite hints.  In bacteria,
genes are often in serial order (just as is the case with serial verbs in
language)- not only are the protein products physically produced one after
the other (keeping their relative numbers in a one to one to one, etc.
ratio), but activity-wise the chemical product of one is the raw material of
the next in line etc.  And there is often physical chaining, keeping each
"reactor" attached to the next one. Major efficiency.

In eucaryotes, such as us, gene families, such as the hemoglobins, or
immunoglobulins, are found "together". In the former, the order of genes in the
string may reflect not only the order of phylogenetic copying, but also the
order of activation ontogenetically.  And the genes which control the
development of the underlying segmental body plan in all higher organisms
(from worms to humans), keep the head-to-tail arrangement tightly linearized.
So we have mapping on the genome to both mostly temporal (hemoglobins) and
mostly spatial (body plan) effects.

Linguistic texts also have such ordering within them, at various hierarchical
levels- luckily I don't have to expound upon that here.

So the question is, just how far does it go, this parallelism? At a deeper
level, what can the structural dynamicity of both genetic and linguistic
systems tell us about the origins and histories of both? Can understanding of
the one enrich that of the other?

It seems clear, for instance, that the rise of complex morphological
structure makes a really large base lexicon less necessary;
indeed the most polysynthetic structures contain the smallest number of base
morphemes of any language. Could analogues to such hierarchy within the
genome have been the impetus leading to virus structure, with overlapping
genes? Could there be, over vastly longer stretches of time, something like a
"typological cycle" for living organisms at the level of the genome? There
are already known higher organisms and even unicellular species, all living
parasitically, moving towards large scale jettisoning of their own genes. And
higher viruses seem to be accumulating genes. Where does it end?

As to origins, I've speculated that a nonsyntactic precursor capable of
producing very large numbers of temporally short, maximally featured signals
gave way, through inversion of signal structure, to language as we know it.
Could a similar process have occurred in the origins of the genome? In this
scenario, instead of genes-as-we-know-them, all nicely strung together in
ever larger functional units, we would have the analogue of a
high-feature-number lexical matrix: all possible combinations. So we would
have an ideophone-like continuum. Such a multidimensional continuum possibly
would have to be constructed multidimensionally, so instead of a nice long
linear sequence we have a literal matrix of short strings.

Jess Tauber
zylogy at aol.com



The appositional structure of bacteria poses a similar problem of
bottom-up/top-down developmental perspective: some scientists believe it
represents the original form, with eucaryotic split genes (us) representing
an innovative advance; others believe that it is a streamlining of form,
starting from split genes and editing out the intervening noncoding
sequences.



---------------------------------------------------------------------------
LINGUIST List: Vol-12-438


