Proteome Is Analogous to Language

In coverage of evolution in these pages, Eugene Koonin comes up often. He does so because of his appreciation for the complexity of life. That appreciation tends to drive him toward the outermost, absurd limits of evolution, but that’s fine since it is so revealing. The last time Evolution News mentioned his thinking was in reference to his book The Logic of Chance, in which he resorted to the multiverse to overcome the improbability of life’s origin.

Protein “Grammar”

Now, in a paper in PNAS, Koonin, his colleague Lijia Yu, and four others find “grammar” in the set of protein domains used by life. In so doing, they recognize what the analogy implies: just as only a small fraction of random letter strings form words, only a small fraction of possible domain combinations are usable.

From an abstract, informational perspective, protein domains appear analogous to words in natural languages in which the rules of word association are dictated by linguistic rules, or grammar. Such rules exist for protein domains as well, because only a small fraction of all possible domain combinations is viable in evolution. [Emphasis added.]

The phrase “in evolution” here is superfluous, adding nothing to the analogy. The authors assert that there is a connection:

Genomes show remarkable similarities to natural languages. Like all cellular life forms, all natural languages are believed to have descended from a single ancestor and have evolved through mechanisms comparable to biological evolution.

Yet their 2008 reference to support this assertion will not be appreciated by most evolutionists, because it claims that “languages evolve in punctuational bursts” in contrast to Darwinian gradualism. Perhaps new protein domains are hopeful monsters. Now, what about the words in the analogy?

In a written language, individual letters cannot carry semantic information; the smallest unit of information, therefore, is a word. Protein domains are structural, functional, and evolutionary units of proteins and are thus analogous to words. This analogy is reflected in the statistical properties of the domain repertoires of diverse organisms.

Semantic information is a key concept in intelligent design, because the sequence of nucleotides is the carrier of information. The sequence space for letters vastly outstrips the functional space of words. In the case of human language, the meanings can be arbitrarily assigned. With proteins, the sequences of amino acids must physically function. Forces of attraction and repulsion between the amino acids must lead to folded shapes that carry out the work required. The codons in genes, however, do appear arbitrary, in the sense that ACG does not inherently “mean” threonine, nor does UAG inherently “mean” stop. There must be a language convention that relates the two codes. This convention is provided by the family of aminoacyl-tRNA synthetases, which attach the correct amino acid to one end of a transfer RNA to match the corresponding anticodon on the other end.
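The idea of a convention relating two codes can be pictured as a lookup table. The sketch below is only an illustration: it lists a handful of entries from the standard genetic code (including the ACG and UAG examples mentioned above), and the function name `translate` and the sample sequence are made up for this example. It says nothing about how the synthetases physically enforce the mapping.

```python
# A few entries of the standard genetic code (RNA codons).
# Deliberately partial: this is an illustration, not a full table.
CODON_TABLE = {
    "AUG": "Met",
    "ACG": "Thr",
    "ACU": "Thr",
    "UGG": "Trp",
    "UAA": "Stop",
    "UAG": "Stop",
    "UGA": "Stop",
}


def translate(mrna):
    """Read codons three letters at a time; halt at a stop codon or an unlisted codon."""
    peptide = []
    for i in range(0, len(mrna) - 2, 3):
        residue = CODON_TABLE.get(mrna[i:i + 3])
        if residue is None or residue == "Stop":
            break
        peptide.append(residue)
    return peptide


# Hypothetical sample sequence, chosen only to exercise the table.
print(translate("AUGACGUGGUAG"))
```

Nothing in the letters ACG points to threonine; the association lives entirely in the table, which is the sense in which the assignment is a convention.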

A Decrease in Entropy

Getting back to the word analogy, Koonin’s team recognizes that functional words represent a decrease in entropy, just as they do in languages.

We show that the loss of entropy (information gain) resulting from domain arrangements in genomes is nearly constant across the entire course of cellular life evolution and identify both similarities and dissimilarities between the “language” of proteins and natural languages.

Using linguistic techniques, they come up with a value of 1.1 to 1.3 bits as the constant information gain throughout the evolution of life, and 3.6 bits as the constant information gain in languages. How they came up with these values need not concern us now; the important point is that both words and protein domains exhibit increases in information (decreases in entropy) characteristic of languages. The lower value for proteins may be due only to the fact that proteomes contain more “one-word sentences” than most languages. The similarities, though, are indeed “remarkable,” as they say.
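For readers who want a feel for what “information gain” means here, the following is a minimal sketch, not the method used in the paper (which is not described in this article). It assumes a simple stand-in measure: the drop in Shannon entropy of a token once the preceding token is known. The token lists and function names are hypothetical.

```python
import math
from collections import Counter


def entropy_bits(counts):
    """Shannon entropy, in bits, of the empirical distribution held in a Counter."""
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def adjacent_information_gain(tokens):
    """Entropy lost when the preceding token is known: H(next) - H(next | previous)."""
    previous = Counter(tokens[:-1])
    following = Counter(tokens[1:])
    pairs = Counter(zip(tokens, tokens[1:]))
    # Chain rule: H(next | previous) = H(previous, next) - H(previous)
    conditional = entropy_bits(pairs) - entropy_bits(previous)
    return entropy_bits(following) - conditional


# Toy "sentence" of domain names in a strictly alternating order (made-up data).
ordered = ["kinase", "SH2", "kinase", "SH2", "kinase", "SH2", "kinase", "SH2"]
print(round(adjacent_information_gain(ordered), 3))
```

In this toy case, a strictly alternating sequence yields a gain close to one bit, because knowing one token fully determines the next, while a sequence made of a single repeated token yields no gain at all. The real analysis works with genome-wide domain architectures and will differ in its details.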

The Analogy Breaks Down

Throughout the paper, the team merely assumes that functional proteins arose by natural selection. The analogy breaks down with language. Human languages may “evolve” — but not by natural selection, because decision-making minds are involved, choosing how best to express thoughts. The team blurs this important distinction:

This trend of increased multidomain protein formation with increasing organismal complexity is known as domain accretion and apparently plays a major role in evolution, particularly in major evolutionary transitions such as the origin of multicellularity. Of the numerous possible domain combinations, only a limited subset is actually represented in genomes, suggesting that domain architectures are shaped by natural selection.

Terms of Bias

“Apparently” and “suggesting” are terms of bias. Nothing in their data requires increasing complexity over time or common ancestry. Eukaryotes have more protein domains. Does that require evolution from primitive cells to complex animals? It could, but it could just as well represent reality independent of time: not an unfolding scroll, but a map; not a tree, but a lawn. The disparate proteomes across the living world exist simultaneously to our senses now. Nobody saw them emerge over time. At least, development over time is not necessary to the analogy.

Given that distinction, the operative sentence in the passage quoted above could be reduced to, “Of the numerous possible domain combinations, only a limited subset is actually represented in genomes.” Period. What’s evolution got to do with it? In the orchard analogy Jonathan Wells uses to picture the sudden emergence of phyla in the Cambrian explosion, there is some branching at the tips of each plant over time, but the plants are discontinuous. Ignoring the assumption of evolution, Koonin’s team agrees that functional space is a tiny fraction of sequence space. This is true both for languages and proteomes. The vast majority of sequences are gibberish. The orchard picture allows for alternate spellings in words over time, and substituted amino acids in proteins over time. One protein domain cannot, however, tolerate unlimited substitutions and retain its function. Nor will it evolve into a neighboring plant.

Taking Refuge in the Multiverse

In his new online course on intelligent design, Michael Behe illustrates the tight restrictions on functional space. Many proteins interact, he explains in lecture 21, by contacting each other at specific sites on their surfaces. He shows that the probability of a new contact requiring two mutations is so low (on the order of one in 10 to the 20th power) that it could not be expected to appear among all the cells that have ever existed on Earth over the entire age of the universe. Before Koonin takes refuge in the multiverse again, he should deal with these low probabilities, and not leap over them with phrases like “the origin of multicellularity” or “the origin of eukaryotes” that are taken for granted in the paper.

With the paper’s dependency on Darwinian evolution eliminated, one can see that its comparison to language supports intelligent design. The authors use terms synonymous with irreducible complexity:

Conceivably, the near-universal value of information gain by the genome-wide domain architectures represents the minimum complexity that is required to maintain a functioning cell capable of adequately processing internal and external signals.

They agree with ID’s emphasis on semantic information:

The near-universal information gain relates the protein languages of biology to human natural languages.

And they recognize that function relies on specificity:

The function of a protein, to a large extent, is determined by the arrangement of its constituent domains — that is, its domain architecture.

When purged of irrelevant Darwinian assumptions, this paper underscores several key principles of intelligent design theory. And it doesn’t require the absurd implications (e.g., Boltzmann brains) of a pseudo-scientific, unobservable multiverse.

Photo: A sample of meaningful text, by Annie Spratt via Unsplash.

Evolution News & Science Today
