The Nylonase Story: The Information Enigma

Editor’s note: Nylon is a modern synthetic product used in the manufacturing, most familiarly, of ladies’ stockings but also a range of other goods, from rope to parachutes to auto tires. Nylonase is a popular evolutionary icon, brandished by theistic evolutionist Dennis Venema among others. In a series of three posts, of which this is the third, Discovery Institute biologist Ann Gauger takes a closer look. Look here for the first and second posts.

Returning to the story of the nylonase gene and the problem of where new information comes from, I’d like to make the point that there is a reason that molecular geneticist and evolutionary biologist Susumu Ohno made his hypothesis about a frame-shift having produced nylonase. Ohno is famous for his hypothesis that gene duplication and recruitment are the chief means by which “new” proteins are made — he wrote a famous book about it.

But he also knew that copying and tinkering weren’t enough, that there had to be a way to generate genuine de novo information, brand new coding sequence for genuinely new proteins, in order to account for all the diversity of information that must have been necessary as life became more complex. New proteins had to come from somewhere.

Ohno had an idea. He thought coding sequences made up of oligomeric repeats might allow there to be several alternate ways to read the same sequence. For an explanation of alternate reading frames, see my earlier post, “The Nylonase Story: How Unusual Is That?”

As a potential example, Ohno proposed nylB, the gene for nylonase. This gene has certain characteristics that make it plausible that a frameshift could have occurred, characteristics I described in that second post in this series, such as nylB’s sequence being GC-rich and deficient in TAs. These two characteristics reduce the chances of having stop codons, in any frame.
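To see why GC-richness lowers the odds of a stop codon, here is a minimal sketch in Python. It assumes each base is drawn independently with P(G) = P(C) and P(A) = P(T), which is a simplification of real sequence. It models base composition only, so it does not capture a specific shortage of TA dinucleotides. The function names are hypothetical and are not taken from the simulation described in this series.

```python
def stop_codon_probability(gc):
    """Chance that one random codon is TAA, TAG or TGA at GC fraction gc.

    Assumes independent bases with P(G) = P(C) = gc / 2 and
    P(A) = P(T) = (1 - gc) / 2.
    """
    p_at = (1 - gc) / 2  # probability of A, and likewise of T
    p_gc = gc / 2        # probability of G, and likewise of C
    p_taa = p_at * p_at * p_at
    p_tag = p_at * p_at * p_gc
    p_tga = p_at * p_gc * p_at
    return p_taa + p_tag + p_tga


def prob_no_stop(gc, codons):
    """Chance that a frame of `codons` random codons contains no stop."""
    return (1 - stop_codon_probability(gc)) ** codons


if __name__ == "__main__":
    for gc in (0.5, 0.7):
        print(gc, stop_codon_probability(gc), prob_no_stop(gc, 300))
```

Under these assumptions a random codon is a stop with probability 3/64 at 50 percent GC and about 0.019 at 70 percent GC, so a GC-rich frame is much more likely to stay open over its whole length.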

Ohno thus proposed that nylonase arose after a frameshift mutation in a perhaps nonfunctional, prior-coding sequence, resulting in an entirely new coding sequence with nylonase activity. Ohno could make this proposal only because nylB, the gene that codes for nylonase, has at least two potential open reading frames in the forward direction: the hypothetical “original” one proposed by Ohno from before any hypothetical T insertion took place, and the actual one that codes for nylonase now.

Ohno published his paper in 1984. In 1992, Yomo et al. noticed that one frame in the antisense direction of the nylonase gene has no stop codons either. It also lacks a start codon, though, so Yomo et al. called it a non-stop frame (NSF) instead of an open reading frame (ORF). The probability of finding a DNA sequence with an ORF on the sense strand and a full NSF on the antisense strand is small. But surprisingly, not only does nylB have an NSF on the antisense strand, it also has another fully overlapping NSF in the forward direction. That’s two NSFs plus the actual ORF for nylonase. (I’m not counting the hypothetical frame-shifted “original” ORF, since that frame actually has several intervening stops. See “The Nylonase Story: When Facts and Imagination Collide.”) That means nylB has no stop codons in three out of six frames.
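The six-frame bookkeeping can be made concrete with a short Python sketch. It scans both strands of a sequence for stop codons and then tallies how many stop-free frames turn up in random sequences. This is an illustration under stated assumptions, not the simulation reported in this series. The function names, the independent-base model, and the default length and GC content are all assumptions made for the example. It also ignores start codons, so it counts stop-free frames rather than true ORFs.

```python
import random

STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the antisense strand, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def has_stop(seq, offset):
    """True if any complete codon in the frame starting at offset is a stop."""
    return any(seq[i:i + 3] in STOPS for i in range(offset, len(seq) - 2, 3))


def stop_free_frames(seq):
    """Label the frames with no stop codon, e.g. '+0' or '-2'."""
    frames = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            if not has_stop(s, offset):
                frames.append(f"{strand}{offset}")
    return frames


def random_sequence(length, gc, rng):
    """Draw independent bases with the requested GC fraction (an assumption)."""
    weights = [(1 - gc) / 2, gc / 2, gc / 2, (1 - gc) / 2]  # A, C, G, T
    return "".join(rng.choices("ACGT", weights=weights, k=length))


def simulate(trials, length=900, gc=0.70, seed=0):
    """Tally how many of the six frames are stop-free in each random sequence."""
    rng = random.Random(seed)
    counts = {n: 0 for n in range(7)}
    for _ in range(trials):
        seq = random_sequence(length, gc, rng)
        counts[len(stop_free_frames(seq))] += 1
    return counts


if __name__ == "__main__":
    print(simulate(10_000))
```

Running stop_free_frames on a real gene sequence would report which of its six frames are open. The simulate tally gives a rough sense of how rarely random sequence of a given length and composition keeps several frames open at once.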

The chances of avoiding a stop codon in three out of six frames are very low, as our simulation (described in the previous post) showed. The probability that an ORF 900 nucleotides long at 70 percent GC has two NSFs is 9 out of 28,603, or about 0.0003. (See there for details.) If these figures are recast to include the total number of random trials required to get an ORF of the proper length and GC content in the first place, and then with two NSFs, then the probability would be nine out of ten million trials, or 0.0000009. No organism has ten million genes (we have only about twenty thousand), and Flavobacterium certainly doesn’t. But it’s not outside the realm of possibility that such sequences should exist by pure chance somewhere. After all, nylB does. But take the following into consideration.

In addition, beyond the first appearance of such a sequence, there would also need to be some way to prevent random mutation from introducing any stop codons over evolutionary time, in any of the three open frames. Purifying selection would normally be invoked in such a case. Organisms that develop harmful mutations in genes that encode functional gene products — things that are important for the organism’s survival — are less successful at reproducing, and so organisms carrying harmful mutations tend to disappear from the population (they are sickly or dead). However, purifying selection by definition has no effect on non-functional sequences. The fact that stops are prevented from accumulating in nylB NSFs implies that all three frames are functional. No function has been reported for the NSFs, however. They have no ATGs in the vicinity and so may be non-coding (though it must be acknowledged there are alternate start codons in the vicinity). In addition, it has been reported that the pOAD2 plasmid on which nylB is located is non-essential. It can be cleared from its host with no effect, except the loss of the ability to degrade nylon.

One possibility is that nylB has a secondary DNA or RNA-based function that requires its sequence to be nearly completely conserved. It would have to be a very specific sequence requirement to prevent the accumulation of stop codons in three frames, though. We get a hint that the cause is not sequence specificity, because the nylB and nylB′ genes of Flavobacterium differ by 47 amino acids, and the nylB gene of Pseudomonas has only about 35 percent identity according to reports, yet all three lack stops in the anti-sense frames in addition to their coding sequence (based on available sequence information).

Yomo et al., who first reported the anti-sense NSF in nylB, were amazed and puzzled by the existence of anti-sense NSFs in nylB genes of multiple species.

The probability of the presence of these NSFs on the antisense strand of a gene is very small (0.0001-0.0018) [we observed .0001]. In addition, another gene for nylon oligomer degradation [Pseudomonas nylB] was found to have a NSF on its antisense strand, and this gene is phylogenetically independent of the [Flavobacterium] nylB genes. Therefore, the presence of these NSFs is very rare and improbable. Even if the common ancestral gene of the nylB family was originally endowed with an NSF on its antisense strand, the probability of this original NSF persisting in one of its descendants of today is only 0.007. Unless an unknown force was maintaining the NSF, it would have quickly disappeared by random emergences of chain terminators. Therefore, the presence of such rare NSFs on all three antisense strands of the [three member] nylB gene family suggests that there is some special mechanism for protecting these NSFs from mutations that generate the stop codons. Such a mechanism may enable NSFs to evolve into new functional genes and hence seems to be a basic mechanism for the birth of new enzymes. [Emphasis added.]

Later on, they continue:

… the lifetime of a nonessential NSF is very short, and it is impossible for such a NSF to persist for a long period of evolution. Therefore, we strongly suggest that the existence of the NSFs on all the three antisense strands of the nylB gene family points to an unknown force that is preserving these nonessential NSFs; otherwise, they would have quickly disappeared by random emergences of chain terminators.

Ohno himself was aware of this work and in some sense supported it. He was the one who communicated it to the Proceedings of the National Academy of Sciences. What he made of it I don’t know.

The highlighted proposal in the above quotes is on the face of it antithetical to the materialist worldview. What kind of force can preserve apparently non-functional NSFs? Certainly a mechanism to preserve non-functional sequences so that they might some day evolve into functional genes is more suggestive of design than evolution. It would take a fair amount of foresight on the part of evolution, don’t you think, to develop a mechanism to prevent stop codons from interrupting non-functional NSFs, all for some possible future benefit?

All this speaks to the origin and preservation of potential information, information such as Ohno was looking for, but by a means different than he foresaw. We have returned full circle. Explaining nylonase does not require a frameshift, as I have shown in the first post — nonetheless nylonase’s gene is an unusual sequence. Getting overlapping code in three frames might happen in very rare circumstances, but keeping the NSFs open in the apparent absence of selection to maintain them would seem to be highly, highly unlikely. So we have extreme rarity piled upon rarity. Bear in mind also, that whatever the peculiar characteristics of the nylB gene sequence, it must also encode a functional, stably folded enzyme, which is another constraint.

Why am I going on and on about nylonase? It has to do with the problem of the origin of novelty. Are frameshifts a possible source of new functional information? Might a sequence with alternate frames stay open by chance or be created by chance over evolutionary time? It’s a highly improbable event, but not impossible, I suppose. Might the alternate frames someday be material for frameshifted novel proteins, provided they stay open? They might theoretically be a reservoir for future proteins, but given what we know about the rarity of these kinds of sequences and the rarity of protein folds in sequence space, the possibility of generating an entire new protein fold from a frameshift is extremely, extremely, extremely low, and would depend on a highly unusual starting sequence tailored in advance for a particular functional specificity. In other words it would need to be designed.

In addition, even should such a sequence exist, it would not long persist in the face of neutral evolution. According to neo-Darwinism there is no magic molecular bouncer who throws out inactivating mutations before they can do their damage to a potential gene. Or to use another metaphor, evolution does not bank potentially useful sequences for future use. For it to do so would require foresight, an idea antithetical to evolutionary theory. Thus, any putative frame-shifted sequences that have been shown to have a functional role are better explained by design than by chance and necessity.

Should anyone disagree with my argument above, I’d like to point out that for a long time it was the standard belief among evolutionary biologists (and geneticists) that random sequence could not generate a functional protein. Frameshifted proteins are almost universally disrupted by stop codons (unless they happen to have an NSF or two like nylB). And even if they aren’t interrupted, the new sequence will be unlikely to fold into a stable protein, given the rarity of functional folds in sequence space (see the first post).

As an aside, as one of the curious facts of history, the disruptive properties of frameshift mutations were used to discover the triplet nature of the genetic code. Says Sir F.H.C. Crick in a lecture on the genetic code he gave in 1964:

This [the ability to combine mutations] has enabled us to tackle the question: is it really a group of three that makes up a codon? The basic idea is the following. We are able to pick up mutants which we believe (from the way they behave in various contexts) are not merely the change of one base into another, but are either the addition or a deletion of a base or bases. What happens when you have a genetic message and you put in an extra base? The reading starts from the beginning until it comes to that point and from there onward the whole of the message is read incorrectly, because it is being read out of phase. In fact we find that these [frameshift] mutants are completely inactive — this is one of our bits of evidence that they are what we say they are. You can pick up a number of such mutants and can put them together, by genetic methods, into the same gene. For example you can put together two of them. Such a gene would be read correctly until it reached the first addition, and then it would be out of phase. When the reading came to the second addition it would [be] read out of phase again, and so the whole of the rest of the message would be read incorrectly. Now it so happens that the left-hand end of this gene is not terribly important for its function. We can actually delete it and the gene will work after a fashion. In this region we have constructed, by genetic methods, a triple mutant, using three mutants all of the same type, and we have found that the gene will nevertheless function fairly normally.

This result is really very striking. Each of the three different faults, used singly, will knock out the gene. You can put them together in pairs in any combination you like, but then the gene is still quite inactive. Put all three in the same gene and the function comes back. We have been able to do this with a number of distinct combinations of three mutants (Crick et al., 1961).

Crick and others found that when three single base frameshift mutations of a particular gene, each completely disruptive on its own, were combined into the same gene, the three insertions together restored the frame enough for the protein to function again! Hence the code must be based on threes.
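The logic of the triple-mutant result can be illustrated with a toy example in Python. The sequence below is made up for the purpose, and the helper names are hypothetical. The sketch only tracks the reading frame and says nothing about whether a real protein would still fold or function.

```python
# A made-up 27-base "gene" (nine codons), used purely for illustration.
GENE = "ATGGCTGACCTGAAGCGTTTCGAACAG"


def codons(seq):
    """Split a sequence into complete triplets, reading from the first base."""
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]


def insert_bases(seq, positions, base="A"):
    """Insert one base at each position, given as indices into the original."""
    # Work from right to left so earlier insertions do not shift later ones.
    for pos in sorted(positions, reverse=True):
        seq = seq[:pos] + base + seq[pos:]
    return seq


def tail_restored(original, mutant, n_tail=4):
    """True if the last n_tail codons are read the same as in the original."""
    return codons(original)[-n_tail:] == codons(mutant)[-n_tail:]


if __name__ == "__main__":
    for positions in ([4], [4, 7], [4, 7, 10]):
        mutant = insert_bases(GENE, positions)
        print(len(positions), codons(mutant), tail_restored(GENE, mutant))
```

With one or two insertions near the start, every downstream codon is read out of phase. With three, the downstream codons are read exactly as in the original, and only the short stretch between the insertions is altered.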

The sheer improbability of getting a functional enzyme from frameshifted random sequence has been the accepted view for a long time. It is only recently, in the era of big genomic data, that it has begun to be accepted that new proteins do occasionally arise by frame-shift mutation. The reason? It’s because we find examples in the genome that appear to be products of such events, based on sequence comparisons.

The proteins apparently affected by such frameshifts in the genome are often transcription factors or membrane proteins involved in gene regulation. The apparent frameshift often affects alternative splicing and changes the coding sequence over an exon or so; alternatively, the frameshift affects the end of the protein, resulting in truncation. The fact that such a mutation is located near the protein’s end reduces the amount of disruption to the protein. Many such mutations have been documented to cause disease, however. For a demonstration, just use Google Scholar to search for “frameshift.”

At this point, the chief question that should be in everyone’s mind is, “Can evolution by neo-Darwinian means produce new functional information from frame-shifted sequence? Or are other explanations more likely?”

It boils down to this. Do we say that frameshifted functional proteins are easy to generate, because after all, they exist? Or do we acknowledge that such proteins are not easy to generate and so may be evidence for design?

To reiterate, it used to be standard knowledge that frameshift mutations were always bad. Disruptive. So, for example:

More radical mutational events, such as insertions and deletions that change the reading frame — frameshift mutations — are generally considered to be detrimental (e.g. by causing nonfunctional transcripts and/or proteins, through premature stop codons) and of little evolutionary importance, because they seriously alter the sequence and structure of the protein.

But now it has become popular to offer frameshifts as a quick way to get novelty. I am pretty sure it all began with Ohno, who said:

It has recently occurred to me that the gene started from oligomeric repeats at its certain stage of degeneracy (base sequence diversification) [nylB] can specify a truly unique protein from its alternative open reading frame.

Now the meme has spread. From the Abstract of a paper documenting the “Frequent appearance of novel protein-coding sequences by frameshift translation,” we hear that “Major novelties can potentially be introduced by frameshift mutations and this idea can explain the creation of novel proteins.” And how do they defend the possibility of a functional frameshift? “Some cases of recent evolution of new genes via frameshift have been reported. For example, in bacteria the sudden birth of an enzyme that degrades manmade nylon oligomers was explained by a frameshift translation of a preexisting coding sequence.”

Sigh. The record needs to be corrected. (See my first post, “The Nylonase Story: Where Fact and Imagination Collide.”)

Let us close by considering the nature of the argument being made concerning proposed frameshifts. The fact concerning such proposed frameshifts is that there are sequence similarities between two stretches of DNA, where one part appears to be frameshifted with respect to the other.

Notice that the argument used to explain the appearance of novel genes by frameshift uses a form of inference known as abduction, where one reasons from present effects to past causes.

The surprising fact A is observed.
If B were true, then A would be a matter of course.
Hence, there is reason to suspect that B is true.¹

In other words:

The surprising fact of novel genes apparently arising by frameshift is observed.
If it is easy to get new functions from random sequence, then it is a matter of course that frameshifts can produce functional proteins.
Hence, it is easy to get new functional proteins from random sequences.

Abductive arguments are very weak. The problem is that there can be multiple competing causes that explain the observed effects. The only way to strengthen the argument is to rule out all other competing causes. And design is a particularly strong competing hypothesis. We know design is a cause capable of producing the effect in question, namely the generation of new functional proteins by the addition of frame-shifted code. In fact, given what we know about the rarity of functional proteins in sequence space, as demonstrated experimentally here, here, and here, and theoretically here, design is a better explanation than the neo-Darwinian one.

Until someone demonstrates experimentally, in real time, that a frameshift mutation can generate a new functional protein (not just a loss of function) by undirected processes, the inference that it is easy to do so is unjustified. And nylonase is not that demonstration.²

References:

(1) Stephen C. Meyer, Of Clues and Causes: A Methodological Interpretation of Origin of Life Studies. PhD dissertation (Cambridge: Cambridge University, 1990).
Charles S. Peirce, “Abduction and Induction,” In The Philosophy of Peirce, edited by J. Buchler (London: Routledge, 1956), 150–154. Charles S. Peirce, Collected Papers, edited by Charles Hartshorne and P. Weiss. 6 vols. (Cambridge, MA: Harvard University Press, 1931–1935).

(2) In a future post, I will discuss experiments that attempt to demonstrate that random sequence can perform simple functions.

Evolution News & Science Today
