About Orphan Genes — What’s the Big Problem for Evolution?

Orphan genes — genes that are present in only one species, or a group of closely related species — are of particular interest to advocates of intelligent design. The reason for this has to do with the assumptions of evolutionary biology.

The main evolutionary assumption is common descent, that all life is descended from one or a few ancestors. Following from this, and taken as evidence for this, is the assumption that all life shares DNA in common. Prior to the advent of widespread genome sequencing, it was assumed that living things shared genes, that there was a set of shared housekeeping genes, and a set of genes specific to a taxonomic group, though these would be few in number. It was assumed that the vast majority of genes would be found in multiple places in the genomes of living things. The reason? It was assumed that getting new genes was hard, and once a workable solution was found it would be preserved in the descendants that followed. The bulk of genes would have been invented early in evolution, and thus would be broadly shared.

When It All Changed

But all that changed when many genomes were sequenced and their transcripts analyzed. Each genome, or each taxonomic group, such as bivalves or insects, was found to contain unique genes, found only in that group or species. This was a surprise. At first it was attributed to incomplete sampling. As more genomes were sequenced, it was thought, the uniqueness would turn out to be illusory. Other organisms would carry those genes. As a related explanation, the sparsity of their distribution might be due to horizontal gene transfer, or to gene loss. The hypothesis was that what appeared to be unique was so because it was the result of some rare transfer between species, and we hadn’t identified the source. Or what once was widespread had been lost over evolutionary time.

These explanations are not proving true. First, if incomplete sampling were the cause, then the more genomes that are sequenced, the more the proportion of orphans should shrink, as more and more “orphans” are shown to be present in other genomes. But that has not proven to be the case. The mountain of orphan genes is growing, not shrinking. Similarly, horizontal gene transfer was not borne out. The sister genes of orphans in the donor species should have been found as sample size increased, reducing the proportion of orphan genes. As for gene loss as an explanation, the loss would have to be unrealistically massive to account for the patterns seen.

One last possibility. The orphans could be related to other genes, but their sequences could have diverged so much as to be unrecognizable. Only their protein structures might reveal relatedness. This also has not been borne out by studies that have determined structures of orphan proteins.

A Sea Change in Evolutionary Thinking

So what’s the solution? If you are an evolutionary biologist, it’s simple. You decide it must be easy to get new genes directly from random (non-coding) DNA, or by frameshift or overlapping genes (which amounts to random sequence). This represents a sea change in evolutionary thinking.

Now hold it. Saying that it’s easy to get new genes from DNA by those methods overturns a major Darwinian expectation. In 1977, in his famous article “Evolution and Tinkering,” which has been cited many thousands of times, the Nobel laureate François Jacob explained the accepted view of how evolution constructed new genes:

…once life had started in the form of some primitive self-reproducing organism, further evolution had to proceed through alterations of already existing compounds. New functions developed as new proteins appeared. But these were merely variations on previous themes. A sequence of a thousand nucleotides codes for a medium-sized protein. The probability that a functional protein would appear de novo by random association of amino acids is practically zero. In organisms as complex and integrated as those that were already living a long time ago, creation of entirely new nucleotide sequences could not be of any importance in the production of new information. [p. 1164; emphasis added.]

New genes must arise from pre-existing genes, leaving the signal of ancestry in their closely related (i.e., homologous) sequences, because the probability of the alternative is “practically zero.” That’s why the discovery of orphan genes, which show no homology to other sequences, came as a great surprise.
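To get a feel for the scale behind Jacob's remark, here is a rough back-of-the-envelope sketch. It takes only the "thousand nucleotides" figure from the quote and assumes the standard 3-nucleotide codon and 20-letter amino acid alphabet; it ignores stop codons and says nothing about what fraction of sequences are functional, which is the separate, contested question. The function name is hypothetical.

```python
import math

NUCLEOTIDES = 1000   # figure taken from Jacob's quote
CODON_LENGTH = 3     # nucleotides per amino acid (standard genetic code)
AMINO_ACIDS = 20     # standard amino acid alphabet (assumption: no rare residues)

def sequence_space_log10(n_nucleotides):
    """Return (protein length, log10 of the number of possible sequences)."""
    residues = n_nucleotides // CODON_LENGTH
    # 20^residues overflows a float, so work in log10 instead
    return residues, residues * math.log10(AMINO_ACIDS)

residues, log10_size = sequence_space_log10(NUCLEOTIDES)
print(f"{residues} residues, roughly 10^{log10_size:.0f} possible sequences")
```

The size of the space is not by itself a probability. How hard a random search is depends on how many of those sequences do something useful.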

No Problem, You Say?

“No problem. Isn’t that what science is supposed to be about?” said one evolutionist to me. “Adapting your theory to fit the facts?”

Well, theories have to be amenable to falsification too. They can only bend so far.

So how can we tell whether genes are easy to get or hard? By testing these alternatives in the lab.

At present the preferred theory for the birth of new genes is to take a stretch of DNA that is currently not being transcribed into RNA, then let it acquire the signals necessary for transcription, then have that new transcript have a function, either as an RNA or after being translated into protein.

This is in fact how many orphan genes are found. An RNA transcript is made in one species from a stretch of DNA that in a sister species does not make RNA. Further work then determines if the RNA is translated into protein, and ultimately, if the protein has a function.

But in order for this scenario for orphan gene creation to work, functional protein sequences have to be easy to acquire, within reach of an evolutionary search starting from an existing non-functional stretch of DNA. Evolutionists tend to think that such a thing happens easily. On their view, evolutionary processes can readily produce a new gene or structure or chemical activity. This must be true if evolutionary processes are the explanation for orphan genes.

The Rarity of Functional Protein Folds

In contrast, ID proponents think that it’s very difficult to get function from random sequence. There’s a definite reason for this. Experiments by Dr. Douglas Axe estimated the rarity of functional protein folds in sequence space: only about 1 in 10^77 sequences form a fold with the target function, a very, very, very small fraction. If functional proteins are that rare in sequence space, it is very difficult to get new genes or structures or chemical activities. Others have found similar answers when asking what is required to produce an enzymatic activity. Still others, asking for simpler kinds of activity, like sticking to a column loaded with a substrate like ATP, get numbers that are conceivably within range of evolutionary processes. But just sticking to a column is not nearly as demanding as carrying out an enzymatic reaction.
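The arithmetic behind "rare in sequence space" can be sketched as follows. This is only an illustration, assuming a blind search in which every trial is an independent random draw. The 1 in 10^77 prevalence is the figure quoted above; the trial counts are hypothetical values chosen for illustration, not measurements, and the function name is made up for this sketch.

```python
import math

def prob_at_least_one_hit(log10_prevalence, log10_trials):
    """Chance that at least one of n independent random draws is functional.

    Computes 1 - (1 - p)^n with log1p/expm1 so that tiny p does not
    round to zero in floating point.
    """
    p = 10.0 ** log10_prevalence   # fraction of sequences that function
    n = 10.0 ** log10_trials       # number of independent random trials
    return -math.expm1(n * math.log1p(-p))

# Quoted prevalence (1 in 10^77) against hypothetical numbers of trials
for log10_trials in (20, 40, 77):
    chance = prob_at_least_one_hit(-77, log10_trials)
    print(f"10^{log10_trials} trials: chance of a hit ~ {chance:.3g}")
```

Under these assumptions the chance stays negligible until the number of trials approaches the inverse of the prevalence. The same function applied to a much higher prevalence, such as the ones reported in the column-binding experiments, gives a very different answer, which is why the measured rarity matters so much.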

There are strong points of view as to the reliability of the various methods. How the various experiments are judged tends to be influenced by one’s particular view on the question of evolution. So the best thing is to do more experiments, which is precisely what the scientific community is doing.

Work is in progress now in many labs to test the question of how hard it is to get an orphan gene from non-coding sequence. Some are asking how hard it is to get a promoter (necessary to promote active transcription). Some are asking how likely it is for random sequence to have function. The sticking point, literally, seems to be that random sequences don’t fold properly and are insoluble in water. They aggregate. That makes most kinds of function difficult, to say the least. Lastly, how likely is it that the function will actually be helpful? We’ll see.

The answer is not in. If Doug Axe is right (and remember, he is not the only researcher to have found that functional proteins are very rare in sequence space), then getting an orphan gene by an evolutionary process is extremely unlikely. But orphan genes are possible, maybe even to be expected, when a designing intelligence acts.

Photo: Leafcutter ants in the Wilhelma Zoo, Germany, by Pjt56 [CC BY-SA 4.0], from Wikimedia Commons.