How can random non-coding DNA be, at the same time, both functional (as in the genome) and non-functional (as in extremely unlikely to code for functional proteins)?
This question was posed recently at Peaceful Science, a discussion site that seeks to promote dialog between atheists, theistic evolutionists, and proponents of intelligent design. (Their success is mixed. ID proponents often feel like Aragorn in the last battle.) It’s a good question. It came out of a conversation about orphan genes, where I was arguing that non-coding DNA was extremely unlikely to give rise to a new coding sequence with any function. Yet ID people claim all the time that the majority of the genome is functional. How can a sequence be both functional and non-functional at the same time? The answer turns on two things: the meaning of “function” and clarity about what’s being tested.
I had been trying to explain Doug Axe’s results to the group of debaters, most of whom did not agree. According to Axe, sequences that can produce a functional protein, namely a protein capable of carrying out an enzymatic reaction, are extremely rare. (They could be rare in number and/or widely separated from one another in sequence space.)
Picture a Bank Vault
Think of a situation where you have to crack the code on a bank vault with many dials, say 150, each set to one of 10 digits. If there is only one code that will work, the number of possible sequences to try is 10^150. Now say that 100 sequences out of 10^150 would work. That reduces the number you would have to try before expecting a hit. It would now be roughly 10^148.
What’s the solution? Well, suppose there was another bank next door that had a similar code, in fact with 125 of the dials identical! And you happened to know that code. Now the information required is greatly reduced. You have only the 25 remaining dials, or 10^25 combinations, to get. Likely success? And if you are very lucky and know the code nearly completely, all but 3 dials (maybe you know the teller or the person who built the vault), it is definitely easier to break the code. That leaves 10^3 = 10 x 10 x 10 = 1,000 combinations. You have a pretty good chance of success.
The problem is worse for proteins. There are twenty possible amino acids for each position in a protein, so the total number of possible sequences for a protein 150 amino acids long is 20^150, or about 10^195.
To do a random search through that whole space of 20^150 is not possible, just like it would be impossible to search through the 150 dials to find the bank code. But if proteins are not far apart in sequence space, like the bank code where almost all of the code was identical to another bank’s code, then the chances of finding a sequence that will work are greatly improved.
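For readers who want to check the arithmetic in the vault and protein examples, here is a minimal Python sketch. The function name `search_space` and the variable names are my own, chosen only for illustration. The code simply counts combinations; it says nothing about how many protein sequences are actually functional.

```python
import math


def search_space(symbols: int, positions: int) -> int:
    """Number of distinct sequences with `symbols` choices at each of `positions` slots."""
    return symbols ** positions


# Bank vault: 150 dials, 10 digits each, exactly one working code.
vault = search_space(10, 150)

# If 100 codes work, the space per working code shrinks by a factor of 100.
vault_per_working_code = vault // 100

# Knowing 125 of the 150 dials leaves 25 unknown dials.
neighbour_known = search_space(10, 150 - 125)

# Knowing all but 3 dials leaves 3 unknown dials.
almost_known = search_space(10, 3)

# Protein: 150 positions, 20 amino acids each.
protein = search_space(20, 150)

print("vault codes:            10^%d" % round(math.log10(vault)))
print("per working code (100): 10^%d" % round(math.log10(vault_per_working_code)))
print("125 dials known:        10^%d" % round(math.log10(neighbour_known)))
print("all but 3 dials known:  %d" % almost_known)
print("protein sequences:      about 10^%.1f" % math.log10(protein))
```

Python integers have arbitrary precision, so these counts are exact rather than floating-point approximations. The last line shows that 20^150 is about 45 orders of magnitude larger than the 10^150 vault space.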
A Crime Spree
Now consider one more thing. Suppose suddenly there were bank robberies everywhere, and it wasn’t by force. The dials had been turned to the correct combinations. What would be your inference? I would say that someone knew the codes.
So unless functional sequences are easy to find (very common), and/or are clustered together (easily reachable from one functional island to another), explaining current protein diversity without design is impossible.
I’ll break that down.
- “Unless functional sequences are easy to find (very common), and/or are clustered together (easily reachable from one functional island to another)”: I am laying out the conditions where it might be possible to find function.
- “Explaining current protein diversity without design is impossible”: Unless the above conditions are met, namely that functional sequences are easy to find or clustered together, we won’t be able to find functional sequences, unless design has been involved.
Now turn the sentence around.
Explaining current protein diversity without design is impossible, unless functional sequences are easy to find (very common), and/or are clustered together (easily reachable from one functional island to another).
As a consequence, if we find that apparently random non-coding sequences have given rise to new genes and proteins in many genomes, in fact representing 10-30 percent of the genomes analyzed, that result should surprise us, given what I said above. But we need additional evidence still. See below.
Now for the other half of the problem or confusion here. In ENCODE, scientists claimed that the majority of our DNA was functional, meaning it had some sign of biochemical activity. Transcription, methylation, a site for DNA binding, etc., any of these would qualify as functional in some sense. But even ENCODE workers admit they don’t know how much of that “function” will be meaningful.
In the ENCODE sense, most genomic sequence is functional, thus functional sequence is common (20-80 percent was the original range offered). Just remember what function means here — biochemical function, not sequence coding for functional proteins.
So Which Is It?
If the genome is functional in the sense of ENCODE, that agrees with one part of ID. Some of us argued that the genome would not be junk. We would expect some kind of function for most of it.
But being functional in the biochemical sense (à la ENCODE) does not mean it is easy to give rise to new genes and proteins. When we say functional sequence is rare in sequence space, we mean a different sort of function and sequence than in ENCODE. We mean a sequence that codes for a protein able to carry out an enzymatic reaction. It is our claim that proteins made from random sequence will rarely if ever have any sort of enzymatic activity.
That is why experimental tests are to be desired. Can random DNA sequence produce functional proteins with enzymatic activity or not? If experiments say no, that implies something extra is going on, because we do see lots of de novo genes.
However, such experiments may be impossible, because no one can test enough sequences to get a handle on what is likely a small signal. Proving a negative is always difficult. None of the protocols I know can screen enough sequences to test Doug’s hypothesis.
On the other hand, if there should be a positive result, showing that it is easy to get a functional enzyme from random DNA, that would definitely argue that de novo genes may be the product of natural processes, and not necessarily design.
As I said earlier, there are labs examining this question of the difficulty of getting enzymatic function from random sequence. I look forward to their results.
There is more that could be said here but I’ll save it for another time.