Is the INK4a/ARF Overlap a Settled Example of Poor Design?

Emily Reeves
Photo: Cancer cells, by National Cancer Institute.

In three posts so far (here, here, and here), I have been responding to a TEDx talk by recent MIT bioengineering PhD Dr. Erika DeBenedictis titled “It’s Time for Intelligent Design.” In my last post, we examined three examples of optimality in core biological infrastructure: development, metabolism, and the amino acid chemical space. Observing optimality in core biological infrastructure should encourage further testing of the optimality hypothesis in peripheral areas of biology. These examples should also caution us against labeling mechanisms of genome architecture we don’t fully understand as “poor design.” Further, if aspects of biology really are optimal, then when random mutation disrupts that optimality, human-directed intelligent design may be able to step in and restore the original design. In contrast, if aspects of biology are optimal but we misread them as poor design, because we misunderstand the system or its constraints, and then seek to reengineer them, suboptimality may be the result.

“Painfully Embarrassing”?

Just because optimality has been observed in some core biological infrastructure, that doesn’t mean all aspects of biology are well designed. At [6:11] Dr. DeBenedictis shares her frustration as an engineer about the overlapping sequence between INK4a and ARF. She calls this a “dumb mistake” and argues:

To me this area of the human genome is like painfully embarrassing. Over billions of years of evolution, genomes will accumulate these errors. It’s just heartbreaking because any human engineer would catch this, no one would engineer this deliberately.

Dr. DeBenedictis’s basic argument is that there is a section of the human genome where two genes overlap, both of which are important for suppressing tumors. She argues that because of this overlap, a single mutation in the overlap region could disrupt both genes, resulting in two broken or mutated genes as opposed to one. In her view it would be better for the genes to have no overlap, spacing out potentially harmful mutations. However, just because the INK4a/ARF overlap is nonintuitive, that doesn’t mean it isn’t the best possible design. There may be good reasons for the overlap.

Indeed, this idea that the INK4a/ARF overlap is a “dumb mistake” is problematic for those committed to evolutionary explanations. According to standard models of gene evolution, genes duplicate and evolve new functions all the time. In this case, if the overlap region really is deleterious, then a duplication of this region, followed by subfunctionalization of one gene in one copy and the other gene in the duplicate, should be able to remove the overlap. At least that’s what standard stories of gene evolution tell us. Even in an evolutionary framework, the fact that the overlap persists would seem to suggest that it’s there for some functional reason and isn’t a harmful or poorly designed property of the genome. 

From the intelligent design perspective, the overlap is easier to explain. This region may be a necessary biological compromise given constraints on the system. Alternatively, it could be a deleterious compression of biological information that occurred post-design, a corruption of biological information. Or its genius may yet need to be discovered. For added clarity, let’s look at known functions for overlapping reading frames.

Known Reasons for Overlapping Reading Frames

Overlapping genes have received more attention in recent years as intriguing aspects of biological information encoding. (Pavesi) To date, one proposed function for overlapping sequences is data compression: the data-compression hypothesis states that dual encoding is sometimes required when there is a strong biophysical limit on genome size. (Chirico et al.) Living in the digital age, we exploit the usefulness of data compression each time we compress a raw image or audio file to a JPEG or MP3. We recognize data compression as a necessary trade-off that reduces space and lowers the resources needed to transmit data.

Another proposed function of overlapping genes is to serve as gene nurseries. (Pavesi) Overlapping genes present a puzzle for natural selection, as two proteins, each subject to selection, are encoded in the same stretch of DNA. Selective forces on one protein constrain the evolution of the other. This constraint may actually reduce the sequence search space, allowing for development of a novel gene based on existing genetic information. Interestingly, the evolution of overlapping open reading frames is often asymmetric: one protein’s open reading frame acquires non-synonymous changes (affecting the amino acid sequence) while the other acquires mostly synonymous mutations (leaving the amino acid sequence unaffected). This is the case for ARF and INK4a. The region is subject to a high intrinsic rate of mutation, but ARF accumulates nearly all the non-synonymous mutations while INK4a is spared.
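To make the synonymous/non-synonymous distinction concrete, here is a minimal Python sketch of how one DNA change can be silent in one reading frame yet alter the protein read from a shifted frame. The twelve-base sequence is invented purely for illustration and is not the real INK4a/ARF sequence; the function names `translate` and `effect` are likewise hypothetical. Only the standard genetic code table is taken as given.

```python
from itertools import product

# Standard genetic code, with codons ordered by base in the order T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}

# Invented toy sequence (an assumption for illustration only).
TOY_DNA = "ATGGCTGAACGT"


def translate(dna, frame):
    """Translate every complete codon, starting at offset `frame`."""
    codons = (dna[i:i + 3] for i in range(frame, len(dna) - 2, 3))
    return "".join(CODON_TABLE[c] for c in codons)


def effect(dna, pos, new_base, frame):
    """Classify a single-base substitution as seen from one reading frame."""
    mutant = dna[:pos] + new_base + dna[pos + 1:]
    unchanged = translate(dna, frame) == translate(mutant, frame)
    return "synonymous" if unchanged else "non-synonymous"


print(translate(TOY_DNA, 0))  # MAER
print(translate(TOY_DNA, 1))  # WLN

# T -> C at index 5: GCT -> GCC in frame 0 (Ala stays Ala),
# but CTG -> CCG in frame 1 (Leu becomes Pro).
print(effect(TOY_DNA, 5, "C", 0))  # synonymous
print(effect(TOY_DNA, 5, "C", 1))  # non-synonymous

# C -> A at index 4 changes the amino acid in both frames.
print(effect(TOY_DNA, 4, "A", 0))  # non-synonymous
print(effect(TOY_DNA, 4, "A", 1))  # non-synonymous
```

The first mutation mirrors the asymmetric pattern described above, where one frame is spared while the other absorbs the amino acid change. The second shows the case Dr. DeBenedictis is worried about, a single change that alters both proteins at once.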

Another reason for overlapping reading frames in a single transcript is coupling of transcription and translation.

Tempting Explanations for the Overlap

To me, these reasons do not seem to satisfactorily explain the overlap of INK4a and ARF. The high intrinsic mutation rate of INK4a and ARF appears to be in conflict with the function of these proteins as tumor suppressors. As Szklarczyk et al. write: “From the selection standpoint, it would make most sense to maintain a virtually invariable tumor suppressor.” However, Szklarczyk et al. offer some more tempting explanations for this overlap.

Yet one possible benefit of tight coupling between INK4a and ARF may be in the sharing of the 3′-UTR region. The two mRNAs exhibit extraordinary stability, which is thought to be determined primarily by 3′-end sequence elements (43). Another possibility is that the region of overlap may be sustained through evolution to promote variability to eliminate mutants (44). Whereas fragments of the gene (e.g., single coding exons) require high variability, still other fragments may be selected for lower amino acid changing substitutions. In this case, a single mutation in the region of overlap would lead to multiple amino acid substitutions, subsequently invoking an organism’s surveillance mechanisms.

(Szklarczyk et al.)

While one, or none, of the above reasons may account for the INK4a/ARF overlap, there is currently not enough research to say for sure. But there is certainly the suggestion of a function for the overlap. This much is clear: in light of the limited current understanding of overlapping genes, it is premature to label the INK4a/ARF overlap a “dumb mistake.” Instead, investigation should continue as to whether this is an example of deleterious information compression, the best solution to multiple conflicting constraints, or maybe a genius way of alerting the surveillance system that there is a problem with that cell’s tumor suppressor genes. In my next post, I will highlight why I recommend this cautious approach from a historical standpoint.