Nature Reviews Genetics — Pseudogene Function Is “Prematurely Dismissed”
A new paper in Nature Reviews Genetics, “Overcoming challenges and dogmas to understand the functions of pseudogenes,” is simply incredible. It documents not only that pseudogenes have been found to have widespread function but also that, under the current “dogma” in biology and given technical limitations, we are failing to recognize their functions. As Seth W. Cheetham and his co-authors put it, biology suffers from “demotivation into exploring pseudogene function by the a priori assumption that they are functionless,” and “The dominant limitation in advancing the investigation of pseudogenes now lies in the trappings of the prevailing mindset that pseudogenic regions are intrinsically non-functional.”
The abstract lays out exactly what they think:
Pseudogenes are defined as regions of the genome that contain defective copies of genes. They exist across almost all forms of life, and in mammalian genomes are annotated in similar numbers to recognized protein-coding genes. Although often presumed to lack function, growing numbers of pseudogenes are being found to play important biological roles. In consideration of their evolutionary origins and inherent limitations in genome annotation practices, we posit that pseudogenes have been classified on a scientifically unsubstantiated basis. We reflect that a broad misunderstanding of pseudogenes, perpetuated in part by the pejorative inference of the ‘pseudogene’ label, has led to their frequent dismissal from functional assessment and exclusion from genomic analyses. With the advent of technologies that simplify the study of pseudogenes, we propose that an objective reassessment of these genomic elements will reveal valuable insights into genome function and evolution.
They immediately caution that there are many instances where DNA that was dismissed as pseudogene junk was later found to be functional: “with a growing number of instances of pseudogene-annotated regions later found to exhibit biological function, there is an emerging risk that these regions of the genome are prematurely dismissed as pseudogenic and therefore regarded as void of function.”
Seek Function and You Shall Find It
In 2003, Francisco Ayala and Evgeniy Balakirev wrote in Annual Review of Genetics that “pseudogenes that have been suitably investigated often exhibit functional roles.” This new Nature Reviews Genetics paper offers a very similar statement: “Where pseudogenes have been studied directly they are often found to have quantifiable biological roles.” The paper is a long narrative recounting the many times scientists mistakenly dismissed stretches of DNA as “pseudogenes,” and it documents dozens of instances where “pseudogenes” in humans and other organisms have been found to have function.
Some of these functions are “protein-based,” meaning the pseudogene actually generates a functional protein. But other functions can be “RNA-based” or “DNA-based.” For example, most evolutionists would presume that a pseudogene that does not produce a protein can’t be functional. But the paper observes that “pseudogenes” that cannot be translated into a protein may still have a function through their RNA transcript:
Many pseudogenes contain a frequency of mutations that render them unlikely to be (or incapable of being) translated into proteins. However, such mutations do not necessarily preclude pseudogenes from performing a biological function.
The paper notes that even if the RNA transcript of a pseudogene can’t be translated into protein, “a myriad of RNA-based regulatory mechanisms have been described for pseudogenes, including processing into small interfering RNAs (siRNAs) that may regulate their parent genes, acting as a decoy for transcription factors and, most prominently, as molecular sponges for microRNAs.”
Many evolutionists would confidently assume that if a pseudogene can’t even produce an RNA transcript, then it can’t be functional. But it turns out that pseudogenes that don’t produce any RNA transcript (i.e., aren’t transcribed) can still have important functions:
Another mechanism through which pseudogenes can function is by influencing chromatin or genomic architecture. HBBP1, a pseudogene residing within the haemoglobin locus, enables the dynamic chromatin changes that regulate expression of fetal and adult globin genes during development. Notably, although inhibiting HBBP1 transcription has no effect, deletion of the genomic locus reactivates fetal globin expression. HBBP1 DNA contacts, but not transcription, are required for suppressing the expression of fetal globin genes in adult erythroid cells.
A variety of other non-transcriptional functions are documented in the paper, including stabilizing chromosomes, mediating transcript splicing, and regulating recombination. The paper also indicates that in many cases the copy number of pseudogenes seems to have functional importance, with deviations from the normal genetic state causing disease. The authors predict: “It is expected that further links between human pseudogene polymorphisms and complex diseases will be identified in the coming years.”
The implication is that one reason we presume pseudogenes are functionless is that we haven’t been looking for their functions. And why didn’t we look for their functions? Because we presumed they were functionless! So there’s a circular aspect to the reasoning here. This circular reasoning has created the science-stopping junk-DNA paradigm, which has prevented us from understanding what pseudogenes really do.
The typical response from evolutionists would be that all of these examples of functional pseudogenes are just isolated rare cases, and that the bulk of pseudogenes are clearly junk. The authors of the paper — who give no indication of sympathy for intelligent design, but definitely oppose dismissing pseudogenes as “junk” — are aware of this objection. They say the following in direct rebuttal to it:
The examples of pseudogene function elaborated on here should not imply that pseudogene functionality is likely to be confined to isolated instances. At least 15% of pseudogenes are transcriptionally active across three phyla, many of which are proximal to conserved regulatory regions. It is estimated that at least 63 new human-specific protein-coding genes were formed by retrotransposition since the divergence from other primates. Numerous ‘retrogenes’ continue to be recognized as functional protein-coding genes rather than pseudogenes across species. High-throughput mass spectrometry and ribosomal profiling approaches have identified hundreds of pseudogenes that are translated into peptides. Although the functions of these peptides remain to be experimentally determined, such examples illustrate the challenge in substantiating a gene–pseudogene dichotomy.
They continue: “As the abundance of such [non-coding-DNA] acquired functions does not appear to be an especially rare or isolated phenomenon, it would seem remiss to take the default perspective that processed pseudogenes are functionless. Instead, it is probable that pseudogene-containing regions of the genome harbour important biological functions that are yet to be revealed.”
They point out that current algorithmic and computational methods employed for differentiating pseudogenes and protein-coding genes may overestimate the proportion of the genome that is composed of pseudogenes. Why? Because the properties that are used to define many “pseudogenes” are also often found in normal protein-coding genes. For example:
- Processed pseudogenes are often identified by their lack of introns, but some protein-coding genes also lack introns.
- Some pseudogenes lack “stop” mutations that truncate the protein, and thus “have the same protein-coding capacity as their parent genes.”
- Low proportions of non-synonymous to synonymous mutations do not distinguish recent pseudogenes from normal genes.
- Lack of transcription is not a good test because “determining the transcriptional state of pseudogenes is technically challenging.”
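To make the problem concrete, here is a toy sketch in Python. It is not how any real annotation pipeline works; the feature names, the example loci, and the dN/dS cutoff of 1 (a common rough indicator of whether purifying selection is acting) are simplifying assumptions for illustration only. Each rule encodes one of the “pseudogene” signals listed above, and any single red flag yields a “pseudogene” label.

```python
from dataclasses import dataclass


@dataclass
class Locus:
    """Hypothetical, simplified description of a gene-like sequence."""
    name: str
    has_introns: bool
    has_premature_stop: bool
    dn_ds: float                  # non-synonymous / synonymous substitution ratio
    transcription_detected: bool  # only in whatever cell types happened to be assayed


def rule_based_call(locus: Locus) -> str:
    """Toy classifier: any single red flag yields a 'pseudogene' label."""
    if not locus.has_introns:             # typical signal of a processed pseudogene
        return "pseudogene"
    if locus.has_premature_stop:          # disabling "stop" mutation
        return "pseudogene"
    if locus.dn_ds >= 1.0:                # no sign of purifying selection
        return "pseudogene"
    if not locus.transcription_detected:  # absence of evidence in the assays used
        return "pseudogene"
    return "gene"


loci = [
    # Hypothetical intronless gene that does encode a protein.
    Locus("intronless_coding_gene", False, False, 0.1, True),
    # Hypothetical recent retrocopy with an intact reading frame:
    # its features are identical to the gene above.
    Locus("recent_retrocopy", False, False, 0.1, True),
    # Hypothetical gene expressed only in a cell type nobody assayed.
    Locus("cell_type_specific_gene", True, False, 0.2, False),
    # Hypothetical conventional multi-exon gene with detected transcription.
    Locus("typical_gene", True, False, 0.2, True),
]

for locus in loci:
    print(f"{locus.name}: {rule_based_call(locus)}")
```

Only the last locus is called a gene. The first two receive the same label because their inputs are identical, so no adjustment of thresholds can separate them; the third is flagged only because the relevant cell type was never examined. That is the kind of conflict with genuine protein-coding genes the authors warn about.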
Because of this, they argue that “computational differentiation of pseudogenes from genes on a purely rule-based system is unlikely to be feasible as it will inherently conflict with many protein-coding genes.” They therefore propose markedly softening claims that a stretch of DNA is a pseudogene: “it may be useful to consider the annotation of pseudogenes in genomes as a prediction or a hypothesis rather than a classification.”
As the authors show, the presumption that pseudogenes are functionless needs to be abandoned. But then, why do we still presume they are functionless? There are three main reasons: (1) evolutionary thinking has presumed that pseudogenes are functionless junk, (2) terminological “dogma” reinforces a “mindset that pseudogenic regions are intrinsically non-functional,” and (3) technological limitations prevent us from discovering their function. The paper acknowledges that problem (3) stems from problem (2), but it fails to explicitly recognize that both problems (2) and (3) ultimately stem from problem (1). In fact it doesn’t even identify problem (1) as a problem. Yet the whole situation traces back to bad evolutionary predictions. Let’s look at these causes briefly, in reverse order:
(3) Technological Limitations
The proximal cause that prevents us from understanding pseudogene functions is technological limitations. Because of the junk DNA paradigm, many of our biochemical techniques and technologies are set up only to identify standard protein-coding genes. They ignore and dismiss DNA that doesn’t fit that mold. Only by updating our technology to detect functional DNA elements that don’t necessarily fit the standard definition of a “gene” can we begin to understand what pseudogenes really do. The paper explains that technical limitations, informed by our biases and assumptions, demotivate the study of pseudogene functions:
In addition to the demotivation into exploring pseudogene function by the a priori assumption that they are functionless, their systematic study has also been hindered by a lack of robust methodologies capable of distinguishing the biological activities of pseudogenes from the functions of the genes from which they are derived.
They compare the situation to that of long non-coding RNAs (lncRNAs), which “were similarly dismissed initially as emanating from ‘junk DNA’ or as transcriptional noise, largely by virtue of their definition as non-protein-coding.” But as technology has developed, lncRNAs have come to be widely recognized as functional, and researchers now regularly screen for their functions:
Following a combination of technology developments, genome-wide studies and detailed biochemical studies, lncRNAs are now routinely included in genome-wide analyses, and their functional potential as cellular regulators is widely recognized.
However, at present, the authors note, “due in part to the experimental challenge of investigating their function and expression, pseudogenes are typically excluded from genome-wide functional screens and expression analyses.” In other words, one of the main reasons we aren’t finding function for pseudogenes is because we aren’t looking for it. This needs to change, and they argue that it can.
For example, according to the paper, processed pseudogenes “were presumed to have been rendered transcriptionally silent by the loss of cis-regulatory elements.” But we now know that “thousands of retrotransposed gene copies are transcribed and are often spliced into known protein-coding transcripts” and “up to 10,000 mouse pseudogenes have evidence of transcription.” By studying these transcripts, we can learn what they may be doing.
One complication is that pseudogene transcription shows “cell-type specificity and dynamic expression,” meaning pseudogenes may be transcribed only in particular places at particular times. This is all the more reason not to treat a lack of evidence for a pseudogene’s function as evidence that it has no function! It may well be functional in a cell type or situation that we just haven’t properly investigated yet. As they put it, “The use of assays ill-suited to analysis of pseudogenes has arguably stymied elucidation of their biological roles.” But they are hopeful: “CRISPR-based approaches, carefully applied, have the potential to revolutionize our ability to dissect the functions of pseudogenes.” They conclude that it’s time to stop excluding pseudogenes from biochemical analyses and start using techniques that can identify their functions:
The use of a liberal definition of pseudogenes is attractive as it simplifies genomic analyses. This approach, often unknowingly to the researcher, leads to the consolidation of the pseudogene classification — that is, their exclusion by convenience in functional studies. Many regions now considered to be ‘dead genes’ potentially encode cis-regulatory elements, non-coding RNAs and proteins with impacts in human biology and health. Accordingly, determining the functions of putative pseudogenes warrants active pursuit by their inclusion in functional screens and analyses of genomic, transcriptomic and proteomic data. With innovations in long-read sequencing and CRISPR-based methodologies now readily accessible, the technological limitations that formerly motivated the exclusion from functional investigation are largely resolved.
Until we develop and apply these technologies to put pseudogenes to the proper test, the assumption that they are functionless junk is completely unwarranted. And it’s not hard to predict what the outcome will be. As Ayala and Balakirev noted, “pseudogenes that have been suitably investigated often exhibit functional roles.” Or as this new paper observes, “Where pseudogenes have been studied directly they are often found to have quantifiable biological roles.”
(2) Bad Terminology and False Paradigms
Technology only reflects what people want to do, and there is a reason why biologists have created hardly any technology to investigate pseudogenes: biologists presume (wrongly) that pseudogenes are nonfunctional junk. The paper argues that the terminology associated with the “junk” DNA paradigm discourages investigation into their function. Thus, we have terms like “pseudogene,” which by their very nature imply that the DNA isn’t a gene but something like a wannabe gene that doesn’t do anything. As the authors note, the definition of a “pseudogene” as “defective” means “the non-functionality of pseudogenes remains the dominant and default perception.” Citing Thomas Kuhn and his concept of a “dominant paradigm” that is intolerant of criticism, they lash the junk-“pseudogene” paradigm in strong terms:
[T]he term pseudogene itself asserts a paradigm of non-functionality through its taxonomic construction. Pseudogenes are defined as defective and not genes. This point is highlighted because impartial language in science is known to inherently restrict the neutral investigation between conflicting paradigms. In the case of pseudogenes, the term itself is constructed to support the dominant paradigm and therefore limit, consciously or unconsciously, scientific objectivity in their investigation.
It’s hard to imagine a greater indictment of the idea that pseudogenes are generally functionless. They continue to explain how use of the term “pseudogene” hinders scientific research:
Although the pseudogene concept arose to describe an individual molecular phenomenon, the term was rapidly adopted to annotate tens of thousands of genomic regions that met only loosely defined criteria and was effectively axiomatized without being subject to any rigorous scientific debate. This lack of consensus-seeking process has left genome biology with a legacy concept that obscures objective investigation of genome function.
They recommend using different language where “[t]he automated classification of gene-like sequences as pseudogenes should be avoided. Instead, we propose that descriptive terms that do not make functional inferences should be used in reference to genomic elements that arose from gene duplication and retrotransposition” and “terminology should not impose any unsubstantiated assumption on end users.”
So what is now stopping us from elucidating the functions of pseudogenes? The only obstacle is a mental block — not a technical or evidential one:
The dominant limitation in advancing the investigation of pseudogenes now lies in the trappings of the prevailing mindset that pseudogenic regions are intrinsically non-functional.
The paper predicts that once we shed this “mindset,” no technical limitations will remain to block progress in understanding the functions of pseudogenes: “With renewed scientific objectivity, we anticipate that a wealth of discoveries to understand genome function, its role in disease and the development of new treatments is within reach.”
That’s good news, but we must ask a question the paper fails to ask: Why did this terminology develop in the first place?
(1) Evolutionary Thinking — “A Relic of Evolution”?
Evolutionary thinking is the cause that ultimately created, nurtured, and sustained the junk DNA paradigm. Yet the paper adopts a wholly evolutionary approach, and for this reason never identifies evolutionary thinking as the root problem. The closest the authors get is when they recount how the very first paper to identify a pseudogene (published in 1977) dismissed its potential function as a “relic of evolution”:
In the absence of evidence that the 5S pseudogenes were transcribed, Jacq et al. concluded that the most probable explanation for the existence of the pseudogenes is that they are a relic of evolution and are functionless¹. Since the coining of the term pseudogene, its definition has broadened and is now widely accepted to define any genomic sequence that is similar to another gene and is defective.
This 1977 paper by Jacq et al. was published in the journal Cell and found a pseudogene in an African frog. That paper concluded:
We are thus forced to the conclusion that the most probable explanation for the existence of the pseudogene is that it is a relic of evolution. During the evolution of the 5S DNA of Xenopus laevis, a gene duplication occurred producing the pseudogene. Presumably the pseudogene initially functioned as a 5S gene, but then, by mutation, diverged sufficiently from the gene in its sequence so that it was no longer transcribed into an RNA product.
And there you have it: the pseudogene is seen as a mere “relic” of gene duplication that diverged “by mutation” until “it was no longer transcribed into an RNA product.” This is the classic view of a pseudogene.
Ironically, the 1977 paper went on to consider evidence that the pseudogene might be functional, but the authors privileged the “relic” view as the right answer until a function could be proven:
This evolutionary explanation for the presence of the pseudogene, however, is incomplete by itself in that it ignores the conservation in sequence of the pseudogene, and indeed of the entire G + C-rich spacer of 5S DNA. In an attempt to explain this, it has been suggested that the pseudogene may be a “transcribed spacer” corresponding to a primary transcript of 5S RNA, which is a transient precursor and has not so far been detected. If this is so, then most of the G + C-rich region of 5S DNA would be the structural gene for 5S RNA. This function, if true, would provide the necessary selective pressure to conserve the sequence of the “linker” and pseudogene region so that the correct processing of the postulated 300-long precursor was maintained. In the absence of any experimental evidence for such a long precursor, however, this suggestion must be regarded as speculative; it is more probable that the pseudogene is a relic of evolution.
The recent Nature Reviews Genetics paper hopes to remedy this problem by reviewing much of the overwhelming evidence for pseudogene function and emphasizing how “the non-functionality of pseudogenes remains the dominant and default perception.” This will “limit, consciously or unconsciously, scientific objectivity in their investigation.” The authors are to be commended. However, experience teaches that unless you address the root cause of a problem, it rarely goes away. The tendency to view pseudogenes as a “relic of evolution” probably won’t change as long as you presume that the entire genome is the product of blind evolution. The paper fully endorses the latter view, providing all kinds of narrative gloss that describes pseudogenes (whether functional or not) as “retrocopies” that “arose from gene duplication and transposition.” They emphasize:
In the fundamental reductionist approach often assumed in genetics and molecular biology, the perspective is often lost that life as we observe it today is not only the product of billions of years of evolutionary processes but also still subject to these same processes.
They are welcome to take the “reductionist approach often assumed in genetics and molecular biology.” But until those fundamental evolutionary views of the genome are on the table for questioning, they won’t make much progress in shaking the science-stopping assumptions of the junk-DNA paradigm.
Photo: Xenopus laevis, by Brian Gratwicke [CC BY 2.0], via Wikimedia Commons.