Nature has published the latest update to the FANTOM project (Functional Annotation of the Mammalian Genome), called FANTOM5. This study differs from most genomic studies in that it examined gene promoter sites as a measure of function. News from the International School for Advanced Studies (SISSA) in Trieste explains:
Genes are the "code" for building the biological elements that form an organism. The DNA that makes up genes contains the instructions to synthesise proteins, but it’s wrong to think that, for a given gene, these instructions are always the same for all parts of the organism. In actual fact, the gene varies depending on the tissue where it is located (cerebral cortex, cerebellum, olfactory epithelium, etc.); in particular, what varies is the point in the "string" of code at which protein synthesis starts. (Emphasis added.)
What this implies is that scientists cannot look at a portion of "the human genome" from one tissue sample and conclude that they have figured out its function. A judgment of "junk DNA" could be wrong. Indeed, FANTOM5 found near-universal promoter coverage: based on the first few bases of RNA transcripts, it identified at least one active promoter for more than 95 percent of annotated protein-coding genes. FANTOM5 extends the third and fourth FANTOM atlases by including 4,721 human and 5,127 mouse genes, focusing on primary cells, cell lines, and tissues. But that’s not all: "The atlas also detected signals from the promoters of short RNA primary transcripts, and long non-coding RNAs." More:
The FANTOM5 promoter atlas is a natural extension of earlier maps of active transcripts and promoters complementing the sequencing of mammalian genomes. It represents an advance in an order of magnitude in the wide range of cell types and the amount of data produced per sample, and using single-molecule sequencing avoided polymerase chain reaction (PCR), digestion and cloning bias. We have identified and quantified the activity of at least one promoter for more than 95% of annotated protein-coding genes in the human reference genome; only the activity of 1,225 promoters remains uncharacterized. Some of these may not actually be expressed. Some cannot be unambiguously measured with CAGE due to copy number variants or closely related multigene families. The remaining promoters are probably expressed in rare cell types or during windows of development or states of cellular activation that are not readily accessible and remain to be sampled. A continued effort to add profiles from these cells will make it possible to integrate them with the FANTOM5 data, and to extract metadata to identify those regulatory elements that are new and lineage-specific.
The researchers called primary cells (as opposed to cancer cells) the "logical choice" for their study. These are healthy cells that function normally. "In all these respects, the FANTOM5 data set greatly extends the data generated by ENCODE to further our knowledge of genome function."
Complementing this evidence of widespread function around coding genes, a study conducted at UC Berkeley found more evidence of function in the non-coding regions of plant genomes. Announced on EurekAlert with the title "New Functions for ‘Junk’ DNA?" the study offers a warning to those too eager to relegate genetic sequences to junk:
DNA is the molecule that encodes the genetic instructions enabling a cell to produce the thousands of proteins it typically needs. The linear sequence of the A, T, C, and G bases in what is called coding DNA determines the particular protein that a short segment of DNA, known as a gene, will encode. But in many organisms, there is much more DNA in a cell than is needed to code for all the necessary proteins. This non-coding DNA was often referred to as "junk" DNA because it seemed unnecessary. But in retrospect, we did not yet understand the function of these seemingly unnecessary DNA sequences.
We now know that non-coding DNA can have important functions other than encoding proteins. Many non-coding sequences produce RNA molecules that regulate gene expression by turning them on and off. Others contain enhancer or inhibitory elements. Recent work by the international ENCODE (Encyclopedia of DNA Elements) Project (1, 2) suggested that a large percentage of non-coding DNA, which makes up an estimated 95% of the human genome, has a function in gene regulation. Thus, it is premature to say that "junk" DNA does not have a function — we just need to find out what it is!
That’s the spirit! Diane Burgess and Michael Freeling sampled the genomes of a wide variety of plant species (e.g., rice, banana, cacao, the model plant Arabidopsis, and other flowering plants, both monocots and dicots) and found "numerous conserved non-coding sequences" (CNSs). These give evidence of function. "DNA sequences that are highly conserved, meaning that they are identical or nearly so in a variety of organisms, are likely to have important functions in basic biological processes." Most emphatically, this is not junk. What is it?
So what could be the function of these deep CNSs? We can get clues by analyzing the types of genes with which these CNSs are associated. The researchers found that nearly all of the deep CNSs are associated with genes involved in basic and universal biological processes in flowering plants — processes such as development, response to hormones, and regulation of gene expression. They found that the majority of these CNSs are associated with genes involved in tissue and organ development, post-embryonic differentiation, flowering, and production of reproductive structures. Others are associated with hormone- and salt-responsive genes or with genes encoding transcription factors, which are regulatory proteins that control gene expression by turning other genes on and off.
In addition, they showed that these CNSs are enriched for binding sites for transcription factors, and propose that the function of some of this non-coding DNA is to act as a scaffold for organization of the gene expression machinery. The binding sites they found are known sequences implicated in other plants as necessary for response to biotic and abiotic stress, light, and hormones. Furthermore, they discovered that a number of the CNSs could produce RNAs that have extensive double-stranded regions. These double-stranded regions have been shown to be involved in RNA stability, degradation, and in regulation of gene expression. Twelve of the 59 most highly conserved CNSs are associated with genes whose protein products interact with RNA. Clearly, these DNA sequences are not merely "junk!"
Like us, the researchers do not doubt evolution in the sense of change over millions of years. That notion, however, was of little use in this research. Instead, by assuming that if it’s conserved, it must be functional, they extended our understanding of genetics. That assumption is similar to Paul Nelson’s principle, "If something works, it’s not happening by accident."
But why isn’t every stretch of a genome conserved? Variations and mutations do not falsify the design hypothesis. They reinforce it. They show that despite perturbations, living things are remarkably robust. Many mutations are repaired by proofreading machines in cells, which is a strong piece of evidence for design. Mutations that do not kill an organism can be tolerated and, over time, accumulate as neutral variations or potential disease states. They are a bit like typos in a sequence of copied manuscripts. Analysts can detect accumulated errors in families of manuscript copies, but would never think that the manuscripts were not intelligently designed.
These studies are another reminder that the "junk DNA" myth was a rush to judgment by Darwinian evolutionists who, because of their own biases, assumed that unguided biological processes are wasteful and inefficient. The opposite expectation — that designed systems are highly functional and efficient — is being vindicated again and again by research.