Some scientists have a new pet project: sequence everything! They’ve given this idea a name: the Earth BioGenome Project (EBP). Specifically, the goal is to sequence every eukaryotic species that has been taxonomically designated. Mark Blaxter et al. explain in their perspective article in PNAS, “Why sequence all eukaryotes?”, that a primary goal of the proposal is to understand evolution.
Life on Earth has evolved from initial simplicity to the astounding complexity we experience today. Bacteria and archaea have largely excelled in metabolic diversification, but eukaryotes additionally display abundant morphological innovation. How have these innovations come about and what constraints are there on the origins of novelty and the continuing maintenance of biodiversity on Earth? The history of life and the code for the working parts of cells and systems are written in the genome. The Earth BioGenome Project has proposed that the genomes of all extant, named eukaryotes — about 2 million species — should be sequenced to high quality to produce a digital library of life on Earth, beginning with strategic phylogenetic, ecological, and high-impact priorities.
The EBP would certainly provide job security for numerous lab workers, but will it really provide the wisdom needed to understand life and evolution? Blaxter and his 25 co-authors think it will. Coming from a Who’s Who of major scientific institutions and government labs from Sweden to China to America, they also claim it will have numerous other practical benefits. Advocates of a proposal like to toss in suggestions that their work might help cure cancer, aid farmers, or mitigate climate change, but clearly the priorities are to solve evolutionary questions. This is evident from the sixty mentions of the word evolution in the essay.
We suggest that many questions of evolutionary and ecological significance will only be addressable when whole-genome data representing divergences at all of the branchings in the tree of life or all species in natural ecosystems are available. We envisage that a genomic tree of life will foster understanding of the ongoing processes of speciation, adaptation, and organismal dependencies within entire ecosystems. These explorations will resolve long-standing problems in phylogenetics, evolution, ecology, conservation, agriculture, bioindustry, and medicine.
Is That Necessarily So?
Consider a hypothetical proposal to find every fossil on earth. Would it solve evolutionary questions? In Darwin’s Dilemma, Paul Chien argued that enough fossils have been discovered to see the global patterns. One more scallop on the beach is not likely to change the picture, let alone millions of them. Some big data projects, however, can be very instructive, such as the ENCODE project and its spin-offs that found more function in noncoding DNA than expected.
Money for such massive projects can be an issue; remember the Superconducting Super Collider? The paper mentions private funding: “This research was funded in whole, or in part, by Wellcome Trust Grants 206194 and 218328.” The Wellcome Trust is a charitable foundation in the UK with a mission “to fund research to improve human and animal health.” Whether governments will toss in some dollars is not stated, but certainly private foundations can spend their money as they choose. No problem there. The work is certainly conceivable, and with big data projects, storage of the information is not a problem. The question is whether this is the highest and best use of time and equipment by biologists and geneticists. Let the proponents make their case. Their justifications can be summarized:
1. Discovering the Trees of Life
2. Defining the Origin of Eukaryotic Cells
3. Tracking Genomic Changes in Symbiosis
4. Decrypting Chromosome Evolution
5. Revealing the Deep Logic of Eukaryotic Gene Regulation
6. Probing the Diversity of Sexual Systems
7. Exploring Diversity in the Genomics of Speciation
8. Decoding the Genomics of Complex Traits
9. Understanding Ecosystem Function, Stasis, and Change
10. Building Genomics-Informed Conservation
11. Inventing New Tools and Resources
12. Preserve for Posterity the Diversity and History of the Planet’s Biology
Four of these (1, 2, 4, 7) are primarily evolutionary questions; several others (3, 6, 8, 9) overlap with evolution. Evolutionary questions are not necessarily useless pursuits; they might have design implications if the results do not support neo-Darwinism.
Hidden Assumptions in the Proposal
There are some hidden assumptions in the proposal. One is that all genomes of a particular species are alike. That cannot be true because numerous subspecies inhabit differing environments. Will this require multiple samples from some species? Talk of “The Human Genome” glosses over the diversity of humans, requiring further investigation of haplotypes. Will that be an issue for Canis familiaris, the domestic dog that varies from mastiff to chihuahua? Additionally, will a genome from a male and a female be required to cover the sex chromosomes of each species? Another dubious assumption is that scientists know what a species is. This touches on a vexed philosophical question in taxonomy: whether the current taxonomical system carves nature at its joints.
Another issue to ponder is whether a massive sequencing project at this scale is the only way to answer all 12 of the questions. If it is not, the EBP could be a huge boondoggle, a waste of time and money that could be better spent elsewhere. Could a well-chosen selection of genomes serve the purpose just as well?
More than Big Data Is Needed
The authors welcome opinions about the project:
The big questions we have posed derive from our collective discussions, but we are aware — and indeed hope — that there will be additional major questions that others believe can be answered by sequencing and functionally annotating all eukaryotic genomes. We invite you to add questions to the roster, to widen the debate, and to, ultimately, fully realize the promise of biological understanding based on the complete genome sequence of all of Earth’s remarkable species.
Understanding requires more than big data. If the EBP project takes root, research teams will find themselves neck-deep in arbitrary decisions requiring wisdom to get meaningful results. In their concluding sales pitch, the authors make it sound like evolutionary understanding will simply leap out of the data. That rarely happens. Data need interpretation by human beings exercising wisdom and discernment.
Notice the volume of verbiage about evolution, with a few crumbs of societal benefit added to the end like rosy frosting on the Darwin cake:
The genomes will be the core data from which the phylogeny of all life is inferred, including the complex reticulations that endosymbiosis, horizontal transfer, hybridization, and introgression have created. Complete genome assemblies enable a broader and more complete understanding of a species’ biology, contributing to a lessened risk of extinction. Within the unifying model of this phylogenetic network, the genomes and the genes they possess will enable understanding of regulatory networks and trait evolution, the dynamics of coevolution between genes and between species, the impact of changing environments on species and populations, the mechanistic link between genotypes and phenotypes, and the drivers of genome–environment interactions. These analyses, in turn, will enable biologists to better characterize fundamental evolutionary processes, from the nucleotide to the genome level, identifying processes active under different chromosomal architectures and gene interaction networks. These dramatic advances in understanding of both the wide sweep and the local details of genomic and organismal evolution will enable the inference of ancestral genomes and their traits, which will be transformative for understanding how life evolved on Earth, predicting future evolution, and inspiring bioengineering of organisms with beneficial traits using technologies such as CRISPR and whole-genome synthesis. This foundational library of information will change the economic and social growth of the future, fostering sustainable agriculture and new bioeconomies, accessing an expanded medical pharmacopoeia, and promoting societal equity and diversity through the lens of a deeply valued biodiversity.
Let’s leave the value of this proposal as an open question for design advocates, who will likely have differing opinions about it. Would that some of the enthusiasm for such a massive undertaking, though, were reserved for exploring the biological engineering so evident in life.