Michael Behe’s Darwin Devolves will be released in February, and it will empower readers to distinguish between the actual capacities of evolution, based on hard data, and the mythic portrayals of it as having unlimited creative powers. However, what should not be missed is how other advancements in science are reaffirming arguments from Behe’s earlier books. For instance, increasingly strong evidence from cellular and mathematical biology suggests that evolutionary models are inadequate to understand the origins of irreducible complexity. In particular, the argument that the bacterial flagellum can be explained by cooption has been all but discredited by advances in research on its assembly process, coupled with calculations on required timescales for the arrival of new proteins.
Cooption as Magical Incantation
One of most popular attempts at explaining the flagellum via cooption was developed by Nicholas Matzke. He summarized his scenario as follows:
(1) There is a strong possibility, previously unrecognized, of further homologies between the type III export apparatus and F1F0-ATP synthetase. (2) Much of the flagellum’s complexity evolved after crude motility was in place, via internal gene duplications and subfunctionalization. (3) Only one major system-level change of function, and four minor shifts of function, need be invoked to explain the origin of the flagellum; this involves five subsystem-level cooption events. (4) The transition between each stage is bridgeable by the evolution of a single new binding site, coupling two pre-existing subsystems, followed by coevolutionary optimization of components. Therefore, like the eye contemplated by Darwin, careful analysis shows that there are no major obstacles to gradual evolution of the flagellum.
Matzke’s narrative largely consists of identifying similarities between flagellar proteins and other prokaryotic proteins, supplemented by creative storytelling, just as with the eye. In particular, cooption is invoked not so much as a scientific explanation, including substantive details, but more as a comfortable but unsubstantiated trope. As soon as such claims are pulled into the real world, they reveal themselves to be completely implausible.
Welcome to the Real World
Appreciating the relevant evolutionary hurdles requires attention to the details of flagellar assembly. As a brief overview, a flagellum consists of three major components: the basal body which is housed in the cell membrane, the hook which extends beyond the outer cell membrane, and the filament which acts as a propeller. The basal body includes several subcomponents:
- A series of supportive rings which attach to the cell membrane.
- A motor comprised of a stator which attaches to the cell membrane and a rotor which rotates. The motor derives its power from the proton gradient that exists across the cell membrane.
- A rod which transmits power to the hook and filament.
- A transport gate which sends proteins through the central channel of the motor to assemble structures external to the cell.
Flagellar assembly includes the following steps:
- The basal body’s proteins self-assemble in methodical order to form different subcomponents, including the transport gate.
- Chaperones attach to key flagellar proteins to protect them from degradation. They also deliver them to a complex that inserts the proteins into the transport gate.
- The transport gate sends rod-cap proteins through the channel, and they form the rod assembly tool. The rod proteins are then allowed through, and the rod cap assembles them into the rod structure.
- In a similar process, the hook-cap tool forms and then assembles the hook structure.
- After hook assembly is completed, a hook-filament junction composed of two proteins self-assembles.
- A filament-cap tool forms and assembles the filament structure.
Every step in the process is meticulously timed through a complex regulatory network of flagellar genes and protein interactions. For instance, in many flagella a protein known as FliK monitors the length of the developing hook. Once the hook is sufficiently long, FliK signals the transport gate to stop sending the hook protein through and to start sending the proteins which assemble the next structures. If any major step in the entire process proceeds improperly, feedback controls signal the assembly to cease. Understanding these details allows us to properly evaluate the plausibility of evolutionary explanations of the origin of the flagellum.
Running the Numbers
Matzke’s and other scenarios rely on the assumption that most flagellar proteins originated from the duplication of a gene encoding an existing protein. That gene then underwent a series of mutations to generate a new flagellar protein. This process of duplication and modification to a new function is called cooption. The key question is how much time would have been needed for multiple cooption events to accumulate in a single bacterium all of the requisite proteins.
Some of the most relevant research related to evolutionary timescales was conducted through Harvard’s Program for Evolutionary Dynamics and IST Austria. They published a key article which lays out two crucial findings:
- The expected time required for a random search to find one member of a set of target sequences (e.g., nucleotide sequences corresponding to a functional gene) of length L increases exponentially with L.
- The expected time required to find a target from a starting sequence that is only a “few steps away from the target set” is the same as from a starting sequence that is randomly chosen.
The second conclusion can be understood from the fact that nearly all random changes to an initial trial sequence close to a target would move early trials away from the target. The search would then need to explore the entire sequence space just as with an initial random sequence. In the context of the cooption process, the time required for a copy of a preexisting protein with some sequence similarity to a flagellar protein to evolve into the latter is just as long as for the flagellar protein to evolve from a random sequence. Therefore, calculating the minimum expected waiting time for the arrival of a new protein through cooption equates with the time for an initial random sequence to find a protein target as modeled in the Harvard study.
For any evolutionary explanation of the origin of a complex adaptation, these findings prove extremely problematic. A few straightforward calculations will highlight the magnitude of the problem. Experiments on proteins in a variety of species demonstrate that 1 in 3 mutations will completely inactivate them. Conversely, if an amino acid at a specific location in a protein (e.g., the 3rd from the end) were allowed to change to any other amino acid or to remain the same, 2/3 of amino acids on average would correspond to a viable protein. This result allows for the calculation of the chance of a random sequence of length L in the region around that of an existing protein to yield a modified version of that protein which was still operational. The maximum probability can be derived from Cauchy’s mean-value theorem to yield an upper bound of 2/3 to the power of L, P(L) < (2/3)L.
An Unrealistically High Estimate
Actually, this estimate is unrealistically high since existing functional proteins are fairly robust to change. They are more resistant to mutations than proteins that are transitioning and not yet optimized for their new function. Such transitional proteins are moving through the fringes of their fitness landscapes, so they will be barely active. The “protein fitness” of a natural protein drops exponentially with the number of acquired mutations, so barely active protein sequences greatly outnumber optimized ones. This conclusion has been demonstrated empirically by numerous experiments.
In addition, finding the path forward to a new function, if more than a few changes are required, is next to impossible. Most conversions to new functions that have been demonstrated in the lab involve either a) proteins that already share overlapping function, or b) considerable investigator effort to effect the change. To coopt an existing protein to a new flagellar function requires significant changes well beyond what has been achieved in any laboratory. Such experimental evidence further demonstrates how the derived estimate for protein rarity is overly optimistic. So it only highlights the dire implications of the Harvard timescale studies, as will be demonstrated.
The timescale article derives the number of trails required for a randomly changing sequence to successfully find a “broad peaked” target where success is defined as at least half of a trial sequence matching a target center (e.g., 500 out of 1000 nucleotides in a gene matching a specific center sequence). Each amino acid in a protein corresponds to a set of three nucleotides (codon) in its gene, but the third nucleotide is typically redundant. So the length of a search sequence corresponding to the encoded amino acid sequence can be conservatively estimated as twice the number of the amino acids. This estimate could greatly underestimate rarity since experiments on Drosophila proteins determined that 95 percent of mutations to the first two nucleotides in a codon are deleterious. Note that this assumption also favors lower search times (i.e., number of trials).
Using this estimate, the chances in the article’s computational model of a random trial finding the target are greater than the probability of the corresponding amino acid sequence forming a functional protein. The former probability was estimated using the normal approximation to the binomial distribution, and the latter equals the equation derived above from mutation studies. As an example, the percentage of target sequences in the model’s search space, where L=1000, is 1 in 1074, while the upper bound to the percentage of functional proteins in the amino acid search space is (2/3)500 = 1 in 1088. For this example, the model estimates that functional proteins are a hundred trillion times more likely than estimated by the derived equation, which already greatly overestimates actual probabilities. As a consequence, the Harvard study’s numerical results significantly underestimate the actual timescales required to generate novel proteins.
Despite this bias, the article reports for L=1000 that the average number of trials required to find the target exceeds 1065. Accordingly, the chances of any organism evolving a new protein of comparable length (500 amino acids) in all of Earth’s history is much less than 1 chance in a trillion trillion. In addition, the numerical data on the exponential growth of the timescales can be interpolated to demonstrate that the chance of any novel protein much longer than 250 amino acids appearing would be miniscule.
Here Lies the Challenge
As alluded to already, the single evolutionary step of adding the flagellar filament requires the creation of the genes for the filament (FliC), the assembly cap (FliD), and the two joint-proteins (FlgK and FlgL) along with the genes’ regulatory regions. The cap is not essential in one special group of bacteria, but it is believed to have been essential in the hypothetical common ancestor to all flagella. Each of these proteins is so highly specialized for its role in filament construction that it could not possibly serve any other cellular purpose.
One can now assess for the addition of the filament a minimum required timescale. The number of amino acids associated with each protein are as follows: FliC — 498, FliD — 468, FlgK — 547, FlgL — 317. Their lengths all exceed the limit for a target that could ever be found in the entire history of the Earth, and most exceed the limit significantly. Compounding the challenge, all of the proteins are required before the filament can properly assemble, which dramatically increases the disparity between the available and the required waiting time.
Conversely, one might start by assuming that all four proteins could form within a few billion years. The fact that timescales grow exponentially with the number of coordinated proteins would then constrain the average time for the appearance of an individual protein in standard laboratory studies of microorganisms to less than a decade and in nature to less than a day. This result clearly does not match reality since no evidence exists that any new protein has recently appeared in any species.
The Harvard article is not unique in undermining virtually all possibility for evolution to generate complex adaptations. A completely different analysis by Doug Axe demonstrated that the maximum number of coordinated mutations (i.e., all are neutral until the final one appears) that could accumulate in any organism is six. Therefore, not only would the origination of a novel protein prove unfeasible, but even properly integrating a single flagellar gene’s regulatory region into the assembly process would be exceedingly problematic. As a consequence, adding just the filament through undirected evolution is mathematically implausible.
First Rule of Adaptation
The challenges for evolving the flagellar filament are actually far greater than described. The addition of any combination of the needed proteins to the genome would often result in a selective disadvantage until all four arrive with properly coordinated regulatory control regions. For instance, if the filament, joint, or assembly cap proteins were not all available precisely when needed, the filament structure would fail to assemble. Moreover, any protein not manufactured at the correct time would either fail to enter the transport gate, or it would enter at the wrong time and interfere with the assembly of other parts. In cases where the filament protein (FliC) were produced in abundance, the cell would waste significant resources. The outcome would often be for mutations to disable filament-associated genes as described by Behe’s First Rule of Adaptation. Much of the process would then have to start again from scratch.
As a consequence, all of the genes would have to appear and become fully operational very quickly. Otherwise, degradative mutations would start to take over the population within a period of months, as demonstrated by experimental data on E. coli under selective pressure. Or, if the filament existed before the hook, the hook’s developmental path would face equally daunting obstacles. And, the same holds true for most of the other evolutionary steps.
All evidence points to the conclusion that the formation of the flagellum requires vast quantities of new genetic information (e.g., new functional genes) to appear in a geological instant while the time required for any undirected process to generate the needed amount would exceed the age of the universe by hundreds of orders of magnitude. Similar insurmountable hurdles face the vast majority of attempts to invoke cooption to explain other irreducibly complex innovations. As biology research advances, the challenges faced by the cooption explanation appear increasingly intractable.
Editor’s note: This post has been updated.
Image: Bacterial flagellar motor, from Unlocking the Mystery of Life, Illustra Media.