More Backstory on Our First-Couple Paper: Why Wasn’t This Done Before?

A few days ago my co-author Ola Hössjer and I published a paper in the journal BIO-Complexity, announcing that it was possible for all humans to have come from two first parents.

First, a note. I speak for myself here. Any errors are my own.

Our Reason Was Plain

The reason we wrote the paper, “A Single-Couple Human Origin Is Possible,” was plain. For the last forty years, population geneticists have repeatedly said that our population was never smaller than several thousand. They said that over the course of six million years, the population had an average effective size of ten thousand, with some estimates running as high as one hundred thousand. Some have claimed that we carry too much genetic variation to have been produced by mutation, and maintained or increased by drift, in just six million years. These numbers were based on a variety of methods, some of them quite sophisticated.

The one thing they never did was test directly whether it was actually possible to start from two. They never ran a forward model starting from two to determine whether it could reproduce current genetic diversity. There are two reasons, I think, why they failed to do so. The first may have been methodological. Running forward models is very memory-intensive and requires some clever modeling and programming and/or access to powerful computers. With the development of cloud computing, however, access to computational resources was no longer a problem, and the clever modeling was a matter of will. That brings me to the second reason: they did not test their own thinking because they did not believe it was necessary to do so. They believed that starting from two was useless.
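
To give a feel for what running a forward model starting from two involves, here is a toy sketch in Python. It is not our published model. It assumes a simple Wright-Fisher population with infinite-sites mutation, no recombination, no selection, and no population structure; the function names simulate_from_two and poisson and all parameters (including founder_variants, a crude, hypothetical stand-in for primordial diversity) are illustrative inventions. Notice that every haplotype carries its full set of variants in memory, which hints at why realistic, genome-scale forward models are so memory-hungry.

```python
import math
import random


def poisson(rng, lam):
    """Draw a Poisson(lam) count with Knuth's method (fine for small lam)."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def simulate_from_two(generations, growth, max_size, mu, founder_variants=0, seed=1):
    """Toy forward Wright-Fisher run starting from two diploid founders.

    Illustrative assumptions only: infinite-sites mutation with Poisson(mu)
    new variants per transmitted haplotype per generation; no recombination,
    selection, or structure; size grows by (1 + growth) per generation up to
    max_size. founder_variants (hypothetical) gives each of the four founder
    haplotypes its own distinct "primordial" variants.
    Returns (final population size, number of segregating sites).
    """
    rng = random.Random(seed)
    next_id = 0
    founder_haps = []
    for _ in range(4):  # two diploid founders carry four haplotypes
        founder_haps.append(frozenset(range(next_id, next_id + founder_variants)))
        next_id += founder_variants
    pop = [(founder_haps[0], founder_haps[1]), (founder_haps[2], founder_haps[3])]

    for _ in range(generations):
        size = min(max_size, math.ceil(len(pop) * (1 + growth)))
        offspring = []
        for _ in range(size):
            child = []
            for _ in range(2):
                # Each haplotype comes from an independently chosen parent
                # (selfing allowed, as in the basic Wright-Fisher model).
                parent = rng.choice(pop)
                hap = parent[rng.randrange(2)]
                k = poisson(rng, mu)
                if k:  # add k brand-new variants (infinite sites)
                    hap = hap | frozenset(range(next_id, next_id + k))
                    next_id += k
                child.append(hap)
            offspring.append((child[0], child[1]))
        pop = offspring

    # A site is segregating if it is present in some but not all haplotypes.
    haplotypes = [h for individual in pop for h in individual]
    all_variants = set().union(*haplotypes)
    fixed = frozenset.intersection(*haplotypes)
    return len(pop), len(all_variants - fixed)


size, segregating = simulate_from_two(generations=50, growth=0.2, max_size=200, mu=1.0)
print(size, segregating)
```

With realistic population sizes, genome lengths, and hundreds of thousands of generations, this same bookkeeping grows enormously, which is where the clever modeling and the computing power come in.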

Steve Schaffner of the Broad Institute, on the other hand, did not doubt that the forward model would succeed. He has an instinctive sense of the numbers and predicted that more than 500,000 years would be enough for his model to match the allele frequency spectrum of the 1000 Genomes Project. For him, the sticking point was whether a sudden bottleneck scenario was plausible at all, which is why he had never bothered to run the model until the Venema/Buggs debate provoked him to.

What Venema Missed

Why did most think a bottleneck of two was impossible? I can’t say for sure, but at least in Dennis Venema’s case it seemed to be his confidence in the many reported effective population sizes of 10,000. What he missed was that averaging over millions of years would mask a sudden, sharp bottleneck down to two individuals if recovery were rapid.
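
One standard way to average effective population size over a fluctuating history is the harmonic mean of the per-generation sizes. The sketch below uses it to show how a sharp bottleneck with fast recovery barely dents a long-term average. The function names and every number are illustrative assumptions, not results from our paper: 240,000 generations (six million years at an assumed 25 years per generation), a baseline of 10,000, and a founding pair that either doubles each generation or grows by only 1 percent per generation.

```python
def harmonic_mean_ne(sizes):
    """Long-term effective size approximated by the harmonic mean of per-generation sizes."""
    return len(sizes) / sum(1.0 / n for n in sizes)


def bottleneck_history(total_generations, baseline=10_000, founders=2, growth=2.0):
    """Hypothetical size history: start at `founders`, multiply by `growth`
    each generation until reaching `baseline`, then stay at `baseline`."""
    sizes = []
    n = founders
    while len(sizes) < total_generations:
        sizes.append(n)
        n = min(baseline, n * growth)
    return sizes


T = 240_000  # assumed: six million years at 25 years per generation
constant = harmonic_mean_ne([10_000] * T)
fast = harmonic_mean_ne(bottleneck_history(T, growth=2.0))   # rapid recovery
slow = harmonic_mean_ne(bottleneck_history(T, growth=1.01))  # slow recovery
print(round(constant), round(fast), round(slow))
```

Under these assumptions the rapid-recovery history still averages out to about 9,600, close to the constant 10,000, while slow recovery pulls the average down to roughly 3,200. The harmonic mean is sensitive to small sizes, but only in proportion to how many generations the population actually spends at them.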

In any case, we started our modeling work in 2014, and our model description was published in 2016. In a post earlier this week (“A First Couple? Here’s the Backstory”), I described what happened in 2017 that broke open the first-couple debate. Both Joshua Swamidass and Richard Buggs were aware of and interested in our ongoing modeling work at that time. I also followed with interest what they and Venema were doing.

Why Not ARGweaver?

Now over at Peaceful Science they are discussing this paper, and I am pleased to see it. I can even answer a few questions right away, such as why we didn’t use a more powerful tool like ARGweaver.

The answer is simple: built-in evolutionary assumptions. We tried to keep our math as free as possible from assumptions about evolutionary history. That is also why we didn’t use the derived allele frequency spectrum, which would have required comparing chimp and human sequences to determine which human allele was likely ancestral and which was the mutant (using the chimp as the outgroup). If there is no evolutionary relationship, what is the point of using chimps as an outgroup? In any case, it is acceptable to use a folded allele frequency spectrum when no appropriate outgroup exists. Whether one exists is actually one of the issues under consideration.

There are multiple directions we could go from here. We may use something like ARGweaver in future papers, but we would want to compare its results with what we have now. We could also test the effects of population structure, migration, and subdivision on effective population size and on the time required to match current genetic diversity.

For those who wonder why we did not include selection, it is because of the structure of the model itself, which does not allow for it at present. We plan to include selection, but that will require “rewiring” the program. We are aware that selection is an important feature that must be examined.

As for the point about coalescent models being inaccurate at small population sizes: note that they are inaccurate compared with forward models, which are our main model. Note also that all of these models are being compared with effective population sizes, which are notoriously difficult to estimate.

For Joshua Swamidass: Actually, we did mention HLA, which is home to many examples of putative trans-species polymorphisms. This is an area I would like to take up, because I think it is the way forward and may provide some answers about our origin that other regions of the genome can’t. More later.

As for ghost lineages, as far as I can tell they would shift things further into the past. No problem. Remember, I’m OK with 2 million years!

Defining Parsimony

Finally, a word about parsimony and straining at gnats. Of all the criticisms I imagined the paper might draw, this is one I had not anticipated. “Parsimonious” was intended to mean having the fewest possible special conditions. That’s all: a standard mutation rate, growth rate, generation time, and final population size. The only heterodox elements in the model are the introduction of primordial diversity into the single-couple origin (SCO) model and starting from two.

Nowhere do we say that parsimony makes SCO right. We just ask the reader to consider that models with fewer assumptions are more likely to be closer to the actual situation. If we had loaded the model with lots of special conditions, we could have brought the time down considerably. For example, increasing the mutation rate from its current value of about 100 bp per diploid generation to four times that, or 400 bp per diploid generation, would bring the time to the first couple down to 6,000 years ago. But what would it take to increase the mutation rate that much? Mortality would be so high we might go extinct!

Just Possible

So why treat “parsimonious” as a threatening word? Maybe I should have said “simple” instead. I think the critics all missed the point of the title. Our point was not that the single-couple model was true or to be preferred, but merely that it was possible. Just possible. 500,000 years ago or 2 million years ago, with admixture or without, still possible. A unique origin or a bottleneck, take your pick; the model doesn’t distinguish between them. All we set out to show was that it was possible to account for current diversity starting from two. Josh Swamidass, by the way, has kindly reminded me that he has already done this. Now we have done it a second way.

Image: Adam and Eve, by Tintoretto [Public domain], via Wikimedia Commons.