COVID-19 Meets Intelligent Design

The fundamental scientific question regarding the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), responsible for the current global COVID-19 pandemic, is this: where did it come from? This question is no mere academic exercise. Its answer can illuminate not only how better to combat the virus, but also how to protect society from future bio-threats. Therefore, this question has received substantial attention.

From the beginning, this work has included a heavy dose of design detection. The seminal study, performed by an international collaboration of five scientists, appeared early in 2020 in the journal Nature Medicine. It concluded that the virus was not designed. It would be difficult to find a more prominent and important application of the design inference. It would also be difficult to find a more flawed and catastrophic one. In early April 2020, I wrote the following review of the study.

Using a Bayesian approach, the research team concluded that SARS-CoV-2 originated naturalistically rather than via laboratory manipulation. Bayes’ theorem, named after the 18th-century Reverend Thomas Bayes, computes the probability of an event given some new evidence. While students will recall using Bayes’ theorem to solve problems involving urns and colored balls, it in fact has a wide range of practical, real-world applications.

Quite a Bit Trickier

Bayes’ theorem has also been used to evaluate hypotheses and, in particular, theories of origins. But in origins problems, the use of Bayes’ theorem is quite a bit trickier. For example, one needs to know the prior probability of the hypothesis — that is, the probability of the hypothesis before the new evidence was obtained. One also needs to know the probability of the new evidence. And finally, one needs to know the conditional probability of the new evidence, given that the hypothesis is true.

Given these three quantities, Bayes’ theorem can then be used to compute the conditional probability of the hypothesis, given the new evidence. This may seem straightforward, and the calculation certainly is, but the required probabilities are anything but. For example, imagine using Bayes’ theorem to compute the probability that the theory of evolution is true. Where would you find the prior probability of evolution? Or again, say your evidence is some new fossil finding. Where would you find the probability of that fossil, let alone the conditional probability of the fossil, given evolution?
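The three inputs and the result can be written compactly. The symbols H (the hypothesis) and E (the new evidence) are labels chosen here for illustration; they do not appear in the study under review.

```latex
% Bayes' theorem. H = hypothesis, E = new evidence (illustrative labels).
%   P(H)        prior probability of the hypothesis
%   P(E)        probability of the new evidence
%   P(E \mid H) conditional probability of the evidence, given the hypothesis
\[
  P(H \mid E) = \frac{P(E \mid H)\, P(H)}{P(E)}
\]
```

The arithmetic is one multiplication and one division. The difficulty lies entirely in justifying the three quantities on the right-hand side.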

Bayesian approaches are incredibly useful in practical problems in science, engineering, operations, manufacturing, and so forth. But in origins studies, this practical and straightforward method is suddenly more difficult and fraught with danger. One can easily produce unjustified or otherwise erroneous results.

Remedying the Problem

Various strategies attempt to remedy this problem of using Bayes’ theorem in theory evaluation and origins studies. For example, rather than compute the probability that a single theory is true, one strategy compares two opposing theories against each other. In this approach, fewer probabilities are required at the start.
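A sketch of why the comparison needs fewer inputs, writing H_1 and H_2 for the two theories and E for the evidence (labels assumed here for illustration): dividing Bayes’ theorem for one theory by the same expression for the other cancels the probability of the evidence.

```latex
% Ratio (odds) form of Bayes' theorem for two theories H_1 and H_2.
% P(E) appears in both numerator and denominator and cancels.
\[
  \frac{P(H_1 \mid E)}{P(H_2 \mid E)}
  = \frac{P(E \mid H_1)}{P(E \mid H_2)} \cdot \frac{P(H_1)}{P(H_2)}
\]
```

Only the ratio of the two priors and the ratio of the two conditional probabilities remain.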

The cost of this simplification is the assumption that the two opposing theories are complementary. That is, one must be true, and the other must be false. There can be no other possibilities. But usually it is impossible to know this. For example, imagine using this Bayesian approach to compare the theories of evolution and design. These two opposing ideas must be the only possibilities, as no alternatives or intermediates are allowed.
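In symbols, with H_1 and H_2 standing for the two theories and E for the evidence (illustrative labels, not taken from any study discussed here), complementarity means the two theories are mutually exclusive and their priors sum to one. Only under that assumption does the comparison yield an actual probability for either theory:

```latex
% Assumes H_1 and H_2 are mutually exclusive and jointly exhaustive.
\[
  P(H_1) + P(H_2) = 1
  \quad\Longrightarrow\quad
  P(H_1 \mid E)
  = \frac{P(E \mid H_1)\, P(H_1)}
         {P(E \mid H_1)\, P(H_1) + P(E \mid H_2)\, P(H_2)}
\]
```

If a third possibility carries any prior weight, its term is missing from the denominator and the formula no longer holds.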

But how can we know these are the only possibilities? Hybrid theories, involving some combination of evolution and design, could be physically possible. Or some other theory that we haven’t even thought of yet could be possible. In fact, an unfortunate but very real problem is the use of Bayesian approaches to rig the calculation in one’s favor. The probabilities can be arbitrarily adjusted, and a strawman can be used for the opposing theory.

Thus, Bayesian approaches are fraught with various pitfalls, including unjustified prior probabilities, unjustified opposing theories, and unjustified assumptions about complementarity. With these issues in mind, I am always interested to see how scientists use Bayesian approaches for theory evaluation. Will they avoid the pitfalls, or fall prey to them?

Implicitly Bayesian

This brings us to the SARS-CoV-2 study, whose lead author was Kristian G. Andersen. The paper is not very long, and the section on their investigation of the virus origins is even shorter. Furthermore, their theory evaluation method (i.e., their method for determining the probability of the theory that the virus originated naturalistically) was not straightforward because nowhere did they provide an overview of their method. In other words, nowhere does the paper mention Bayes’ theorem or that they use a Bayesian approach. It is implicit.

Well, perhaps that can be forgiven. After all, Bayesian approaches are rather obvious once one sees the equations in use. But here we find another concern: there are no such equations in the paper. Again, it is implicit.

Well, again, perhaps it was an innocent oversight, or a cut required by space limitations. After all, the prior probabilities themselves contain the important information, and given them one can probably reconstruct the approach and equations.

But here again, we find yet another concern. Not only is Bayes nowhere mentioned, the particular Bayesian approach nowhere described, and no equations given, but in fact no probabilities are given anywhere.

In fact, the rationale for their rather important conclusion is remarkably terse. The paper uses two pieces of evidence to argue against the theory that the virus arose via laboratory manipulation. And their rationale amounts only to a few sparse sentences, which I quote here. First, we have:

While the analyses above suggest that SARS-CoV-2 may bind human ACE2 with high affinity, computational analyses predict that the interaction is not ideal and that the RBD [receptor-binding domain] sequence is different from those shown in SARS-CoV to be optimal for receptor binding.

The argument boils down to this: the receptor-binding domain (RBD) in the SARS-CoV-2 spike protein binds with high affinity to the human ACE2 receptor, and to the ACE2 receptor of other species with high ACE2 similarity (the paper erroneously refers to such high similarity as “high homology”), yet computational analyses fail to predict this and instead predict “that the interaction is not ideal.” The authors therefore reason that the laboratory manipulation hypothesis is less likely, because a designer would not have chosen the observed SARS-CoV-2 RBD sequence. Instead, a designer would have selected a sequence with stronger predicted binding.

How a Designer Would Act

As you can see, this reasoning is fraught with unjustified assumptions about how a designer would have acted. Not only does it assume a designer would attempt to maximize the binding strength, but it also assumes the designer would do so by calculation. As Nicholas Wade, one-time New York Times science writer, explains:

But this ignores the way that virologists do in fact get spike proteins to bind to chosen targets, which is not by calculation but by splicing in spike protein genes from other viruses or by serial passage. With serial passage, each time the virus’s progeny are transferred to new cell cultures or animals, the more successful are selected until one emerges that makes a really tight bind to human cells. Natural selection has done all the heavy lifting. The Andersen paper’s speculation about designing a viral spike protein through calculation has no bearing on whether or not the virus was manipulated by one of the other two methods.

The second argument is equally weak:

The second notable feature of SARS-CoV-2 is a polybasic cleavage site (RRAR) at the junction of S1 and S2, the two subunits of the spike. … Polybasic cleavage sites have not been observed in related “lineage B” betacoronaviruses, although other human betacoronaviruses, including HKU1 (lineage A), have those sites and predicted O-linked glycans. … if genetic manipulation had been performed, one of the several reverse-genetic systems available for betacoronaviruses would probably have been used. However, the genetic data irrefutably show that SARS-CoV-2 is not derived from any previously used virus backbone.

“[W]ould probably have been used”? We are supposed to believe that this lax argument disproves the laboratory design hypothesis? There are powerful ways to use the design hypothesis. This is not one of them.

The claim that “the genetic data irrefutably show that SARS-CoV-2 is not derived from any previously used virus backbone” means very little. The importance of this evidence is contingent on the assumption that a designer would have used such a “previously used virus backbone.” The authors can only assert that this is probable, but this is silly. Again, as Wade explains:

Only a certain number of these DNA backbones have been described in the scientific literature. Anyone manipulating the SARS2 virus “would probably” have used one of these known backbones, the Andersen group writes, and since SARS2 is not derived from any of them, therefore it was not manipulated. But the argument is conspicuously inconclusive. DNA backbones are quite easy to make, so it’s obviously possible that SARS2 was manipulated using an unpublished DNA backbone.

Neither of the two pieces of evidence is particularly compelling. And this is reflected in the tentative language used, such as “most likely,” “It is improbable,” and “would probably.”

Tentative Language Turns to New-Found Confidence

But when we reach the Conclusions section, this tentative language gives way to a new-found confidence: “the evidence shows that SARS-CoV-2 is not a purposefully manipulated virus,” and “we do not believe that any type of laboratory-based scenario is plausible.”

Finally, all shadow of doubt is removed in the Abstract: “Our analyses clearly show that SARS-CoV-2 is not a laboratory construct or a purposefully manipulated virus.” This claim is simply not substantiated by what they argue in the paper.

This wrapping of two weak, subjective arguments in ersatz certainty and authority paves the way for a triumphant press release in which Andersen announces, without justification, that “we can firmly determine that SARS-CoV-2 originated through natural processes.”

To summarize, the authors use a terrible design hypothesis, and make ever-escalating claims of certainty from two weak observations. The paper makes unscientific claims and should not have passed peer review.