Population Genetics: What It Is and Why It Matters
After my recent post on Evolution News, I was asked to give a little more context and explain what population genetics is. It is a body of theory that allows us to transform knowledge about the genetics of individuals, families, and lineages (such as Mendelian inheritance of alleles and linkage due to chromosome recombination, plus mutation and natural selection) into statistical knowledge about the genetics of populations, such as nucleotide diversity, haplotype block structure, allele frequency distributions, and correlation or linkage distributions.
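To make one of those population-level statistics concrete, here is a minimal Python sketch of nucleotide diversity (π, the average number of pairwise differences per site) and of the allele frequencies at a single site. It assumes a small, already-aligned, gap-free set of equal-length sequences; the function names and the four toy sequences are made up for illustration.

```python
from itertools import combinations


def nucleotide_diversity(seqs: list[str]) -> float:
    """Average number of differences per site over all pairs of sequences (pi).

    Assumes the sequences are already aligned, equal in length, and gap-free.
    """
    if len(seqs) < 2:
        raise ValueError("need at least two sequences")
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")
    pairs = list(combinations(seqs, 2))
    differences = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs)
    return differences / (len(pairs) * length)


def allele_frequencies(seqs: list[str], site: int) -> dict[str, float]:
    """Relative frequency of each nucleotide at one aligned site."""
    counts: dict[str, int] = {}
    for s in seqs:
        counts[s[site]] = counts.get(s[site], 0) + 1
    return {base: n / len(seqs) for base, n in counts.items()}


# Hypothetical toy alignment: four sequences of eight sites each.
sample = ["ACGTACGT", "ACGTACGA", "ACGAACGT", "ACGTACGT"]
print(nucleotide_diversity(sample))  # 6 differences / (6 pairs * 8 sites) = 0.125
print(allele_frequencies(sample, 7))  # {'T': 0.75, 'A': 0.25}
```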
In general, population genetics is not interested in what the genes mean or do, only in how they are distributed in a population. As such, it is a theory about micro-evolution in action. It is not a theory about macro-evolution, for example how a new design or body plan could arise from random mutations. It can only describe things like how long it would take for suitable mutations to occur (spoiler: often a very long time), how many of them would go extinct (spoiler: the vast majority of them), and how long it would take for suitable ones to be established (spoiler: an impossibly long time, even for some seemingly trivial cases).
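The question of how many new mutations go extinct has a standard analytic answer. As a sketch, the function below uses Kimura's diffusion approximation for the fixation probability of a single new mutation, assuming a diploid Wright-Fisher population whose effective and census sizes are equal (N), an additive selection coefficient s, and a starting frequency of 1/(2N). The function name and the example numbers are illustrative.

```python
import math


def fixation_probability(n: int, s: float) -> float:
    """Kimura's diffusion approximation for the fixation probability of one new mutation.

    Assumes a diploid Wright-Fisher population of N individuals (Ne = N),
    additive selection coefficient s, and a starting frequency p = 1/(2N):
        u = (1 - exp(-4 N s p)) / (1 - exp(-4 N s))
    Very strongly deleterious cases (s < 0 with 4N|s| above about 700) overflow a float.
    """
    p = 1 / (2 * n)
    if s == 0:
        return p  # neutral limit: fixation probability equals the starting frequency
    # expm1(x) = exp(x) - 1 keeps precision when 4Ns is small; the minus signs cancel.
    return math.expm1(-4 * n * s * p) / math.expm1(-4 * n * s)


for s in (0.0, 0.001, 0.01):
    u = fixation_probability(10_000, s)
    print(f"s = {s}: fixed with probability {u:.5f}, lost with probability {1 - u:.5f}")
```

Under these assumptions, a mutation with a 1% advantage in a population of 10,000 is fixed only about 2% of the time (roughly 2s), and a neutral one only 1/(2N) of the time. The formula says nothing about how long fixation takes.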
Thus it does not relate directly to intelligent design. But it has become important for ID theorists to think about it. Most ID theorists believe that Darwinian evolution would not work no matter how much time you give it. This is because of principles like No Free Lunch and Conservation of Information, and some have additional reasons. However, since not everyone is yet convinced about that, some of us feel it is instructive to demonstrate in more detail why macro-evolution would not work even on timescales of millions of years. We believe population genetics is one of those areas of evolutionary science that can be really rigorous and valid, and we believe it actually strengthens our case against the Darwinian mechanism as a creative force; it shows that macro-evolution, even if it worked, would take much longer than people suppose.
On the other hand, population genetics has also been used to construct arguments against various intelligent design hypotheses. For example, if you believe in a designer, it is quite natural to believe that humans were specially created, though not all of us do. But some have charged that the distribution of genetic information in modern human genomes is incompatible with that hypothesis. If they are right, this would not prove ID wrong. But it would be unexpected, so we wanted to look into it some more.
More generally, many evolutionists believe that, despite our evidence against the supposed power of natural selection, we must be wrong, since there is evidence of shared micro-evolutionary patterns between wildly different species such as humans, chimps, and gorillas. These include presumed ERVs (endogenous retroviruses, believed to be genetic scarring from viral infections), shared SNPs (single nucleotide polymorphisms, assumed to be due to random mutation and incomplete lineage sorting), and synteny (shared chromosome structure). Again, none of this would really prove ID wrong; there are various ID theories that are compatible with common descent (pre-programmed evolution, or a “tinkering” designer who adds genes continually to an evolving population). But it does seem a little incongruent to most people. Moreover, this seems to be the main reason why people ignore or disbelieve the sound arguments that ID theorists make: they think that evidence of micro-evolution between species is evidence of macro-evolution, and therefore evidence that natural selection is potent. If you think carefully, it really isn’t, but they think it is, so let’s see whether the mainstream has messed up there too. Population genetics is important for understanding the distribution of some of these patterns.
Here are some important concepts to know, in no particular order:
Evolution is mostly random. Most people think of evolution as a process driven by natural selection, which sifts the mutations that happen continuously and selects the most beneficial ones, gradually improving the quality of the organism. However, biologists have come to realize that most mutations are nearly neutral, and that the transmission of genetic information is statistically random: the expected frequency of an allele in the next generation equals its current frequency, but the variance is non-zero, so the frequency fluctuates up and down over many generations. This is called genetic drift.
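The claim that the expected frequency stays the same while the variance does not can be checked with a toy Wright-Fisher simulation. This minimal sketch (the function names, population size, and seed are arbitrary choices) resamples 2N gene copies binomially from the current frequency p and compares the simulated mean and variance with p and p(1 - p)/(2N).

```python
import random
import statistics


def next_count(count: int, two_n: int, rng: random.Random) -> int:
    """One Wright-Fisher generation: each of the 2N new gene copies copies a
    randomly chosen parental copy, so the new count is Binomial(2N, p)."""
    p = count / two_n
    return sum(rng.random() < p for _ in range(two_n))


def one_generation_stats(p: float, n: int, reps: int, seed: int = 1) -> tuple[float, float]:
    """Mean and variance of allele frequency after one generation of pure drift."""
    rng = random.Random(seed)
    two_n = 2 * n
    start = round(p * two_n)
    freqs = [next_count(start, two_n, rng) / two_n for _ in range(reps)]
    return statistics.fmean(freqs), statistics.variance(freqs)


p, n = 0.3, 50
mean, var = one_generation_stats(p, n, reps=20_000)
print(f"mean frequency {mean:.4f}   (theory {p})")
print(f"variance       {var:.6f} (theory {p * (1 - p) / (2 * n):.6f})")
```

Repeating `next_count` generation after generation, instead of once, produces the random up-and-down wandering of frequency that the paragraph calls genetic drift.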
One important consequence is that, because each mutation starts out as a single copy, most of them fluctuate into oblivion pretty quickly. It also means that you can tell something about the age of a mutation: if it is very prevalent in the population now, you know it didn’t appear yesterday (of course, that assumes it is a mutation, and not a primordial designed polymorphism). Another important consequence is that mutations can be “fixed” (reach 100% frequency) without any natural selection at all. This happens faster in smaller populations (because the fluctuations are larger relative to the size of the population) and more slowly in large populations, but in the end the effect is the same: most mutations that become fixed in a population are neutral, random changes.
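The fates of new neutral mutations can be simulated the same way. In this sketch (a constant-size diploid Wright-Fisher population with no selection; the population size, trial count, and seed are illustrative), each trial starts a mutation as one copy among 2N and runs until it is lost or fixed. Standard neutral theory predicts a fixation fraction of 1/(2N) and, for the lucky few, a mean time to fixation of roughly 4N generations.

```python
import random
import statistics


def fate_of_new_mutation(n: int, rng: random.Random) -> tuple[bool, int]:
    """Follow one new neutral mutation (1 copy among 2N) until it is lost or fixed.

    Assumes a constant-size diploid Wright-Fisher population with no selection
    and no recurrent mutation. Returns (was_fixed, generations_elapsed).
    """
    two_n = 2 * n
    count, generations = 1, 0
    while 0 < count < two_n:
        p = count / two_n
        count = sum(rng.random() < p for _ in range(two_n))  # binomial resampling
        generations += 1
    return count == two_n, generations


def summarize(n: int, trials: int, seed: int = 7) -> dict[str, float]:
    """Run many independent new mutations and summarize how they end."""
    rng = random.Random(seed)
    fixation_times, loss_times = [], []
    for _ in range(trials):
        fixed, t = fate_of_new_mutation(n, rng)
        (fixation_times if fixed else loss_times).append(t)
    return {
        "fraction fixed": len(fixation_times) / trials,
        "theory 1/(2N)": 1 / (2 * n),
        "median generations to loss": statistics.median(loss_times),
        "mean generations to fixation": statistics.fmean(fixation_times),
        "theory ~4N": 4 * n,
    }


for key, value in summarize(n=20, trials=10_000).items():
    print(f"{key}: {value:.3f}")
```

Rerunning with a larger `n` shows fixation becoming both rarer and slower, which is the population-size dependence described above.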
Genes like to travel with friends. This is because genes are arranged on linear chromosomes, a little bit like a train. When a chromosome is copied, genes tend to be copied together, so the statistical properties of nearby genes will be correlated. Recombination is a process that separates genes. Imagine if two trains sitting side-by-side could swap bunches of train-cars. Passengers in nearby train-cars would be more likely to keep traveling on the same train than those further apart. This is linkage. The train-cars themselves would be haplotype blocks, and the passengers in the same car would be genes that are completely linked. In theory, the statistics predicted by this process can also tell us something about how long the trains have been running (that is, how long the chromosomes have been evolving): low correlations between nearby haplotype blocks can be readily explained if the population is old and large, but are hard to explain otherwise.
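Under the simplest textbook assumptions (two loci with two alleles each, random mating, and a population large enough to ignore drift), the linkage disequilibrium D between two loci shrinks by a factor of (1 - r) per generation, where r is the recombination fraction between them. The sketch below iterates that recursion on haplotype frequencies, starting from perfectly linked chromosomes; loci inside one haplotype block correspond to a very small r. The function names are illustrative.

```python
def linkage_disequilibrium(h: dict[str, float]) -> float:
    """D = x_AB * x_ab - x_Ab * x_aB for two biallelic loci."""
    return h["AB"] * h["ab"] - h["Ab"] * h["aB"]


def recombine(h: dict[str, float], r: float) -> dict[str, float]:
    """One generation of random mating in an infinite population.

    Recombination moves r * D of frequency out of the coupling haplotypes
    (AB, ab) and into the repulsion haplotypes (Ab, aB), so D becomes (1 - r) * D.
    """
    d = linkage_disequilibrium(h)
    return {
        "AB": h["AB"] - r * d,
        "ab": h["ab"] - r * d,
        "Ab": h["Ab"] + r * d,
        "aB": h["aB"] + r * d,
    }


def d_after(generations: int, r: float) -> float:
    # Start fully linked: only AB and ab chromosomes exist, so D = 0.25.
    h = {"AB": 0.5, "ab": 0.5, "Ab": 0.0, "aB": 0.0}
    for _ in range(generations):
        h = recombine(h, r)
    return linkage_disequilibrium(h)


for r in (0.001, 0.01, 0.5):
    print(f"r = {r}: D after 100 generations = {d_after(100, r):.4f} "
          f"(closed form {0.25 * (1 - r) ** 100:.4f})")
```

Tightly linked loci (r = 0.001) keep most of their initial correlation after 100 generations, while unlinked loci (r = 0.5) lose essentially all of it, which is why correlations between nearby blocks carry information about how many generations of recombination have occurred.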
Effective Population Size
In an ideal population genetics model, every individual has an equal chance of producing progeny, and in sexual populations, an equal chance of mating with every other individual of the other sex. These assumptions make the math easier to solve. However, real populations are not so simple. For example, if locusts find a nice grain field, or bacteria find a nutritious spot of grime, the population will expand exponentially. But as soon as the food runs out, the population will crash. Only a relatively few individuals will ultimately survive the crash. As far as population genetics is concerned, the vast majority of those individuals never existed; they leave no descendants and thus no trace in the genetic record. Other issues are geographical distribution or isolation: not everyone could physically get close to every other to mate. Another issue is monogamous relationships (children of the same mother have the same father), or even harem-type breeding (many children of different mothers have the same father). Fortunately, these complications can all be fudged into a parameter called the effective population size. In most cases it is smaller than the real population size, but not always, for example if the real population size changes radically. That turns out to be important.
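Two of the classic approximations for folding these complications into one number fit in a few lines. This sketch assumes the standard textbook formulas: for a census size that changes from generation to generation, Ne is roughly the harmonic mean of the per-generation sizes, and for unequal numbers of breeding males and females (as with harem-type breeding), Ne = 4·Nm·Nf / (Nm + Nf). The function names and example numbers are hypothetical.

```python
from statistics import harmonic_mean


def ne_fluctuating(sizes: list[float]) -> float:
    """Approximate long-term Ne when the census size changes every generation:
    the harmonic mean of the per-generation sizes."""
    return harmonic_mean(sizes)


def ne_sex_ratio(males: int, females: int) -> float:
    """Approximate Ne when breeding males and females differ in number."""
    return 4 * males * females / (males + females)


# Hypothetical boom-and-bust cycle: one crash to 10 individuals, then three booms.
boom_and_bust = [10, 1_000_000, 1_000_000, 1_000_000]
print(ne_fluctuating(boom_and_bust))
# Hypothetical harem-type breeding: 10 breeding males, 990 breeding females.
print(ne_sex_ratio(males=10, females=990))
```

The boom-and-bust example gives an Ne of about 40 even though three of the four generations had a million individuals, because the harmonic mean is dominated by the smallest values, much like the crash that erases most of the locusts from the genetic record.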
Coalescence
This is maybe the coolest of all if you like math and models. It is easiest to explain with bacteria because they are (mostly) asexual, so a parent can have multiple children, but a child cannot have multiple parents. This means that going forward in time, the lineage branches: one becomes many. However, in practice we usually have data from the present and want to figure out what might have happened in the past. This means we have to reason backwards in time, and if you reverse branching in time, you get coalescence: many become one. One important consequence of coalescence is that the further back in time you go, the more radically the number of lineages ancestral to the present population falls. At some point the number of lineages falls to one, and at that point it is impossible to tell whether you had a population of one or a population of millions. This happens with humans too, but you have to look at individual genes or haplotype blocks, as each of those can have a very different ancestry (because, viewed backwards in time, recombination separates them). This allows for some really nice, efficient simulation methods, such as this one published by the journal BIO-Complexity.
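The backwards-in-time picture is easy to simulate under the standard Kingman coalescent. This minimal sketch assumes one non-recombining locus (a single gene or haplotype block) in a constant-size diploid population of N individuals; while k lineages remain, the waiting time to the next merger is exponential with rate k(k - 1)/2 per 2N generations. The sample size, N, seed, and function name are illustrative.

```python
import random
import statistics


def tmrca(sample_size: int, pop_size: int, rng: random.Random) -> float:
    """Generations back to the most recent common ancestor of a sample of gene copies.

    Kingman coalescent for one non-recombining locus in a constant-size diploid
    population of pop_size individuals (2N gene copies).
    """
    t = 0.0
    for k in range(sample_size, 1, -1):
        # k lineages: k(k-1)/2 possible pairs, each merging at rate 1/(2N) per generation
        rate = k * (k - 1) / 2 / (2 * pop_size)
        t += rng.expovariate(rate)
    return t


rng = random.Random(42)
pop_size, sample_size = 10_000, 20
times = [tmrca(sample_size, pop_size, rng) for _ in range(5_000)]
print(f"simulated mean TMRCA: {statistics.fmean(times):,.0f} generations")
print(f"theory 4N(1 - 1/n):   {4 * pop_size * (1 - 1 / sample_size):,.0f} generations")
```

Theory gives an expected time to the most recent common ancestor of 4N(1 - 1/n) generations for a sample of n gene copies, and about half of it (2N generations) is spent waiting for the final two lineages to merge, which is why the number of lineages drops quickly at first and then lingers at two.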
Some have claimed that for humans the coalescence to one (or first coalescent) does not happen until well over a million years back in time, calling into question the traditional idea of a single-couple human origin. This seems to be an important question to many people and is worth checking. One mistake that has already been identified is the assumption that a single-couple origin implies coalescence to one lineage: a single couple would carry four alleles, producing up to four genetic lineages (per gene or block), not one. There is no reason why these should be identical (for example, a designer might want diversity). A second mistake is that none of the models behind these claims actually tests a scenario like this (a single couple growing to billions of people) to see what happens. There may be other ways in which the models are flawed, and that’s why we are looking into it.
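A toy forward simulation illustrates the four-lineage point. In this hypothetical model (one non-recombining locus, random mating between any two distinct individuals with sexes not modeled, no selection, no new mutation, and a population that grows by a fixed factor each generation up to a cap), a single founding couple carries four distinct allele labels, and the function reports how many copies of each survive. The growth rate, cap, and generation count are arbitrary choices for illustration, not estimates for humans.

```python
import random
from collections import Counter


def founder_alleles_after(generations: int, growth: float, cap: int,
                          rng: random.Random) -> Counter:
    """Count surviving founder alleles after a single couple grows into a population.

    Hypothetical toy model: one non-recombining locus, random mating between any
    two distinct individuals (sexes not modeled), no selection, no new mutation.
    """
    # Each individual is a pair of allele copies; the founding couple
    # carries four distinct alleles labelled 0-3.
    population = [(0, 1), (2, 3)]
    size = len(population)
    for _ in range(generations):
        # Grow by a constant factor, never below 2 and never above the cap.
        size = min(cap, max(2, round(size * growth)))
        children = []
        for _ in range(size):
            parent_a, parent_b = rng.sample(population, 2)  # two distinct parents
            # Mendelian transmission: one randomly chosen copy from each parent.
            children.append((rng.choice(parent_a), rng.choice(parent_b)))
        population = children
    return Counter(allele for person in population for allele in person)


rng = random.Random(0)
counts = founder_alleles_after(generations=50, growth=1.5, cap=500, rng=rng)
print("founder allele -> copies:", dict(sorted(counts.items())))
print("distinct founder lineages surviving:", len(counts))
```

Every gene copy in the final population of this model descends from one of the four founder copies, so at this locus a present-day sample can trace back to as many as four lineages at the founding generation rather than one; how many survive depends on how quickly the population grows out of the bottleneck.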
Photo credit: Debivort [GFDL or CC-BY-SA-3.0], via Wikimedia Commons.