DNA Storage Goes Biological

Four years ago, we discussed news about the prospect of using DNA for computer storage. It would be ideal, IT engineers said: DNA is plentiful, cheap, stable, and capable of enormous information density. All the computer data in the world could be stored in one kilogram of DNA. At the time, there were challenges to its practical application, especially going from a silicon medium to a biological medium and back again.

Engineers in 2017 felt that the challenges were tractable, but DNA storage has still not reached widespread application. Part of this might be inertia; it’s difficult to displace working technologies that seem to suffice for most needs. Another challenge would be to build “wet” infrastructure for long-term storage of DNA.

Now, finally, some progress has been reported from Columbia University. Nature writes of “Tiny hard drives that are alive — and multiplying.” Specifically, “A common bacterium can be engineered to carry coded messages in its genome.”

Data can be stored at a high density in DNA molecules, but most current storage methods rely on DNA synthesized in the laboratory. As an alternative, Harris Wang and his colleagues at Columbia University in New York City designed Escherichia coli bacteria that, when zapped with electricity, make distinctive changes to their genomes. The ‘electricity on’ and ‘electricity off’ genomic patterns serve as the equivalent of the 1s and 0s used in digital computers. [Emphasis added.]

The Columbia team used their new technique to write “hello world” in the bacterial genomes. Use of live bacteria has several advantages. For one, the designed genetic sequences are heritable; daughter cells continue to carry the coded information. Another advantage is that the information is not degraded by dirt and other contaminants; dirt, after all, is home to germs. And since the information is genetically barcoded, the data can be reconstructed even if the E. coli are mixed with other species of bacteria.

That last fact opens up ideas for applications in steganography, or data concealment: hiding messages in plain sight. A plot for a spy novel comes to mind: Agent 008 scrapes dirt off shoes, reads instructions from headquarters. Nobody would expect complex specified information in dirt!

CRISPR Critters

Parallel to advances in DNA storage has been the gene editing revolution brought about by the discovery of nature’s own gene editor, the CRISPR-Cas9 gene cleavage system. Bacteria use the system to recognize foreign DNA, such as that of invading viruses, and cut it apart. Around 2012, scientists realized they could use CRISPR to target and cleave specific genes in almost any organism. The GMO revolution took off.

Columbia’s team built a modified CRISPR system that responds to differences in redox conditions. Their paper in Nature Chemical Biology¹ describes how this allowed them to write directly into the E. coli genome, without having to synthesize the target message as DNA in the lab. The result is a “direct digital-to-biological” method for writing computer information into DNA. The technique is robust and heritable, they say:

DNA has been the predominant information storage medium for biology and holds great promise as a next-generation high-density data medium in the digital era. Currently, the vast majority of DNA-based data storage approaches rely on in vitro DNA synthesis. As such, there are limited methods to encode digital data into the chromosomes of living cells in a single step. Here, we describe a new electrogenetic framework for direct storage of digital data in living cells. Using an engineered redox-responsive CRISPR adaptation system, we encoded binary data in 3-bit units into CRISPR arrays of bacterial cells by electrical stimulation. We demonstrate multiplex data encoding into barcoded cell populations to yield meaningful information storage and capacity up to 72 bits, which can be maintained over many generations in natural open environments. This work establishes a direct digital-to-biological data storage framework and advances our capacity for information exchange between silicon- and carbon-based entities.

A diagram in the paper shows both read/write operations in the system. There is a one-to-one correspondence between bits in the digital computer and artificial codons in the DNA.
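To make the idea of barcoded 3-bit units concrete, here is a minimal Python sketch. It is not the Columbia team's actual encoding: the 27-symbol alphabet, the 5-bit character code, the integer "barcodes," and the function names are all assumptions chosen for illustration. Only the 3-bit unit size and the “hello world” message come from the text above.

```python
# Illustrative sketch only. The alphabet, the 5-bit character code, and the
# integer barcodes are assumptions, not the scheme used in the paper.
import random

ALPHABET = "abcdefghijklmnopqrstuvwxyz "  # hypothetical; index = character code
BITS_PER_CHAR = 5  # assumed width; enough for 27 symbols
UNIT_BITS = 3      # the abstract states data was written in 3-bit units


def text_to_bits(text):
    """Concatenate a fixed-width binary code for each character."""
    return "".join(format(ALPHABET.index(ch), f"0{BITS_PER_CHAR}b") for ch in text)


def encode(text):
    """Split the bit string into (barcode, 3-bit unit) pairs.

    The integer barcode stands in for the genetic barcode that identifies
    which cell population carries which unit.
    """
    bits = text_to_bits(text)
    bits += "0" * (-len(bits) % UNIT_BITS)  # pad to a whole number of units
    return [(i // UNIT_BITS, bits[i:i + UNIT_BITS])
            for i in range(0, len(bits), UNIT_BITS)]


def decode(units):
    """Rebuild the text from units in any order by sorting on barcode."""
    bits = "".join(unit for _, unit in sorted(units))
    last_start = len(bits) - BITS_PER_CHAR  # ignore trailing padding bits
    chunks = [bits[i:i + BITS_PER_CHAR]
              for i in range(0, last_start + 1, BITS_PER_CHAR)]
    return "".join(ALPHABET[int(chunk, 2)] for chunk in chunks)


if __name__ == "__main__":
    units = encode("hello world")
    random.shuffle(units)  # mimic reading populations in arbitrary order
    print(len(units), "units ->", decode(units))
```

Under these assumptions, “hello world” takes 55 bits plus 2 padding bits, or 19 three-bit units, which fits within the 72-bit capacity quoted in the abstract. Because each unit carries its own barcode, shuffling the units before decoding does not change the recovered message.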

Data Security

The ability of bacteria to propagate themselves with the message alleviates concerns about data loss from accidents. Since bacteria replicate naturally, they come with a built-in backup system. Scientists can simply carry vials of the daughter cells to another location in case of fire or disaster. The old way to do offsite backup was to tediously copy data to magnetic tapes overnight, then haul them in trucks to secure vaults in the morning. Think how much easier DNA offsite storage could be. If a kilogram of DNA could contain the world’s data, a bicyclist could probably transport his company’s data in a container the size of a pinhead. Better yet, the “silicon-based” information could be transferred over the internet to another “carbon-based” system across the country. And unlike magnetic tape, the medium would not have to be returned for recycling. Just re-encode another batch of E. coli bacteria, which are ubiquitous. Only the read/write system would need to be available at backup sites.

Implications for ID

This story recalls important principles about intelligent design. One is that information is independent of the medium that conveys it. Consider a few simple cases of messaging via animals. Suppose a farmer trains his horse to shake its head one way in response to one question, and another way in response to another question. He sends the horse trotting off to the neighbor’s house, and the neighbor, knowing the code, gets the answer. The horse, hoping for a carrot, would be oblivious to any meaning in the sounds or the head shakes. A parrot might be able to speak a short message in human language, but it, too, would be clueless about what the sounds mean. A carrier pigeon could carry a piece of mail tied to its leg. As far as we know, in examples such as these (and sorry for those who think Lassie understood the words Timmy was telling the dog to bark to his mother), only human intelligence is capable of discerning and understanding semantic information coded in words and symbols. Certainly, bacteria are not. The point is that any animal could conceivably play the role of E. coli in the CRISPR storage system, but the bacteria, being microscopic, are more convenient for carrying large amounts of data.

Pursuing this thought, even the bacterial DNA is oblivious to the meaning of its code. The natural sequences (e.g., genes) and the artificial sequences input by the engineers mean nothing to nucleotides, just as sequences of bits mean nothing to the integrated circuits passing them along. Only the minds of the designers and users know what they “mean.” Semantics is orthogonal to sequence.

The presence of a translation system is also indicative of intelligent design. The above illustration of a horse shaking its head shows why; the horse would hear a completely different sound if the owner spoke the same question in Mongolian or French. It could be trained to do the same head shake, conveying the same message, to any human intelligence knowing the code, even if the owner spoke English and the neighbor spoke Chinese. In fact, there could be a chain of different animals trained to convey the message without understanding it: horse to parrot to carrier pigeon to dog to neighbor. William Dembski calls this property the “multiple realizability” of information.²

For another example, think of a man writing “I love you” in the sand on a California beach. A friend takes a picture of it and sends it to a skywriting company. The pilot writes it in the sky, and the man’s wife gets heart flutters in New York. The same message has passed from mind to sand to camera sensor to mechanical keyboard to integrated circuits to space (if the bits pass through a satellite link) to LEDs on a screen to airplane to vapor trails in the sky to the wife’s retina to her brain. The same message has been conveyed in multiple media, proving that information is non-physical. It’s fair to ask, what (or who) put the information into E. coli’s DNA?

Non-Human Translation

The above examples of translation schemes are products of human design. What would a materialist-minded observer think of a non-human instance of a translation system? He might propose the hypothesis of a feedback loop, of which there are many examples in nature. In climate science, for instance, rising ocean temperatures can create clouds that lower the temperature (negative feedback). Natural radioactivity could reach a critical point to initiate nuclear fission (positive feedback). Natural proxies might also be alleged as translation systems. Depending on seasonal flow, some cave streams create natural siphons that flush water periodically to the outside, influencing the outside river environment. A geologist might consider the effects encoded in river sand as a proxy for the season. The problem with those proposals for information translation is that they have no semantic meaning, except to the human intelligence interpreting them. They also lack enough complex specified information to pass the design filter.

There is a remarkable translation system at the core of each living cell that meets the requirements for complexity and specificity: the DNA translation system, where information in the DNA code (nucleotides) is translated into functional information in the protein code (amino acids). Nucleotides and amino acids are carriers of information but not originators of information. The two codes unite in machines that relate both codes together: the family of twenty aminoacyl-tRNA synthetases (aaRS), each of which recognizes a transfer RNA by identity features that often include its anticodon, and fastens the appropriate amino acid to the other end. The codon bears no semantic relationship to the amino acid. AGC does not “mean” serine; it does not look like serine; it does not act like serine. The synthetase puts the two together because that is the convention in a universal genetic code.

A Hallmark of Design

A language convention is a hallmark of intelligent design. Once the synthetases have loaded the transfer RNAs, the ribosome reads the nucleotide code, copied into messenger RNA, like a paper tape and fastens the amino acids together into a protein. Input DNA information; output protein information. None of the molecules “know” what is going on; they just operate because they were designed to follow instructions embedded from outside the system.
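The lookup-table character of this convention can be sketched in a few lines of Python. The codon assignments below are taken from the standard genetic code, written in DNA coding-strand letters, but the table is deliberately partial, and the function name and the example sequence are made up for illustration. The cell, of course, does this with synthetases, transfer RNAs, and ribosomes rather than a dictionary.

```python
# Partial codon table from the standard genetic code (DNA coding-strand letters).
# Only a handful of the 64 codons are listed, to keep the sketch short.
CODON_TABLE = {
    "ATG": "Met",
    "AGC": "Ser", "AGT": "Ser", "TCA": "Ser",
    "GGC": "Gly",
    "AAA": "Lys",
    "TGG": "Trp",
    "TAA": "Stop", "TAG": "Stop", "TGA": "Stop",
}


def translate(dna):
    """Read codons three letters at a time until a stop codon or the end."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        # Raises KeyError for codons missing from this partial table.
        residue = CODON_TABLE[dna[i:i + 3]]
        if residue == "Stop":
            break
        protein.append(residue)
    return protein


if __name__ == "__main__":
    # Hypothetical sequence: ATG AGC GGC AAA TGG TAA
    print(translate("ATGAGCGGCAAATGGTAA"))
```

Nothing in the string "AGC" resembles serine; the pairing lives entirely in the table. Swap the entries and the same sequence would yield a different chain of amino acids.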

To relate this reality to the Columbia team’s bacterial storage system, the nucleotides are like the message that the human engineers decided to send and wrote down. The proteins are like the sequence ending up in the bacterial genome. None of that has any meaning to the molecules involved. Only the human mind at the start and the human reader at the end understand the meaning; the rest is just mechanics.

When a translation system like DNA-to-protein is observed, therefore, a mind can be inferred prior to the DNA message. If the desired result is a functional protein at the end of the line, an intelligent designer had to have had the foresight to engineer a chain of machines that would both store the message in a stable molecule (DNA) and also translate it into very different molecules (amino acids), whose particular sequence could perform the work needed. Since it is clear that an intelligent designing mind was needed at the front end of Columbia’s DNA storage system, an intelligent designing mind was also needed at the front end of the DNA information-translation system.

Food for thought: If a biologist sequenced the Columbia team’s bacteria, and didn’t know about the region containing the message, would they consider it junk DNA? If told it is an embedded message, would they expect that random mutations would improve the message or upgrade it into a message with more semantic information?

Notes

1. Yim, S.S., McBee, R.M., Song, A.M. et al. Robust direct digital-to-biological data storage in living cells. Nat Chem Biol (2021). https://doi.org/10.1038/s41589-020-00711-4.
2. Dembski, William. Being as Communion (Ashgate, 2014), p. 100.

Evolution News & Science Today
