The world is facing a data storage crisis. As information proliferates in everything from YouTube videos to astronomical images to emails, the need for storing that data is growing exponentially. If trends continue, data centers will have used up the world’s microchip-grade silicon before 2040.
But there is another storage medium made of abundant atoms of carbon, hydrogen, oxygen, nitrogen, and phosphorus. It’s called DNA. And you wouldn’t need much of it. The entire world’s data could be stored in just one kilogram of the stuff. So says Andy Extance in an intriguing article in Nature, “How DNA could store all the world’s data.”
For Nick Goldman, the idea of encoding data in DNA started out as a joke.
It was Wednesday 16 February 2011, and Goldman was at a hotel in Hamburg, Germany, talking with some of his fellow bioinformaticists about how they could afford to store the reams of genome sequences and other data the world was throwing at them. He remembers the scientists getting so frustrated by the expense and limitations of conventional computing technology that they started kidding about sci-fi alternatives. “We thought, ‘What’s to stop us using DNA to store information?’”
Then the laughter stopped. “It was a lightbulb moment,” says Goldman, a group leader at the European Bioinformatics Institute (EBI) in Hinxton, UK. [Emphasis added.]
Since that day, several companies have begun turning this “joke” into serious business. The Semiconductor Research Corporation (SRC) is backing it. IBM is getting on board. And the Defense Department has hosted workshops with major corporations, which is sure to lead to funding. The UK is already funding research into next-generation approaches to DNA storage.
When you look at Extance’s chart, it’s easy to see why DNA is “one of the strongest candidates yet” to replace silicon as the storage medium of the future. The read-write speed is about 30 times faster than your computer’s hard drive. The expected data retention is 10 times longer. The power usage is ridiculously low, almost a billion times less than flash memory. And the data density is an astonishing 10^19 bits per cubic centimeter, a thousand times more than flash memory and a million times more than a hard disk. At that density, the entire world’s data could fit in one kilogram of DNA.
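To get a feel for those density figures, here is a quick back-of-the-envelope calculation in Python. It reads the DNA figure as 10^19 bits per cubic centimeter and derives the flash and hard-disk densities from the ratios quoted above (a thousand and a million times lower). Those two derived values are assumptions that follow from the stated ratios, not independently sourced numbers.

```python
# Density stated in the text for DNA, in bits per cubic centimeter.
DNA_BITS_PER_CM3 = 10**19

# Back-calculated from the ratios in the text (assumed, not measured here).
FLASH_BITS_PER_CM3 = DNA_BITS_PER_CM3 // 1_000
HDD_BITS_PER_CM3 = DNA_BITS_PER_CM3 // 1_000_000


def bits_to_exabytes(bits: int) -> float:
    """Convert a bit count to exabytes (1 EB = 10**18 bytes, 8 bits per byte)."""
    return bits / 8 / 10**18


for name, density in [
    ("DNA", DNA_BITS_PER_CM3),
    ("flash memory", FLASH_BITS_PER_CM3),
    ("hard disk", HDD_BITS_PER_CM3),
]:
    print(f"{name}: {bits_to_exabytes(density):.3g} exabytes per cubic centimeter")
```

On these numbers, a single cubic centimeter of DNA would hold about 1.25 exabytes.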
As with any new technology, baby steps are slow. Technicians face challenges of designing DNA strands to encode data, searching for it, and reading it back out reliably. How does one translate the binary bits in silicon into the A, C, T, and G of nucleic acids? Can DNA strands be manufactured cheaply enough? How can designers proofread the input?
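The translation question has a simple first answer, at least on paper: four bases can carry two bits each. The sketch below shows that naive mapping in Python. The particular assignment (A=00, C=01, G=10, T=11) and the function names are illustrative assumptions, not the scheme any of the labs mentioned here actually uses.

```python
# Hypothetical assignment of two-bit values to bases: A=00, C=01, G=10, T=11.
BASES = "ACGT"


def encode(data: bytes) -> str:
    """Turn each byte into four bases, most significant bit pair first."""
    out = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            out.append(BASES[(byte >> shift) & 0b11])
    return "".join(out)


def decode(strand: str) -> bytes:
    """Reverse encode(): read four bases at a time back into one byte."""
    if len(strand) % 4:
        raise ValueError("strand length must be a multiple of 4")
    out = bytearray()
    for i in range(0, len(strand), 4):
        byte = 0
        for base in strand[i:i + 4]:
            byte = (byte << 2) | BASES.index(base)
        out.append(byte)
    return bytes(out)


strand = encode(b"Hi")
print(strand)          # CAGACGGC
print(decode(strand))  # b'Hi'
```

A mapping this direct can produce long runs of the same base (a zero byte becomes AAAA), which is one reason real schemes are more elaborate.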
Living things, though, have already solved these issues. After all, “a whole human genome fits into a cell that is invisible to the naked eye,” Extance says. As for speed, DNA is accessed by numerous molecular machines simultaneously throughout the nucleus that know exactly where to start and stop reading. Genomic machinery in the cell proofreads errors to one typo per hundred billion bases, as Dr. Lee Spetner notes in his book Not by Chance! That’s equivalent, he says, to the lifetime output of about 100 professional typists.
Life shows that it is possible in principle to overcome these challenges. That gives hope to the engineers on the cutting edge of DNA storage. Already, several experimenters have succeeded in encoding information in DNA. By 2013, EBI had encoded Shakespeare’s sonnets and Martin Luther King’s “I have a dream” speech. IBM and Microsoft topped that 739-kilobyte effort shortly after with 200 megabytes of storage. As far back as 2010, Craig Venter’s lab encoded text within the genome of his synthetic bacterium, as Casey Luskin reported here. Everything alive demonstrates that DNA is already the world’s most flexible and useful storage medium. We just need to learn how to harness the technology.
Goldman’s EBI lab and other labs are thinking of ways to ensure accuracy. One method converts bits into “trits” (combinations of 0, 1, and 2) in an error-correcting scheme. Engineers are sure to think of robust solutions, just like the pioneers of digital computers did with parity bits and other mechanisms to guarantee accurate transmission over wired and wireless communications.
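To see why trits are a natural fit, consider one way such a code could work: after any given base there are exactly three other bases to choose from, so each trit can pick the next base while guaranteeing that no base ever repeats. The Python sketch below is a simplified illustration under that assumption. It uses a fixed six trits per byte and a hypothetical starting base, it is not the EBI lab's actual scheme, and it only detects one kind of corruption (a repeated base) rather than correcting errors.

```python
BASES = "ACGT"
TRITS_PER_BYTE = 6  # 3**6 = 729, enough to cover the 256 byte values


def bytes_to_trits(data: bytes) -> list[int]:
    """Write each byte as six base-3 digits, most significant first."""
    trits = []
    for byte in data:
        digits = []
        for _ in range(TRITS_PER_BYTE):
            byte, remainder = divmod(byte, 3)
            digits.append(remainder)
        trits.extend(reversed(digits))
    return trits


def trits_to_dna(trits: list[int], start: str = "A") -> str:
    """Each trit selects one of the three bases that differ from the previous one."""
    prev, out = start, []
    for trit in trits:
        choices = [b for b in BASES if b != prev]
        prev = choices[trit]
        out.append(prev)
    return "".join(out)


def dna_to_trits(strand: str, start: str = "A") -> list[int]:
    """Invert trits_to_dna(); a repeated base can only mean corruption."""
    prev, trits = start, []
    for base in strand:
        if base == prev:
            raise ValueError("repeated base: strand is corrupted")
        choices = [b for b in BASES if b != prev]
        trits.append(choices.index(base))
        prev = base
    return trits


def trits_to_bytes(trits: list[int]) -> bytes:
    """Regroup trits six at a time into bytes."""
    if len(trits) % TRITS_PER_BYTE:
        raise ValueError("trit count must be a multiple of 6")
    out = bytearray()
    for i in range(0, len(trits), TRITS_PER_BYTE):
        value = 0
        for trit in trits[i:i + TRITS_PER_BYTE]:
            value = value * 3 + trit
        out.append(value)  # raises ValueError if value > 255
    return bytes(out)


strand = trits_to_dna(bytes_to_trits(b"DNA"))
assert trits_to_bytes(dna_to_trits(strand)) == b"DNA"
```

The design choice here mirrors the parity-bit idea: spend a little capacity (three choices per position instead of four) to make a class of errors visible.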
How long could DNA storage last? That’s another potential advantage — better than existing technology by orders of magnitude:
…these results convinced Goldman that DNA had potential as a cheap, long-term data repository that would require little energy to store. As a measure of just how long-term, he points to the 2013 announcement of a horse genome decoded from a bone trapped in permafrost for 700,000 years. “In data centres, no one trusts a hard disk after three years,” he says. “No one trusts a tape after at most ten years. Where you want a copy safe for more than that, once we can get those written on DNA, you can stick it in a cave and forget about it until you want to read it.”
With these advantages of density, stability, and durability, DNA is creating a burgeoning field of research. Worries about random access are already being overcome. With techniques like PCR and CRISPR/Cas9, we can expect that any remaining challenges will be solved. Look at what our neighbors at the University of Washington recently achieved:
As a demonstration, the Microsoft-University of Washington researchers stored 151 kB of images, some encoded using the EBI method and some using their new approach, in a single pool of strings. They extracted three — a cat, the Sydney opera house and a cartoon monkey — using the EBI-like method, getting one read error that they had to correct manually. They also read the Sydney Opera House image using their new method, without any mistakes.
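The selection logic behind pulling one file out of a mixed pool can be mimicked in a few lines of Python. This sketch assumes each stored strand begins with a short address sequence, much as a PCR primer site would mark it, and that retrieval means keeping only the strands carrying the requested address. The addresses, payloads, and function names are all hypothetical, and the chemistry of amplification is not modeled at all.

```python
def store(pool: list[str], address: str, payload: str) -> None:
    """Add a strand to the unordered pool, tagged with its address prefix."""
    pool.append(address + payload)


def retrieve(pool: list[str], address: str) -> list[str]:
    """Return the payloads of every strand whose prefix matches the address."""
    return [s[len(address):] for s in pool if s.startswith(address)]


# Hypothetical equal-length addresses, one per stored image.
ADDRESSES = {"cat": "ACGTAC", "opera_house": "TGCATG", "monkey": "GATCGA"}

pool: list[str] = []
store(pool, ADDRESSES["cat"], "CAGACGGC")
store(pool, ADDRESSES["opera_house"], "CGTACGTA")
store(pool, ADDRESSES["opera_house"], "TACGATCG")
store(pool, ADDRESSES["monkey"], "GCTAGCTA")

print(retrieve(pool, ADDRESSES["opera_house"]))  # ['CGTACGTA', 'TACGATCG']
```

Because the addresses are distinct and all the same length, a prefix match cannot confuse one file's strands with another's.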
Market forces drive innovation. The promise of DNA storage is so attractive that funding and capital are sure to follow. DNA synthesizing machines will come. Random-access machines with efficient search algorithms will be invented. Successes and new products will drive down prices. As with Moore’s Law for silicon, the race for better DNA storage products will accelerate once it moves from lab to market. Extance concludes:
Goldman is confident that this is just a taste of things to come. “Our estimate is that we need 100,000-fold improvements to make the technology sing, and we think that’s very credible,” he says. “While past performance is no guarantee, there are new reading technologies coming onstream every year or two. Six orders of magnitude is no big deal in genomics. You just wait a bit.”
So, here we have the best minds in information technology urgently trying to catch up to storage technologies that have been in use since life began. They’re only a few billion years late to the party. The implications are as profound as they are intuitive.
Speaking of intuition, Douglas Axe in his recent book Undeniable: How Biology Confirms Our Intuition That Life Is Designed defines a quality he calls functional coherence: “the hierarchical arrangement of parts needed for anything to produce a high-level function — each part contributing in a coordinated way to the whole.” He writes:
No high-level function is ever accomplished without someone thinking up a special arrangement of things and circumstances for that very purpose and then putting those thoughts into action. The hallmark of all these special arrangements is high-level functional coherence, which we now know comes only by insight — never by coincidence.
Scientists are seeking to match the same level of functional coherence that can be observed every second in the cells of our own bodies, and of the simplest microbes. The conclusion to draw from this hardly needs to be stated.
Photo credit: PublicDomainPictures, via Pixabay.