DNA Day 2018—April 25

  • There are Now Nearly 2 Million DNA-Related Publications 
  • Cheaper Synthesis and Sequencing of DNA Drives Novel Applications 
  • Dreams of Dense Digital Information Storage in DNA Become Reality  

Before getting to the topics that I selected for this post in recognition of DNA Day on April 25, here are some numbers for DNA publications that, to me, are mind-numbing. NIH’s PubMed lists nearly 2 million DNA-related publications, with many more not counted in that database. Looking at the annual contributions to these 2 million articles reveals about 90,000 such publications in just the past year. This computes to nearly 250 per day, 365 days per year, or roughly 10 publications per hour, around the clock!
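
As a quick check of that arithmetic, the short Python sketch below recomputes the rates from the rounded figure of 90,000 publications per year quoted above. The variable names are only illustrative.

```python
# Rough publication-rate arithmetic, using the rounded figure quoted above.
PUBLICATIONS_PER_YEAR = 90_000  # approximate annual count from the text

per_day = PUBLICATIONS_PER_YEAR / 365   # every day of the year
per_hour = per_day / 24                 # around the clock

print(f"{per_day:.0f} per day, {per_hour:.1f} per hour")
# prints "247 per day, 10.3 per hour"
```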

Rationale for Information Storage in DNA

Long-term storage and access of information for use by future generations—think Mr. Spock’s library computer workstation in Star Trek—presents a serious challenge, regardless of data format. This challenge was noted by Bancroft et al. in 2001 in venerable Science magazine as follows:

“[Digital] data currently being stored in magnetic or optical media will probably become unrecoverable within a century or less, through the combined effects of hardware and software obsolescence and decay of the storage medium. New approaches are required that will permit retrieval of information stored for centuries or even millennia.”

These authors proposed that DNA has three properties that lend themselves to its use as a vehicle for long-term information storage:

  • Storage of DNA can result in extremely long stability, as evidenced by the reported recovery of viable bacteria from 250-million-year-old salt crystals.
  • Because DNA is our genetic material, methods for both storage and reading of DNA-encoded information should remain central to technological civilizations and undergo continual improvements.
  • DNA-based storage can employ an enormous number of identical molecules, thus providing extensive informational redundancy to strongly mitigate effects of any losses due to chemical degradation.

DNA is an incredibly dense storage medium, potentially squeezing in a mind-boggling 5.5 x 10^15 bits (5.5 petabits, Pb), or roughly 687,500 x 10^9 bytes (gigabytes, GB; 1 byte = 8 bits), of information per cubic millimeter. By that measure, it has been estimated that all 700 x 10^18 bytes of today’s accessible internet would fit into a space the size of a shoebox!
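
The shoebox comparison can be sanity-checked with a back-of-the-envelope calculation. The Python sketch below takes the quoted density of 5.5 x 10^15 bits per cubic millimeter and the quoted 700 x 10^18 bytes at face value, and it assumes ideal packing with no redundancy, packaging, or addressing overhead. The function and constant names are hypothetical.

```python
# Back-of-the-envelope volume estimate from the two figures quoted above.
BITS_PER_MM3 = 5.5e15      # quoted theoretical density, bits per cubic millimeter
INTERNET_BYTES = 700e18    # quoted size of the accessible internet, in bytes
BITS_PER_BYTE = 8

def volume_mm3(num_bytes: float, bits_per_mm3: float = BITS_PER_MM3) -> float:
    """Volume of DNA (in cubic millimeters) needed to hold num_bytes at the given density."""
    return num_bytes * BITS_PER_BYTE / bits_per_mm3

volume_liters = volume_mm3(INTERNET_BYTES) / 1e6  # 1 liter = 10^6 cubic millimeters
print(f"about {volume_liters:.1f} liters")
```

Under these idealized assumptions the result is on the order of one liter, which is comfortably smaller than a typical shoebox.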

Demonstrations of Information Storage in DNA

According to a report in Science in 2012 by a team including uber-famous George Church, storing messages in DNA was first demonstrated in 1988, and the largest project following that achieved encoding only 7920 bits in 2010. In comparison, Church and coworkers introduced new technology to demonstrate storage of ~5,300,000 bits of information contained in a book that included 53,426 words, 11 JPG images and 1 JavaScript program. This information was encoded onto ~55,000 159-nt oligos synthesized by ink-jet printing on DNA microchips, which I’ll comment on below. To read the encoded book, PCR was followed by high-throughput sequencing. Interested readers can consult this report for details and a discussion of why this approach offers numerous advantages over previous strategies.
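
The post does not spell out how bits become bases, so here is one simple way to picture it: a one-bit-per-base mapping in which A or C stands for 0 and G or T stands for 1. The Python sketch below is a toy illustration under that assumption, not the pipeline from the report. The function names and the rule for choosing between the two candidate bases (never repeat the previous base) are hypothetical, and addressing, primer sites, and error handling are left out.

```python
ZERO_BASES = "AC"  # either base encodes a 0 bit (assumed toy mapping)
ONE_BASES = "GT"   # either base encodes a 1 bit (assumed toy mapping)

def text_to_bits(text: str) -> str:
    """UTF-8 encode text and return it as a string of '0'/'1' characters."""
    return "".join(f"{byte:08b}" for byte in text.encode("utf-8"))

def bits_to_dna(bits: str) -> str:
    """Map each bit to one base, never repeating the previous base."""
    dna = []
    for bit in bits:
        options = ZERO_BASES if bit == "0" else ONE_BASES
        # Use the first candidate unless it would repeat the previous base.
        base = options[0] if not dna or dna[-1] != options[0] else options[1]
        dna.append(base)
    return "".join(dna)

def dna_to_bits(dna: str) -> str:
    """Invert the mapping: A or C reads as 0, G or T reads as 1."""
    return "".join("0" if base in ZERO_BASES else "1" for base in dna)

def bits_to_text(bits: str) -> str:
    """Regroup bits into bytes and decode them as UTF-8."""
    data = bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))
    return data.decode("utf-8")

message = "DNA Day"
strand = bits_to_dna(text_to_bits(message))
assert bits_to_text(dna_to_bits(strand)) == message  # round trip succeeds
print(len(strand), "nt:", strand)
```

At one bit per base, 5,300,000 bits spread over 55,000 oligos comes to roughly 96 bits per oligo, so under this assumption only part of each 159-nt oligo would carry payload.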

Shortly thereafter, in 2013, a team at the Wellcome Trust Genome Campus working with Agilent Technologies received widespread media attention for a report on encoding in DNA, and then reading, the entire set of Shakespearean sonnets, a 26-second clip of Martin Luther King’s famous “I have a dream” speech, and a photograph, using an approach depicted here.

Fast-forward to March 2018 and we are brought to a publication by Organick et al. in Nature Biotechnology that received loads of media coverage because of its landmark achievement demonstrating random access of information in large-scale DNA data storage. This team from various institutions, including Microsoft, noted that recovering stored data on a large scale currently requires all the DNA in a pool to be sequenced, even if only a subset of the information needs to be extracted. In contrast, they demonstrated a methodology to encode and store 35 distinct files [over 200 x 10^6 bytes (megabytes; MB) of data], including video, audio, images, and text, in more than 13 million DNA oligos. Moreover, they showed recovery of each file individually with no errors, using a random-access approach in which file-specific sequence tags allow selective PCR amplification. Interested readers can consult the paper for details.
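
The idea behind tag-based random access can be sketched in a few lines of Python. This is a toy model, not the authors' method: the tag sequences, file names, and payloads are made up, and selecting strands by an exact prefix match merely stands in for PCR amplification with file-specific primers.

```python
# Toy pool: every strand starts with a tag identifying its file (hypothetical tags).
FILE_TAGS = {"video.mp4": "ACGTACGTAC", "poem.txt": "TGCATGCATG"}

pool = [
    FILE_TAGS["video.mp4"] + "GATTACAGATTACA",
    FILE_TAGS["poem.txt"] + "CCATGGCCATGGAA",
    FILE_TAGS["video.mp4"] + "TTGACCTTGACCAA",
    FILE_TAGS["poem.txt"] + "AGGTCCAGGTCCAT",
]

def random_access(pool: list[str], tag: str) -> list[str]:
    """Return the payloads of strands carrying the tag, standing in for selective PCR."""
    return [strand[len(tag):] for strand in pool if strand.startswith(tag)]

poem_payloads = random_access(pool, FILE_TAGS["poem.txt"])
print(poem_payloads)
```

Only the strands for the requested file are pulled out, so in the real system only that subset would need to be sequenced rather than the entire pool.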

Future Prospects

Organick et al. conclude that synthetic DNA production efficiency will have to significantly increase if DNA is to become a practical medium for data storage. They contend that this will be attainable because the synthetic DNA needed for data storage can be much more error prone than DNA required by life sciences, and very few copies per sequence are required (i.e., orders of magnitude less DNA than conventional nanomole-scale solid-phase synthesis). This tolerance for errors is due to error-correcting algorithms such as the one they described in their paper.
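
To see why redundancy plus decoding logic can tolerate error-prone synthesis, consider the simplest possible scheme: a per-position majority vote across several noisy copies of the same strand. This is a deliberately simple stand-in for illustration, not the algorithm from the paper, and it assumes substitution errors only (no insertions or deletions). The sequences below are made up.

```python
from collections import Counter

def consensus(reads: list[str]) -> str:
    """Per-position majority vote over equal-length reads (substitutions only)."""
    if len({len(read) for read in reads}) != 1:
        raise ValueError("reads must be non-empty and all the same length")
    # zip(*reads) yields one column of bases per position in the strand.
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*reads))

original = "ACGTTGCAAGCT"
noisy_reads = [
    "ACGTTGCAAGCT",  # clean copy
    "ACCTTGCAAGCT",  # substitution at index 2
    "ACGTTGCATGCT",  # substitution at index 8
    "ACGTAGCAAGCT",  # substitution at index 4
    "ACGTTGCAAGCA",  # substitution at index 11
]
recovered = consensus(noisy_reads)
assert recovered == original  # every position is outvoted 4 to 1
```

Four of the five copies contain an error, yet the original is recovered because no position is wrong in a majority of copies.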

While ink-jet printed DNA oligos using Caruthers’ phosphoramidite chemistry can be scaled and improved, template-independent enzymatic oligonucleotide synthesis (TiEOS) is receiving attention because of potential cost reductions. For example, Michael Jensen and uber-famous Ron Davis at Stanford, in a very recently published review, have estimated costs for phosphoramidite vs. TiEOS methods. They concluded that cyclical two-step (couple, deblock) solid-phase femtomole-scale synthesis of DNA oligos by TiEOS, using terminal deoxynucleotidyl transferase (TdT) and dNTPs (X = H) bearing 3’-protecting (Pr) groups, could provide oligos to users at a cost per nucleotide incorporated that is lower by orders of magnitude.

Taken from Jensen & Davis, Biochemistry (2018)

Regarding sequencing of DNA to retrieve stored information, Organick et al. used conventional Illumina reversible-terminator sequencing as well as Oxford Nanopore Technologies’ newer nanopore sequencing. They noted that “[t]he compactness and potential for scalability makes nanopore-based sequencing an intriguing option for integration in future stand-alone DNA data storage systems”.

I’m optimistic about the future prospects for using DNA to store information. Whether or not you share that optimism, I hope you’ll agree with my opinion that exploratory research toward that goal is yet another example of DNA’s wondrous properties inspiring quests for new technologies.

As usual, your comments are welcomed.

DNA Day Archive

Here are the topics and links to all of my previous blogs on DNA Day for you to reread or check out. Enjoy!

2013—60th Anniversary of the Discovery of DNA’s Double Helix Structure

2014—My Top 3 “Likes” for DNA Day

2015—Celebrating Click Chemistry in Honor of DNA Day

2016—DNA Dreams Do Come True!

2017—Some of the Top 5 Cited Papers on DNA Will Surprise You


Reproduction of the following announcement in this TriLink BioTechnologies blog was requested of Jerry Zon by Yogesh Sanghvi on behalf of the International Society of Nucleosides, Nucleotides & Nucleic Acids (IS3NA). The stated aim of the IS3NA is to capitalize on the knowledge of practicing members across several disciplines to understand the impact of nucleic acids in a plethora of cutting-edge scientific questions ranging from the origins of life to the development of novel therapeutics. Importantly, in my opinion, the IS3NA also aims to act as a mediator for communication, cooperation and understanding between scientists of all nationalities, and to provide information about and to stimulate interest in the above-mentioned areas among persons of all nationalities.