DNA Day 2017

  • There are Now Millions of DNA-Related Publications
  • Some of the Top 5 Cited Papers on DNA Will Surprise You
  • You Probably Won’t Guess the Top 5 Most Frequently Cited

Deciding what to post here in recognition of DNA Day 2017 was just as challenging as it has been in past years, primarily because there are so many different perspectives from which to choose. After much mulling, and several abandoned approaches, I settled on featuring DNA publications that have received the most citations, as an objective metric rather than just my subjective opinions about topics I think are significant or otherwise interesting.

Before getting to the numbers of DNA-related papers and some of the most cited papers, here’s a quick recap of what was posted here in the past, starting with the inaugural blog four years ago:

2013—60th Anniversary of the Discovery of DNA’s Double Helix Structure

2014—My Top 3 “Likes” for DNA Day

2015—Celebrating Click Chemistry in Honor of DNA Day

2016—DNA Dreams Do Come True!

Explosive Growth of DNA Publications

Regular readers of my blogs will know that I frequently use the NIH PubMed database of scientific articles to find publications by searching keywords, phrases, or authors. A convenient feature of these searches is the “results per year” data, which can be exported into Excel for various purposes. Some preliminary searches indicated that DNA-related articles can be indexed by DNA, PCR, cloning, or other terms, among which sequencing was notable. The majority, however, were indexed as either DNA or PCR, which together gave nearly 1.7 million items, an astounding number. The true total is even greater, since PubMed excludes some important chemistry journals, as well as patents.

Diving deeper into these numbers, I thought it helpful to look at the publication volumes and rates for DNA, sequencing DNA, and PCR through 2015 starting from 1953, 1977, and 1986, respectively. These respective dates correspond to seminal publications by Watson & Crick, Maxam & Gilbert, and Mullis & coworkers. The results shown in the following graph attest to my often-stated “power of PCR” as the premier method in nucleic acid research, which we’ll see again below in another numerical context.

Top 5 Cited Papers

During my perusal of the above literature in PubMed generally related to DNA, I thought it would be interesting to find, and share here, which specific papers have the distinction of being most frequently cited. Citations are not available in PubMed, but are compiled in Google Scholar, which led me to these Top 5 that are listed from first to fifth.

Frederick Sanger (1918-2013) Taken from newscientist.com

  1. DNA sequencing with chain-terminating inhibitors

Frederick Sanger, the eponymous father of the “Sanger sequencing” method published in 1977, received the 1980 Nobel Prize in chemistry for this contribution. He also received the 1958 Nobel Prize in chemistry for sequencing insulin, and is the only person to win two Nobel Prizes in chemistry. Uber-famous DNA expert Craig Venter is quoted as saying that ‘Fred Sanger was one of the most important scientists of the 20th century,’ [who] ‘twice changed the direction of the scientific world.’

  2. Analysis of relative gene expression data using real-time quantitative PCR and the 2^(−ΔΔCT) method

Kenneth J. Livak, PhD
Taken from archive.sciencewatch.com

The most commonly used method to analyze data from real-time, quantitative PCR (RT-qPCR) experiments is relative quantification, which relates the PCR signal of the transcript of interest to that of a control sample such as an untreated control. The derivation, assumptions, and applications of this method were published in 2001 by Livak & Schmittgen. I overlapped with Ken Livak at Applied Biosystems, which pioneered commercialization of RT-qPCR reagents and instrumentation at the time. He is currently Senior Scientific Fellow at Fluidigm Corp.
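For readers who have not used the method, here is a minimal sketch of the 2^(−ΔΔCT) arithmetic in Python. It assumes a reference (housekeeping) gene for normalization and roughly 100% amplification efficiency, meaning the product doubles every cycle. The function name and the CT values are hypothetical and chosen only for illustration.

```python
def fold_change(ct_target_treated, ct_ref_treated,
                ct_target_control, ct_ref_control):
    """Relative expression by the 2^(-ddCT) method.

    Assumes ~100% amplification efficiency (doubling per cycle)
    for both the target and the reference gene.
    """
    # Normalize the target gene to the reference gene in each sample.
    delta_ct_treated = ct_target_treated - ct_ref_treated
    delta_ct_control = ct_target_control - ct_ref_control
    # Compare the treated sample with the untreated control.
    delta_delta_ct = delta_ct_treated - delta_ct_control
    return 2.0 ** (-delta_delta_ct)


# Hypothetical CT values: the target crosses threshold three cycles
# earlier in the treated sample, with the reference gene unchanged.
print(fold_change(22.0, 18.0, 25.0, 18.0))  # 8.0, i.e. 8-fold higher
```

A lower CT means more starting template, so a ΔΔCT of −3 corresponds to 2^3, or an eight-fold increase in expression relative to the control.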

Sir Edwin M. Southern Taken from ogt.co.uk

3. Detection of specific sequences among DNA fragments separated by gel electrophoresis

Sir Edwin Mellor Southern, FRS, is the eponymous father of “Southern blotting,” the transfer of DNA fragments from agarose gels to cellulose nitrate filters, which he published in 1975. He is a Lasker Award-winning molecular biologist, Emeritus Professor of Biochemistry at the University of Oxford and a fellow of Trinity College. He is also Founder and Chief Scientific Advisor of Oxford Gene Technology.

Prof. Bert Vogelstein, MD Taken from hhmi.org

  4. A technique for radiolabeling DNA restriction endonuclease fragments to high specific activity

This paper by Feinberg & Vogelstein, published in 1983, describes how to conveniently radiolabel DNA restriction endonuclease fragments to high specific activity. DNA fragments are purified from agarose gels directly by ethanol precipitation and are then denatured and labeled with the large fragment of DNA polymerase I, using random oligonucleotides as primers. Over 70% of the precursor triphosphate is routinely incorporated into complementary DNA, and specific activities of over 10^9 dpm/μg of DNA can be obtained using relatively small amounts of precursor. These “oligolabeled” DNA fragments serve as efficient probes in filter hybridization experiments. Vogelstein’s group pioneered the idea that somatic mutations represent uniquely specific biomarkers for cancer patients, leading to the first FDA-approved DNA mutation-based screening tests, and now “liquid biopsies” that evaluate blood samples to obtain information about underlying tumors and their responses to therapy (an area that I’ve touted in previous blogs).

Kary B. Mullis, PhD Taken from TED.com

  5. Primer-directed enzymatic amplification of DNA with a thermostable DNA polymerase

In 1988, Kary B. Mullis and coworkers (then at Cetus Corp.) published in the venerable journal Science a method using oligonucleotide primers and thermostable DNA polymerase from Thermus aquaticus to amplify genomic DNA segments up to 2000 base pairs and to detect a target DNA molecule present only once in a sample of 10^5 cells. Since that time, polymerase chain reaction (PCR)-related technology has evolved to now routinely enable a variety of single-cell analyses of DNA or RNA. Dr. Mullis received the 1993 Nobel Prize in chemistry for his 1983 invention of PCR, which his website says ‘is hailed as one of the monumental scientific techniques of the twentieth century.’

Top 5 Papers by Citation Frequency

While writing the above section, it occurred to me that ranking these five publications by total number of citations-to-date in Google Scholar doesn’t account for differences in the number of years between the year of publication and now. I did the math to calculate the average citation frequency per year, and here’s the totally surprising—to me—result: relative gene expression methodology published by Livak & Schmittgen is by far the most frequently cited of the Top 5, according to this way of ranking:

  1. 2001, relative gene expression, Cited by 69560 = 4,637 avg. citations per year
  2. 1977, Sanger sequencing, Cited by 32662 = 1,701
  3. 1975, Southern blotting, Cited by 21201 = 796
  4. 1988, PCR, Cited by 18785 = 671
  5. 1983, oligolabeled DNA, Cited by 21200 = 642
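The averaging behind this ranking can be sketched in a few lines of Python. This is only an illustration using three entries from the list above. The function name is hypothetical, and the default end year of 2016 is an assumption, chosen because it reproduces the listed averages for these three papers.

```python
def average_citations_per_year(total_citations, publication_year,
                               through_year=2016):
    """Average citations per year since publication.

    through_year=2016 is an assumption, not a value stated in the post.
    """
    years = through_year - publication_year
    if years <= 0:
        raise ValueError("through_year must be later than publication_year")
    return total_citations / years


# (label, publication year, Google Scholar citations) from the list above
papers = [
    ("relative gene expression", 2001, 69560),
    ("PCR", 1988, 18785),
    ("oligolabeled DNA", 1983, 21200),
]

# Rank from most to least frequently cited.
ranked = sorted(papers,
                key=lambda p: average_citations_per_year(p[2], p[1]),
                reverse=True)
for label, year, cites in ranked:
    print(f"{year}, {label}: {average_citations_per_year(cites, year):.0f} per year")
```

Dividing by years elapsed is the simplest possible normalization. It treats every year equally and ignores how citation rates rise and fall over a paper's lifetime.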

I should point out that, as transformative methods such as these gradually become widely recognized as “standard procedures,” researchers tend to feel it unnecessary to include a reference to the original publication. Consequently, citation frequency decreases with time even though cumulative usage increases. In other words, 25 years from now average citations per year for relative gene expression will likely have decreased, and been surpassed by a new “method of the decade,” so to speak.

Prediction for the Future

This line of reasoning leads me to close with some speculation about what DNA-related technique might emerge as the next “method of the decade” that tops the above ranking by citation frequency.

My guess is that it will be “Multiplex genome engineering using CRISPR/Cas systems” by Zhang & coworkers, which has been cited 4,145 times as I write this piece, only four years after its publication in the venerable journal Science in 2013. Some of my blogs have already commented on various aspects of CRISPR/Cas9, which is among the genome editing tools offered by TriLink.

As usual, your comments are welcomed.

60th Anniversary of the Discovery of DNA’s Double Helix Structure…Diamond Jubilee for the “Monarch of Molecules”

Welcome to my inaugural blog post!  My intention is to provide timely nucleic acid-related scientific content that is informative and, hopefully, will be of interest to a broad readership. New posts will be published biweekly so please check back often.  I encourage thoughtful commentary as well as constructive suggestions.  To find out more about me and my relationship with TriLink Biotechnologies, please visit the ‘About Jerry’ tab at the top of the page.

Considering TriLink’s focus on providing nucleic acid-based products, it seemed appropriate that this inaugural post feature the upcoming 60th anniversary of Watson & Crick’s proposed structure for DNA published in Nature on April 25, 1953. It is widely acknowledged that insights provided therein had a fundamentally transformative impact on science and society. Anyone working with DNA should take a bit of time to read this historically important publication that is freely available at Nature.com. Doing so reveals an extremely brief report—one page and one figure—with perhaps the most understated—and similarly brief, one sentence!—conclusion suggesting the structural basis for genetic encoding by DNA sequence and its replication:

“It has not escaped our notice that the specific pairing we have postulated immediately suggests a possible copying mechanism for the genetic material.”

Those interested in an annotated and illustrated autobiographical account of the making of this scientific revolution are referred to a recent book by James D. Watson, Alexander Gann and Jan Witkowski, The Annotated and Illustrated Double Helix, which has already received numerous excellent reviews. There is also a YouTube video of a delightful presentation by Dr. Watson that is well worth watching to see and hear the story of events leading to the 1953 publication and subsequent recognition that DNA is transcribed into mRNA for translation into protein. This new knowledge, coupled with the availability of various molecular biological “tools” and initial advances in oligonucleotide synthesis, set the stage for an exciting race during the late 1970s to synthesize for the first time a medically important human gene, insulin, involving three main groups: Harvard University, University of California at San Francisco, and City of Hope National Medical Center. This fascinating story involving science, egos, and the beginning of bio-venture “start-up” financing that collectively spawned Genentech and Biogen (and subsequent “genetic engineering” companies) is engagingly chronicled by Stephen S. Hall in the book Invisible Frontiers: The Race to Synthesize a Human Gene through interviews with numerous individuals who were directly involved.

Through many advances in DNA Sanger sequencing methods and automated instrumentation during 1990-2003, base-by-base sequence determination of the entire human genome was realized by parallel public (government funded) and private (Celera Corporation) efforts [1]. During the next decade, taking us to 2013, there has been stunning progress in developing various methods of massively parallel sequencing [2] (aka “next-generation sequencing”) that, in part, has enabled further elucidation of epigenomics and RNA-mediated regulation as well as factor-mediated reprogramming of cells and pursuit of regenerative medicine. Such topics were discussed at a recently held Cold Spring Harbor Laboratory meeting entitled “From Base Pair to Body Plan – Celebrating 60 Years of DNA.” Below are just a few selected author names (in alphabetical order) and titles of presentations taken from the list of nearly 50 talks or posters found at the website for this event [3].

Baylin, S. Celebrating the discoveries of the “hard drive” of DNA and its “software package,” the epigenome—Basic and translational implications
Young, R.A. Control of gene expression programs
Zaret, K.S. Programming and reprogramming cell fate

In closing this inaugural blog post, one can only wonder what new and exciting aspects of life sciences, diagnostics and medicine (including regenerative medicine) will be realized during the next 10 years on the way to the 70th Anniversary of the Discovery of DNA’s Double Helix Structure. In my view, future advances based on DNA fall into three interrelated areas:

  • “faster, better, cheaper” synthesis to create useful DNA-directed, self-assembling  “smart materials”
  • “bigger and deeper” experimentation aimed at complete molecular-level models for organisms and memory
  • “bio-factories” for Green Manufacturing

These thoughts about the next decade for DNA will be elaborated in my future posts.  What do you see happening during this time?



1. http://en.wikipedia.org/wiki/Human_Genome_Project

2. http://en.wikipedia.org/wiki/Massive_parallel_sequencing

3. http://meetings.cshl.edu/abstracts/dna602013_absstat.html