RNA and DNA Examples of ‘Seek, and Ye Shall Find’

  • New RNA Modification Found in the Epitranscriptomic Library for Mammals
  • Ditto for a DNA Modification in Mammalian Epigenetics
  • What Will ‘Ye’ Find Next with Improved RNA and DNA Analytical Methods?

‘Seek, and ye shall find’ is a biblical quote (Matthew 7:7) that many might say is self-evident: to find something, you need to look for it. In any case, I use it here as a reminder to us scientists that our collective understanding of RNA and DNA is limited, in part, by the analytical tools we use for these nucleic acid molecules. To me this predicament is akin to a person under a lamp post at night who can find only whatever is illuminated and misses what is not. The old cartoon below conveys a humorous variant of this. But I digress…

Taken from quoteinvestigator.com


In previous postings here, I’ve commented on new illuminations—pun intended—for RNA modifications such as pseudouridine, whose functions are only now beginning to be understood. And likewise for congeners of the 5-methyl group of 2’-deoxycytidine in DNA, such as 5-hydroxymethyl or 5-formyl moieties.

To continue this metaphor—but to get to the point—the lamp post’s light-field has been recently expanded to allow researchers to now find previously unseen modified bases in mammalian RNA and DNA, namely, N1-methyladenosine and N6-methyl-2’-deoxyadenosine, respectively. Briefly, here’s what’s been reported.

Expanding Epitranscriptomics

Just like modification of bases in DNA after its replication leads to what’s called epigenetics—which I’ll get to in the next section—modification of bases in RNA after its transcription leads to epitranscriptomics. This somewhat tongue-twisting term was first introduced in 2013 by Sibbritt et al. to describe the growing body of data on methylated bases in mRNA, namely N6-methyladenosine (m6A) and 5-methylcytosine (m5C).

Both m6A and m5C have long been known to exist in both prokaryotic and eukaryotic organisms, but only more recently have a flurry of publications shed light—pun intended—on the precise locations, as well as enzymatic introduction and removal of m6A and m5C. This dynamic “writing” and “erasing” has been expertly reviewed by He and coworkers, who have also contributed to these discoveries.

m1A; taken from genengnews.com


He and collaborators at the University of Chicago, together with researchers in Israel, have recently reported in venerable Nature magazine a new mRNA modification, namely N1-methyladenosine (m1A), which occurs on thousands of different gene transcripts in eukaryotic cells spanning genetically simple yeast to much more complex mammals.

Taking advantage of newly developed sequencing approaches—i.e. brighter lamp posts in my metaphor—they showed that m1A is enriched around the start codon upstream of the first splice site. More specifically, they observed that m1A “preferentially decorates more structured regions around canonical and alternative translation initiation sites, is dynamic in response to physiological conditions, and correlates positively with protein production.”

They conclude by noting that “these unique features are highly conserved in mouse and human cells, strongly indicating a functional role for m1A in promoting translation of methylated mRNA.” So, as I said at the beginning of this blog, seek and ye shall find, and He did—capital H and pun intended.

Incidentally, those of you who are chemists will recognize that the ionic structure for m1A pictured above can undergo deprotonation to form a formally neutral counterpart having an H-N=C-N-CH3 moiety, as shown in TriLink’s catalog for the triphosphate derivative of m1A. Which of these two structures exists in an mRNA of interest, or in what proportion, will depend on sequence context, local pH, and so on. In this regard, scientists will need to somehow seek, in order to find, those answers in epitranscriptomics.
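For readers who like to put numbers on the pH part of this, the Henderson-Hasselbalch relationship gives the fraction of a base in its protonated (cationic) form at a given pH. The sketch below is only illustrative: the pKa of 8.25 is an assumption chosen for the free nucleoside, and the effective pKa of m1A inside a structured mRNA is not known here and could be shifted by neighboring bases.

```python
def fraction_protonated(ph: float, pka: float) -> float:
    """Henderson-Hasselbalch: fraction of a base present in its protonated form."""
    return 1.0 / (1.0 + 10 ** (ph - pka))

# Assumed, illustrative pKa for free m1A; not a measured value for any mRNA context.
ASSUMED_PKA = 8.25

for ph in (6.5, 7.0, 7.4, 8.0, 9.0):
    f = fraction_protonated(ph, ASSUMED_PKA)
    print(f"pH {ph:.1f}: {f:.1%} cationic, {1 - f:.1%} neutral")
```

Under this assumption, the cationic form would dominate at physiological pH, but a local shift of even one pKa unit would change the balance considerably.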

Expanding Epigenetics

m1A; taken from genengnews.com


Until recently, mammalian epigenetics had focused solely on 5-methyl-2’-deoxycytidine and its sequentially oxidized versions, i.e. those having hydroxymethyl, formyl, and carboxyl moieties, all of which TriLink offers as 5’-triphosphates or within oligonucleotides. In less complex, prokaryotic organisms such as bacteria, N6-methyl-2’-deoxyadenosine (m6dA) is prevalent, but whether it occurs in mammals had remained unclear until now.

A multi-university collaboration with the single-molecule-sequencing company Pacific Biosciences now reports (Wu et al. 2016) the existence of m6dA in mouse stem cells. This landmark discovery was made even more exciting by the identification of an enzyme that removes methyl groups from m6dA, and by the finding that m6dA is enriched in certain regulatory DNA sequences. Together these data provide clues to the possible function of m6dA in mammalian genomes.

Akin to my introductory light-from-the-lamp-post metaphor, detection and location of m6dA was enabled by more powerful analytical methods compared to those available in the past. Using state-of-the-art liquid chromatography-tandem mass spectrometry (LC-MS/MS), Wu et al. found that m6dA represented only 6–7 bases per million (!) adenines genome-wide. m6dA was enriched ~4-fold in genomic regions associated with a rare histone protein, H2AX, although the reasons for this association remain unclear.
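To get a feel for what "6–7 per million adenines" means per cell, the ratio can be converted into an approximate absolute count. This is back-of-the-envelope arithmetic under assumptions that are not from Wu et al.: a haploid mouse genome of about 2.7 Gb, an A+T content of about 58%, and a diploid cell.

```python
def m6da_per_genome(m6da_per_million_a: float, genome_bp: float,
                    at_fraction: float, ploidy: int = 2) -> float:
    """Approximate m6dA count per cell from an LC-MS/MS ratio (m6dA per 10^6 dA)."""
    # In double-stranded DNA, each A:T base pair contributes exactly one adenine.
    total_adenines = genome_bp * ploidy * at_fraction
    return total_adenines * m6da_per_million_a / 1e6

# Assumed genome parameters (illustrative only).
for ratio in (6, 7):
    n = m6da_per_genome(ratio, genome_bp=2.7e9, at_fraction=0.58)
    print(f"{ratio} per million dA -> roughly {n:,.0f} m6dA per diploid cell")
```

Even at such a low ratio, the estimate comes to on the order of tens of thousands of m6dA residues per cell, which helps explain why locating them required both sensitive mass spectrometry and sequencing.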

The authors next determined specific locations of m6dA in sequences of DNA bound to H2AX by using Pacific Biosciences single-molecule real-time (SMRT) sequencing, which detects differences in the kinetics with which a polymerase enzyme replicates modified bases compared with standard ones. Readers interested in this clever—dare I say “SMaRT”—method can check out my previous blog on SMRT.

In my opinion, the most challenging aspect of investigating epigenetics and epitranscriptomics is deciphering the dynamics and functional impact of “writing” and “erasing” methyl groups on bases in DNA and RNA, respectively. Erasure, i.e. removal of methyl groups, is carried out by so-called demethylase enzymes. Several demethylases of the AlkB protein family have been shown to remove methyl groups from m6A in RNA as a means of regulating mRNA function. In mammals, this protein family has nine members. Among these, Wu et al. found that AlkBH1 deficiency in mouse embryonic stem cells (ESCs) led to accumulation of m6dA in the genome, and that AlkBH1 could remove methyl groups from m6dA in DNA in vitro.

Intriguingly, Wu et al. further discovered that AlkBH1 worked most effectively on single-stranded DNA in vitro, thus raising the question of whether the enzyme preferentially operates during transcription or DNA replication in vivo, when DNA is transiently single stranded.

It’s also worth noting that enzymes that “write” (i.e. add) methyl groups to the N6-position of dA in mammalian DNA remain to be defined, as do the “reader” proteins that detect genomic m6dA.

In conclusion, I hope that this brief synopsis of new findings in “epi-marking” RNA and DNA has been illuminating—pun intended—with regard to molecular biology and the need for applying ever more powerful analytical methods to find more of what we are looking for: the molecular basis for living organisms.

On second thought, it seems to me that scientific discovery is perhaps more like moving forward from one lighted area to the next…

As usual, I welcome your comments.

Genes in Space

  • High School Student’s PCR Experiments Launched to Space Station
  • Program Evaluates Epigenetics Linked to Astronaut’s Altered Immunity
  • Amplyus Has Big Plans for Its Tiny, Low-Cost PCR Device

In my 2013 blog post on the 30th anniversary of the invention of Nobel Prize-winning PCR by Kary Mullis, I ventured to say that PCR of DNA or RNA was the most widely used—and enabling—method for all life sciences on planet Earth. This accolade can now be expanded to extraterrestrial space in view of PCR experiments to be carried out in the International Space Station (ISS) following the April 8th launch from Kennedy Space Center in Florida aboard NASA’s Cargo Resupply Services flight (CRS-8).

Taken from earthkam.org


What makes this “out of this world” milestone for PCR even more exciting is that it’s the result of competition among students in high school—yes, high school—to conceive and design PCR-based studies relevant to living in space. Following is a brief synopsis of the program, the winning high school student, and a small startup company with big plans for its low-cost miniPCR™ device.

Three Fascinating Facts about Epigenetics

  • You Are What Your Father Eats
  • Exercise Affects Epigenetics
  • There’s an Epigenetic Clock

Before getting to some truly fascinating—in my opinion—facts about epigenetics, I thought it would be worth briefly explaining the essence of epigenetics to get us in sync. Many similar but differently worded definitions of epigenetics have been proposed over the years; however, I favor this one, which a group of experts hashed out at venerable Cold Spring Harbor Laboratory and published together with their detailed reasoning:

“An epigenetic trait is a stably heritable phenotype resulting from changes in a chromosome without alterations in the DNA sequence.”


In Search of RNA Epigenetics: A Grand Challenge

  • Methylated riboA and riboC are the most commonly detected nucleobases in epigenetics research
  • Powerful new analytical methods are key tools for progress
  • Promising PacBio sequencing and novel “Pan Probes” reported   

In a Grand Challenge Commentary published in Nature Chemical Biology in 2010, Prof. Chuan He at the University of Chicago opined that “[p]ost-transcriptional RNA modifications can be dynamic and might have functions beyond fine-tuning the structure and function of RNA. Understanding these RNA modification pathways and their functions may allow researchers to identify new layers of gene regulation at the RNA level.”

Like other scientists who get hooked by certain Grand Challenges, I became fascinated by this possibility of yet “new layers” of genetic regulation involving RNA, either as conventional messenger RNA (mRNA) or more recently recognized long noncoding RNA (lncRNA). Part of my intellectual stimulation was related to the fact that some of my past postings have dealt with both lncRNA and recent advances in DNA epigenetics, so the notion of RNA epigenetics seemed to tie these together.

After doing my homework on recent publications related to possible RNA epigenetics, it became apparent that this posting could be logically divided into commentary on the following three major questions: what are prevalent epigenetic RNA modifications, what might these do, and where is the field going? Future directions were addressed by interviews with two leading investigators: Prof. Chuan He, who is mentioned above, and Prof. Tao Pan, who has been involved in cutting-edge methods development.

RNA Epigenetic Modifications

More than 100 types of RNA modifications are found throughout virtually all forms of life. These are most prevalent in ribosomal RNA (rRNA) and transfer RNA (tRNA), and are associated with fine-tuning the structure and function of rRNA and tRNA. Comments here will instead focus on mRNA and lncRNA in mammals, wherein the most abundant—and far less understood—modifications are N6-methyladenosine (m6A) and 5-methylcytidine (m5C).


Three Approaches to Sequencing m6A-Modified RNA

Discovered in cancer cells in the 1970s, m6A is the most abundant internal modification in eukaryotic mRNA and lncRNA. It is found at 3–5 sites on average in mammalian mRNA, and at up to 15 sites in some viral RNA. In addition to this relatively low density, specific loci in a given mRNA are a mixture of unmodified and methylated A residues, making it very difficult to detect, locate, and quantify m6A patterns. Fortunately, that has changed dramatically with the advent of various high-throughput “deep sequencing” technologies, as well as other advances.

(1.) Antibody-based m6A-seq 

An impressive breakthrough publication in Nature in 2012 by a group of investigators in Israel reported novel methodology called m6A-seq for determining the positions of m6A at a transcriptome-wide level. This approach, which is a variant of methylated DNA immunoprecipitation (MeDIP or mDIP), combines the high specificity of an anti-m6A antibody with Illumina’s massively parallel sequencing of randomly fragmented transcripts following immunoprecipitation. These researchers summarize their salient findings as follows.

“We identify over 12,000 m6A sites characterized by a typical consensus in the transcripts of more than 7,000 human genes. Sites preferentially appear in two distinct landmarks—around stop codons and within long internal exons—and are highly conserved between human and mouse. Although most sites are well preserved across normal and cancerous tissues and in response to various stimuli, a subset of stimulus-dependent, dynamically modulated sites is identified. Silencing the m6A methyltransferase significantly affects gene expression and alternative splicing patterns, resulting in modulation of the p53 (also known as TP53) signaling pathway and apoptosis. Our findings therefore suggest that RNA decoration by m6A has a fundamental role in regulation of gene expression.”

Moreover, their concluding sentence refers back to He’s aforementioned Grand Challenge Commentary about RNA epigenetics in 2010, just two years earlier.

“The m6A methylome opens new avenues for correlating the methylation layer with other processing levels. In many ways, this approach is a forerunner, providing a reference and paving the way for the uncovering of other RNA modifications, which together constitute a new realm of biological regulation, recently termed RNA epigenetics.”
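At its core, m6A-seq asks where reads from the anti-m6A immunoprecipitate (IP) pile up relative to reads from input RNA. The published analysis used its own peak-calling statistics; the sketch below is a deliberately simplified stand-in that compares library-normalized IP and input counts in fixed windows. The window size, pseudocount, fold threshold, and read counts are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Window:
    start: int        # window start on the transcript (nt)
    ip_reads: int     # reads from the anti-m6A immunoprecipitate
    input_reads: int  # reads from the non-immunoprecipitated input

def enriched_windows(windows, ip_total, input_total, min_fold=2.0, pseudocount=1.0):
    """Return (start, fold) for windows whose normalized IP/input ratio reaches min_fold."""
    hits = []
    for w in windows:
        # Normalize by library size so deeper IP libraries do not look "enriched".
        ip_rate = (w.ip_reads + pseudocount) / ip_total
        input_rate = (w.input_reads + pseudocount) / input_total
        fold = ip_rate / input_rate
        if fold >= min_fold:
            hits.append((w.start, fold))
    return hits

# Hypothetical counts for three 100-nt windows along one transcript.
windows = [Window(0, 12, 10), Window(100, 95, 9), Window(200, 30, 28)]
for start, fold in enriched_windows(windows, ip_total=1_000_000, input_total=1_000_000):
    print(f"window at {start}: {fold:.1f}-fold over input")
```

A fixed fold cutoff is only the simplest possible choice; real pipelines model count noise statistically, and antibody-based enrichment localizes m6A to a region rather than to a single base.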

(2.) Promising PacBio Single-Molecule Real-Time (SMRT) Sequencing of m6A

In a previous post, I praised PacBio (Pacific Biosciences) for persevering in development of its SMRT sequencing technology, which uniquely enables, among other things, direct sequencing of various types of modified DNA bases by differentiating the kinetics of incorporating labeled nucleotides. Attempts to extend the SMRT approach to sequencing m6A were recently reported by PacBio in collaboration with Prof. Pan (see below) and others in J. Nanobiotechnology in April 2013. Using model synthetic RNA templates and HIV reverse transcriptase (HIV-RT), they demonstrated adequate discrimination of m6A from A; however, “real” RNA samples having complex ensembles of tertiary structures proved to be problematic. Alternative engineered RTs that are more processive and more accommodating of labeled nucleotides were said to be under investigation in order to provide longer read lengths and appropriate incorporation kinetics.

The authors are optimistic about being able to solve these technical problems, and concluded their report by stating:

  “[w]e anticipate that the application of our method may enable the identification of the location of many modified bases in mRNA and provide detailed information about the nature and the dynamic RNA refolding in retroviral/retro-transposon reverse transcription and in 3’-5’ exosome degradation of mRNA.”

Let’s hope that this is achieved soon!
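The kinetic signal that SMRT sequencing exploits is commonly summarized as an interpulse duration (IPD) ratio: the mean time between incorporation events at a position on the native template divided by the same quantity on an unmodified control. The minimal sketch below computes that ratio and flags positions above a threshold; the IPD values and the 2.0 cutoff are hypothetical, and this is not PacBio’s software.

```python
from statistics import mean

def ipd_ratios(native_ipds, control_ipds):
    """Per-position ratio of mean interpulse duration: native template vs. unmodified control."""
    return [mean(n) / mean(c) for n, c in zip(native_ipds, control_ipds)]

def flag_modified(ratios, threshold=2.0):
    """Positions where the enzyme paused noticeably longer than on the control."""
    return [i for i, r in enumerate(ratios) if r >= threshold]

# Hypothetical IPDs (seconds) from several passes over a 4-nt stretch.
native = [[0.21, 0.19, 0.20], [0.62, 0.70, 0.58], [0.22, 0.18, 0.20], [0.25, 0.23, 0.27]]
control = [[0.20, 0.21, 0.19], [0.20, 0.22, 0.18], [0.21, 0.19, 0.20], [0.24, 0.26, 0.25]]
ratios = ipd_ratios(native, control)
print([round(r, 2) for r in ratios], "-> candidate positions:", flag_modified(ratios))
```

Averaging over several passes is what makes a modest per-event slowdown detectable; structured RNA that stalls the reverse transcriptase for reasons unrelated to m6A is exactly the kind of confounder the authors describe.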

(3.) Nanopore Sequencing of m6A?

It’s too early to be sure, but continued incremental advances in possible approaches to nanopore sequencing suggest applicability to m6A. As pictured below, Bayley and coworkers describe a method that uses ionic current measurement to resolve ribonucleoside monophosphates or diphosphates (rNDPs) in α-hemolysin protein nanopores containing amino-cyclodextrin adapters.

Taken from Bayley and coworkers in Nano Lett. (2013)


The accuracy of base identification is further investigated through the use of a guanidino-modified adapter. On the basis of these findings, an exosequencing approach for single-stranded RNA (ssRNA) is envisioned in which a processive exoribonuclease (polynucleotide phosphorylase, PNPase) presents sequentially cleaved rNDPs to a nanopore. Although extension of this concept to include m6A has yet to be demonstrated, earlier feasibility studies by Ayub & Bayley have shown discrimination of m6A (and other modified bases) from unmodified ribobases.
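The exosequencing idea can be sketched as a loop: PNPase, a 3’→5’ exoribonuclease, releases one rNDP at a time, each rNDP produces a current blockade in the adapter-equipped pore, and each blockade is assigned to the nucleotide whose reference level it most resembles. The reference currents below are invented placeholders, not values from Bayley and coworkers, and real base calling must also cope with overlapping current distributions and missed or duplicated events.

```python
# Hypothetical mean residual currents (pA); real values depend on pore, adapter, salt, and voltage.
REFERENCE_LEVELS = {"A": 40.0, "C": 35.0, "G": 45.0, "U": 30.0, "m6A": 42.5}

def call_base(current_pa, levels=REFERENCE_LEVELS):
    """Assign one blockade event to the nucleotide with the closest reference current."""
    return min(levels, key=lambda base: abs(levels[base] - current_pa))

def exosequence(events):
    """Call bases in the order the exonuclease released them (3' end of the RNA first)."""
    return [call_base(x) for x in events]

# Hypothetical event currents; reverse the calls to read the RNA 5'->3'.
calls = exosequence([40.2, 42.6, 30.1, 35.4])
print("3'->5':", calls)
print("5'->3':", list(reversed(calls)))
```

Whether m6A can be called this way in practice hinges on how cleanly its level separates from A’s, which is what the earlier feasibility studies addressed.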

Two Probe-Based Methods for Detecting Specific m6A Sites

(1.) “Pan Probes”

As the saying goes, “what goes around comes around”, and in this instance it’s repurposing 2’-O-methyl (2’OMe)-modified RNA/DNA/RNA oligos. This general class of chemically synthesized chimeric “gapmers” was originally used for RNase H-mediated cleavage of mRNA in antisense studies. Very recently, however, Pan and coworkers have cleverly adapted these probes—which I like to alliteratively refer to as “Pan Probes”—to m6A detection in mRNA and lncRNA.

For details see SCARLET workflow; taken from Pan and coworkers RNA (2013)

Pan Probes are composed of “7-4-7 gapmers” having seven 2’OMe RNA nucleotides on each side of four DNA nucleotides, the latter of which straddle known (or suspected) m6A sites, as depicted in the cartoon shown. The indicated series of steps, which involve site-specific cleavage and radioactive labeling followed by ligation-assisted extraction and thin-layer chromatography, is thankfully called SCARLET by these investigators.

Pan and coworkers used SCARLET to determine the m6A status at several sites in two human lncRNAs and three human mRNAs, and found that the m6A fraction varied between 6% and 80% among these sites. However, they also found that many m6A candidate sites in these RNAs were not modified. While much more work is clearly needed to collect data for deciphering the dynamic patterns and implications of m6A RNA epigenetic modifications, these investigators note that SCARLET is, in principle, applicable to m5C, pseudouridine, and other types of epigenetic RNA modifications.
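A modification fraction such as "6% to 80%" comes from comparing the radiolabeled m6A and A spots for a given site on the TLC plate. The sketch below shows that ratio from background-corrected spot intensities; it assumes equal labeling efficiency for A and m6A and a linear detector response, and the intensity values are hypothetical.

```python
def m6a_fraction(m6a_signal: float, a_signal: float, background: float = 0.0) -> float:
    """Fraction of molecules modified at one site, from TLC spot intensities."""
    m6a = max(m6a_signal - background, 0.0)
    a = max(a_signal - background, 0.0)
    if m6a + a == 0:
        raise ValueError("no signal above background")
    return m6a / (m6a + a)

# Hypothetical phosphorimager intensities (arbitrary units) for two sites.
print(f"site 1: {m6a_fraction(820, 215, background=15):.0%} m6A")
print(f"site 2: {m6a_fraction(75, 1005, background=15):.0%} m6A")
```

A fraction near zero at a site that matches the consensus is precisely the "candidate site not modified" outcome noted above.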

Readers interested in designing and investigating their own Pan Probes can obtain these 7-4-7 gapmers by using TriLink’s OligoBuilder® and simply selecting “PO 2’OMe RNA” from the Primary Backbone dropdown menu, typing the first 7 bases in the Sequence box, selecting the 4 DNA bases from the Chimeric Bases menu and then typing the remaining 7 2’OMe RNA bases.
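Before typing anything into OligoBuilder®, it can help to lay out the 18-nt probe against the target. The helper below is hypothetical: it takes the reverse complement of an 18-nt window of the target RNA and marks the central four positions as DNA (prefixed "d", with U written as T) and the flanks as 2’OMe RNA (prefixed "m"). Exactly where the candidate site should sit within the 4-nt gap is left as a parameter (`gap_offset`), because that choice is an assumption here rather than a rule taken from Pan and coworkers; the m/d notation is for readability only and is not OligoBuilder syntax.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}

def design_747_gapmer(target: str, site: int, gap_offset: int = 1) -> str:
    """Hypothetical 7-4-7 gapmer layout against `target` (RNA, 5'->3').

    site: 0-based index of the candidate m6A in the target.
    gap_offset: how many gap-paired target nucleotides lie 5' of the site (0-3).
    Returns the probe 5'->3' as tokens: 'm' = 2'OMe RNA, 'd' = DNA.
    """
    target = target.upper().replace("T", "U")
    if not 0 <= gap_offset <= 3:
        raise ValueError("gap_offset must place the site inside the 4-nt DNA gap")
    # Probe position p pairs with target position start + 17 - p, so the DNA gap
    # (probe positions 7-10) pairs with target positions start + 7 .. start + 10.
    start = site - 7 - gap_offset
    if start < 0 or start + 18 > len(target):
        raise ValueError("not enough flanking sequence for an 18-nt probe")
    window = target[start:start + 18]
    probe = "".join(COMPLEMENT[b] for b in reversed(window))  # antiparallel complement
    tokens = []
    for i, base in enumerate(probe):
        if 7 <= i < 11:
            tokens.append("d" + base.replace("U", "T"))  # central DNA gap for RNase H
        else:
            tokens.append("m" + base)  # 2'-O-methyl RNA flanks
    return " ".join(tokens)

# Hypothetical target with a candidate A at index 9 (inside a GGAC stretch).
print(design_747_gapmer("CCCCCCCGGACUUUUUUU", site=9, gap_offset=2))
```

The output lists the first seven 2’OMe bases, the four DNA bases, and the last seven 2’OMe bases in the same order as the OligoBuilder® steps described above.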

(2.) Probes for High-Resolution Melting

In a new approach very recently reported by Golovina et al. at Lomonosov Moscow State University, the presence of m6A at a specific position of an mRNA or lncRNA molecule is detected using a variant of high-resolution melting (HRM) analysis, a technique also applicable to, for example, single-nucleotide genotyping. The authors suggest that this method lends itself to screening many samples in a high-throughput assay following initial identification of loci by sequencing (see above). The method uses two labeled probes, one with 5’-FAM and another with 3’-BHQ1 (both available from TriLink’s OligoBuilder®), that hybridize to a particular query position in a total RNA sample, as shown below for a 23S rRNA model system. The presence of m6A lowers the melting temperature (Tm) relative to A, by an amount that depends on sequence context.

Taken from Golovina et al. Nucleic Acids Res. (2013).
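In melting analyses of this kind, Tm is usually read off as the temperature where the fluorescence changes fastest, i.e. the peak of the first derivative of the melt curve. The sketch below applies that to synthetic sigmoidal curves; the curve shape, the Tm values for the A and m6A targets, and the temperature grid are all hypothetical. Because fluorescence may rise or fall on melting depending on the probe arrangement, the code uses the absolute slope.

```python
import math

def melt_curve(tm, width=1.0, start=40.0, stop=80.0, step=0.1):
    """Synthetic two-state melt: signal changes sigmoidally as the short probe dissociates."""
    n = int(round((stop - start) / step)) + 1
    temps = [start + i * step for i in range(n)]
    signal = [1.0 / (1.0 + math.exp(-(t - tm) / width)) for t in temps]
    return temps, signal

def derivative_tm(temps, signal):
    """Tm estimate: temperature at the largest |dF/dT|, using central differences."""
    best_t, best_slope = temps[0], -1.0
    for i in range(1, len(temps) - 1):
        slope = abs((signal[i + 1] - signal[i - 1]) / (temps[i + 1] - temps[i - 1]))
        if slope > best_slope:
            best_t, best_slope = temps[i], slope
    return best_t

# Hypothetical Tm values for an unmodified (A) and a methylated (m6A) target.
for label, true_tm in (("A", 56.0), ("m6A", 54.5)):
    temps, signal = melt_curve(true_tm)
    print(f"{label}: estimated Tm = {derivative_tm(temps, signal):.1f} °C")
```

With real data the curves are noisier and usually smoothed first; the point is simply that a lower Tm for the m6A-containing duplex appears as a shifted derivative peak.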


The authors studied various probe-target constructs, and recommend 12–13-nt-long probes containing a quencher and >20-nt-long probes containing a fluorophore. They also advise that the quencher-containing oligonucleotide hybridize to the RNA such that m6A lies directly opposite the 3′-terminal nucleotide carrying the quencher. The authors point out that relatively low-abundance, non-ribosomal targets need partial enrichment by, for example, simple molecular weight-based purification or commercially available kits. In this regard, they estimate that if a particular type of mRNA were present at 10,000 copies per mammalian cell, 10⁷ cells would be required to analyze m6A by this HRM method.
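To put that estimate in molar terms, 10,000 copies per cell across 10⁷ cells is 10¹¹ target molecules, which Avogadro’s number converts to roughly 166 femtomoles. The arithmetic below assumes 100% recovery, which real RNA preparations will not achieve.

```python
AVOGADRO = 6.02214076e23  # molecules per mole

def target_amount(copies_per_cell: float, cells: float):
    """Total target molecules and the corresponding amount in femtomoles."""
    molecules = copies_per_cell * cells
    femtomoles = molecules / AVOGADRO * 1e15
    return molecules, femtomoles

molecules, fmol = target_amount(10_000, 1e7)
print(f"{molecules:.1e} molecules ≈ {fmol:.0f} fmol")  # prints: 1.0e+11 molecules ≈ 166 fmol
```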

m5C Analysis by Sequencing of Bisulfite-Converted RNA

Selective reaction of bisulfite with C but not m5C in RNA, analogous to that long used for DNA, provides the basis for determining C-methylation status by sequencing. As detailed by Squires et al. in Nucleic Acids Res. in 2013, bisulfite-converted RNA can be sequenced by either of two methods: conversion to cDNA, cloning, and conventional sequencing, or conversion to a next-generation sequencing library. These authors described their salient findings as follows.

“We confirmed 21 of the 28 previously known m5C sites in human tRNAs and identified 234 novel tRNA candidate sites, mostly in anticipated structural positions. Surprisingly, we discovered 10,275 sites in mRNAs and other non-coding RNAs. We observed that distribution of modified cytosines between RNA types was not random; within mRNAs they were enriched in the untranslated regions and near Argonaute binding regions… Our data demonstrates the widespread presence of modified cytosines throughout coding and non-coding sequences in a transcriptome, suggesting a broader role of this modification in the post-transcriptional control of cellular RNA function.”
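The logic of calling m5C from bisulfite data is simple: after conversion and reverse transcription, an unmethylated C reads as T, while a protected m5C still reads as C. The sketch below computes a per-site methylation level from reads that are assumed to be already aligned, gap-free, to the reference; the coverage cutoff and reads are hypothetical. Real pipelines must also account for incomplete conversion (for example in highly structured RNA), which would otherwise masquerade as methylation.

```python
def methylation_levels(reference: str, reads, min_coverage: int = 10):
    """Per-position m5C level from bisulfite reads aligned (gap-free) to `reference`.

    A reference C read as C counts as protected (methylated);
    read as T (or U) it counts as converted (unmethylated).
    """
    levels = {}
    for pos, ref_base in enumerate(reference):
        if ref_base != "C":
            continue
        protected = converted = 0
        for read in reads:
            if pos >= len(read):
                continue
            base = read[pos]
            if base == "C":
                protected += 1
            elif base in "TU":
                converted += 1
        depth = protected + converted
        if depth >= min_coverage:
            levels[pos] = protected / depth
    return levels

# Hypothetical reads over a 4-nt reference with C at positions 1 and 3.
reads = ["ATGC"] * 6 + ["ACGT"] * 4
print(methylation_levels("ACGC", reads))
```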

“Writing, Reading, and Erasing” RNA Epigenetic Modifications

Enzyme-mediated post-transcriptional RNA methylation (aka “writing”) and demethylation (aka “erasing”) are critical processes to identify and fully characterize in order to elucidate RNA epigenetics, and are formally analogous to those operative for DNA epigenetics.

Research on RNA epigenetic “writing” mechanisms has focused on N6-adenosine-methyltransferase 70 kDa subunit, an enzyme that in humans is encoded by the METTL3 gene and is involved in the post-transcriptional methylation of internal adenosine residues in eukaryotic mRNAs to form m6A. According to Squires et al., two m5C methyltransferases in humans, NSUN2 and TRDMT1, are known to modify specific tRNAs and have roles in the control of cell growth and differentiation.

As for “erasing”, in 2011, He’s lab discovered the first RNA demethylase, abbreviated FTO, for fat mass and obesity-associated protein, which has efficient oxidative demethylation activity targeting m6A in RNA in vitro. They also showed for the first time that this erasure of m6A could significantly affect gene expression regulation. In 2013, He’s lab discovered the second mammalian demethylase for m6A, ALKBH5, which affects mRNA export and RNA metabolism, as well as the assembly of mRNA processing factors, suggesting that reversible m6A modification has fundamental and broad functions in mammalian cells.

So, if Mother Nature evolved these mechanisms for writing and erasing RNA epigenetic modifications, what about the equally important in-between process of “reading” them? He, Pan, and collaborators have very recently reported insights into such reading. They showed that m6A is selectively recognized by the human YTH domain family 2 (YTHDF2) “reader” protein to regulate mRNA degradation. They identified over 3,000 cellular RNA targets of YTHDF2, most of which are mRNAs but which also include non-coding RNAs, with a conserved core motif of G(m6A)C. They further established the role of YTHDF2 in RNA metabolism, showing that binding of YTHDF2 results in relocalization of bound mRNA from the translatable pool to mRNA decay sites. The carboxy-terminal domain of YTHDF2 selectively binds to m6A-containing mRNA, whereas the amino-terminal domain is responsible for localizing the YTHDF2–mRNA complex to cellular RNA decay sites. These findings, they say, indicate that the dynamic m6A modification is recognized by selectively binding proteins to affect the translation status and lifetime of mRNA.
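A conserved core motif like G(m6A)C suggests a first-pass way to list candidate positions in a transcript: scan for G-A-C and report the central A. The short sketch below does that with a lookahead regular expression so that overlapping occurrences are not missed. It only produces candidates; as the SCARLET results above show, many sequence-matched sites turn out to be unmodified.

```python
import re

def candidate_m6a_sites(rna: str):
    """0-based positions of the A in every G-A-C core motif (overlaps allowed)."""
    rna = rna.upper().replace("T", "U")
    # The zero-width lookahead lets consecutive motifs share bases.
    return [m.start() + 1 for m in re.finditer(r"(?=GAC)", rna)]

# Hypothetical RNA fragment.
print(candidate_m6a_sites("GGACUGACAGAC"))
```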

Expert Opinions of the Future for RNA Epigenetics

As I’ve said here before, there is no crystal ball for accurately predicting the future in science, although scientists do enjoy imagining that there is. Opinions of two “hands on” experts in the emerging field of RNA epigenetics are certainly of interest in this regard. Below are comments that the aforementioned Prof. Tao Pan and Prof. Chuan He provided in an email interview, in which I posed the question, ‘What do you see as the most important developments for RNA epigenetics?’ These experts have thrown down the gauntlet, so to speak, by framing RNA epigenetics as a Grand Challenge.

Prof. Tao Pan


“In my opinion, the biggest current challenge for the field is to develop methods that can perturb m6A modification at specific sites to assess m6A function directly in specific genes. RNA interference or overexpression of an mRNA may simply decrease or increase modified and unmodified RNA alike. In a few cases, mutation of a known m6A site in an mRNA resulted in additional modification at a nearby consensus site, so that one cannot simply assume that mutation of a known site would not lead to cryptic sites nearby that may perform the same function. Further, functional understanding of a specific site should also take into account that all currently known m6A sites in mRNA and viral RNA are incompletely modified, so that one may need to explain why cells simultaneously maintain two RNA species that differ only at the site of m6A modification.”   

Prof. Chuan He


“The m6A modification is much more abundant than other RNA modifications in mammalian and plant nuclear RNA and is currently the only known reversible RNA modification. The m6A maps of various organisms/cell types need to be obtained. High-resolution methods to obtain transcriptome-wide, base-resolution maps are important. A future focus should be to connect the reversible m6A methylation with functions, in particular, the studies of the reader proteins that specifically recognize m6A and exert biological regulation. The first example of the YTHDF2 work just published in Nature (above) is a good example. We believe many other reader proteins exist and impact almost all aspects of mRNA metabolisms or functions of lncRNA.

Besides m6A, there are m5C, pseudoU, 2′-OMe, and potentially other modifications in mRNA and various non-coding RNAs (such as the recently discovered hm6A and f6A). The methods to map these modifications (except m5C) need to be developed and their biological functions need to be elucidated. 

Lastly, potential reversal of rRNA and tRNA modifications needs to be studied. As I stated in the Commentary in 2010, dynamic RNA modifications could impact gene expression regulation resembling well-known dynamic DNA and histone modifications. I think now we have enough convincing data to indicate this is indeed the case. The future is bright.”

Very bright, indeed! Your comments about this posting are welcomed.

Epigenetics 2.0 – Beyond DNA Methylation

Cytosine’s Chemical Biology Gets “Curiouser and Curiouser!” 

After following the White Rabbit down a large rabbit-hole, Lewis Carroll’s Alice found that curious happenings in Wonderland became “curiouser and curiouser!” Reading about cytosine’s curious chemical biology in epigenetics gave me the same impression, and left me a little dizzy wondering about what will be discovered next, and what it all means.

Before plunging into the cytosine-hole, so to speak, a bit of introductory information is offered here for readers unfamiliar with epigenetics, while those who know about epigenetics can skip to the next section.

Epigenetics Basics

Epigenetics is the study of changes in gene expression or cellular phenotype caused by mechanisms other than changes in the underlying DNA sequence, some of which have been shown to be heritable, hence the prefix epi- (Greek: επί- over, above, outer) used with the root-word genetics (the branch of biology that deals with heredity and genetic variations).

Epigenetics refers to functionally relevant modifications to the genome that do not involve a change in the nucleotide sequence. As depicted below, such modifications originally included DNA methylation and histone modification, both of which serve to regulate gene expression without altering the underlying DNA sequence.


Taken from Australian Epigenetic Alliance via Bing Images.

More specifically, DNA methylation refers to 5-methylcytosine (5mC), which is the initial focus of this blog post. In contrast, there are numerous types of histone modifications, as detailed elsewhere, and these will not be discussed further. However, keep in mind that there are linkage patterns and paradigms between DNA methylation and histone modification, as reviewed in Nature.

The existence of 5mC as a minor base in mammalian DNA was first reported in 1948 in a publication in J. Biol. Chem. by Hotchkiss at The Rockefeller Institute for Medical Research in New York, who separated the nucleic acids of calf thymus DNA using paper chromatography. Ironically, Hotchkiss called this minor base “epicytosine” because of its similarity to cytosine rather than any association with epigenetics; however, assigning the structure as 5mC did not occur until 1951 in work published in J. Amer. Chem. Soc. by Cohn at Oak Ridge National Laboratory. In a review entitled “Epigenetics: A Historical Overview,” Holliday points out that mechanistic models for DNA methylation-based epigenetics were first proposed in 1975 in independent publications by Riggs and by Holliday & Pugh.

Creating and “Erasing” 5mC

In mammals, 5mC occurs predominantly within CpG dinucleotides and is required for allele-specific expression of imprinted genes, transcriptional repression of retrotransposons, and X chromosome inactivation in females. DNA methylation patterns are established early in development by the de novo DNA methyltransferases 3A and 3B (DNMT3A/3B), and they are maintained during cell division by the maintenance DNA methyltransferase DNMT1. These enzymes transfer the methyl group from S-adenosylmethionine to the carbon-5 position of cytosine. In their recent review, Delatte & Fuks state that “[a] longstanding mystery in the epigenetic field surrounds the mechanisms allowing transitions from the methylated to the unmethylated state.” DNA demethylation can occur both passively and actively. “Passive” DNA demethylation refers to progressive dilution of 5mC by exclusion of DNMT1 from the replication fork during successive rounds of DNA replication. “Active” DNA demethylation requires rapid, replication-independent enzymatic removal of a methyl group or, far more likely given the chemical stability of the C-C bond between carbon-5 and the methyl group, removal of an intact 5mC-containing moiety. Delatte & Fuks note that “[s]uch 5mC ‘erasers’ have been intensely sought but have long remained elusive.”
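Passive dilution follows directly from semi-conservative replication: each daughter duplex inherits one parental strand, and the new strand is methylated only if maintenance methylation acts on it. The idealized sketch below tracks the fractions of fully methylated, hemimethylated, and unmethylated duplexes at a single CpG. The `maintenance` parameter is a hypothetical stand-in for DNMT1 efficiency, and de novo methylation by DNMT3A/3B is ignored.

```python
def replicate(state, maintenance=0.0):
    """One round of replication for (fully, hemi, un)methylated duplex fractions at one CpG."""
    m, h, u = state
    # Daughters that inherit a methylated parental strand: two per fully methylated
    # duplex, one per hemimethylated duplex. Each becomes fully methylated only if
    # maintenance methylation acts on the new strand.
    methylated_templates = 2 * m + h
    new_m = methylated_templates * maintenance
    new_h = methylated_templates * (1 - maintenance)
    # Daughters that inherit an unmethylated parental strand stay unmethylated.
    new_u = h + 2 * u
    total = 2 * (m + h + u)
    return new_m / total, new_h / total, new_u / total

state = (1.0, 0.0, 0.0)  # start fully methylated
for division in range(1, 6):
    state = replicate(state, maintenance=0.0)
    m, h, u = state
    print(f"after {division} division(s): fully {m:.3f}, hemi {h:.3f}, unmethylated {u:.3f}")
```

With no maintenance, the methylated strands are never lost but are diluted by half at every division, which is why passive demethylation is slow and replication-dependent, in contrast to the "active" route discussed next.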

That changed when Rao and coworkers suggested in Science in 2009 that TET (“ten eleven translocation”) enzymes and 5-hydroxymethylcytosine (5hmC) might be involved in a pathway leading to unmodified cytosine. They showed that TET1, a fusion partner of the MLL gene in acute myeloid leukemia, is a 2-oxoglutarate (2OG)- and Fe(II)-dependent enzyme that catalyzes conversion of 5mC to 5hmC in cultured cells and in vitro. In addition, 5hmC was shown to be present in the genome of mouse embryonic stem cells, and 5hmC levels decreased upon RNA interference-mediated depletion of TET1. Analogous activity was attributed to TET2 and TET3 in a publication in Nature in 2010 by Zhang and coworkers, who also demonstrated that TET1 has an important role in mouse embryonic stem (ES) cell maintenance by sustaining the expression of Nanog in ES cells.

Cytosine’s chemical biology became—as Alice exclaimed—curiouser and curiouser when TET enzymes were reported in Science in 2011 by Zhang and coworkers to catalyze serial oxidation of 5mC beyond 5hmC to 5-formylcytosine (5fC) and 5-carboxycytosine (5caC) in mouse ES cells and mouse organs. These investigators concluded that this finding raised the possibility that DNA demethylation may occur through TET-catalyzed oxidation followed by either enzymatic decarboxylation—akin to what is known for thymine—or the base-excision DNA repair (BER) pathway. Evidence for the latter possibility was provided in the same issue of Science by He et al., who reported that 5mC and 5hmC were oxidized to 5caC by TET in vitro and in cultured cells. In addition, 5caC was specifically recognized and excised by thymine-DNA glycosylase (TDG). Depletion of TDG in mouse embryonic stem cells led to accumulation of 5caC to a readily detectable level. It was concluded that oxidation of 5mC by TET followed by TDG-mediated base excision of 5caC constitutes a pathway for active DNA demethylation.

I think you’ll agree that these cytosine-related biochemical transformations, depicted below along with other newly proposed conversions, represent a wondrous process for “erasing!”

Figure taken from Nabel et al. in ACS Chem. Biol.

U Too!

As in Alice’s Wonderland, cytosine’s wondrously curious chemical biology gets even curiouser and curiouser by intersecting, so to speak, with uracil (U). Apart from the direct removal of 5fC and 5caC by TDG, Guo et al. and Cortellino et al. independently proposed in 2011 that 5hmC in DNA can be deaminated by the AID (activation-induced cytidine deaminase)/APOBEC (apolipoprotein B mRNA-editing enzyme, catalytic polypeptide-like) family of cytidine deaminases to yield 5-hydroxymethyluracil (5hmU). While 5hmC in DNA is a poor substrate for TDG, 5hmU paired with guanine can be readily excised by DNA glycosylases such as TDG. Thus, oxidation of 5mC to 5hmC by TET, deamination of 5hmC by AID/APOBECs, and TDG-initiated BER of the resulting 5hmU may provide another route to active cytosine demethylation in mammals.

Figure taken from Liu et al. in Nucleic Acids Res.
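
The proposed routes from 5mC back to C form a small branching network, and it can help to list them explicitly. The Python sketch below encodes my own simplified summary of the steps described above as a directed graph and enumerates every route by depth-first search. The step labels are shorthand, and the 5caC decarboxylase edge is marked “putative” because no such enzyme is identified in the text.

```python
# Simplified map of the proposed routes described above (my own summary, not a
# figure from the cited papers). Each entry: substrate -> [(product, step)].
PATHWAY = {
    "5mC":  [("5hmC", "TET oxidation")],
    "5hmC": [("5fC", "TET oxidation"), ("5hmU", "AID/APOBEC deamination")],
    "5fC":  [("5caC", "TET oxidation"), ("C", "TDG excision + BER")],
    "5caC": [("C", "TDG excision + BER"), ("C", "putative decarboxylase")],
    "5hmU": [("C", "TDG excision + BER")],
}


def routes(graph, start="5mC", goal="C"):
    """Depth-first enumeration of every acyclic route from start to goal.

    Each route is returned as a list of (substrate, step, product) triples.
    """
    found = []

    def walk(node, visited, steps):
        if node == goal:
            found.append(steps)
            return
        for product, step in graph.get(node, []):
            if product not in visited:  # guard against cycles
                walk(product, visited | {product}, steps + [(node, step, product)])

    walk(start, {start}, [])
    return found


for route in routes(PATHWAY):
    print(" ; ".join(f"{s} -[{step}]-> {p}" for s, step, p in route))
```

Keeping the network as data rather than prose makes it easy to add or remove a hypothesized step (for example, dropping the AID/APOBEC edge, as the Nabel et al. results discussed next would suggest) and see which routes remain.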

The involvement of this sequential oxidation-deamination mechanism in active cytosine demethylation was challenged in 2012 by Nabel et al., who found no significant activity of recombinant AID or APOBEC toward 5hmC deamination in vitro or in cultured cells, having failed to detect 5hmU. Nevertheless, in 2013 Liu et al. cautioned that such deamination might still occur in specific cellular contexts, and that more sensitive detection methods could prove useful. To this end, they developed a powerful method combining reversed-phase HPLC with multistage tandem mass spectrometry (LC-MS/MS/MS), together with stable isotope-labeled standards, for much more sensitive and accurate measurement of the 2′-deoxynucleosides of 5hmC, 5fC, 5caC and 5hmU. They found that overexpression of the catalytic domain of human TET1 led to marked increases in the levels of 5hmC, 5fC and 5caC, but only a modest increase in 5hmU, in genomic DNA of cultured human cells and multiple mammalian tissues.
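
The value of stable isotope-labeled standards is that an analyte and its heavy-isotope twin co-elute and ionize almost identically, so the ratio of their peak areas cancels much of the run-to-run variability in sample loss and ionization. A minimal Python sketch of that isotope-dilution arithmetic follows; the function names, the simple response-factor correction, and all numbers are hypothetical illustrations, not values or procedures from Liu et al.

```python
def isotope_dilution_fmol(analyte_area: float, standard_area: float,
                          standard_fmol: float, response_factor: float = 1.0) -> float:
    """Estimate the analyte amount from its peak-area ratio to a spiked heavy-isotope standard.

    response_factor corrects for any difference in signal per fmol between the
    light analyte and its labeled twin (assumed 1.0 for an ideal standard).
    """
    if standard_area <= 0 or response_factor <= 0:
        raise ValueError("standard_area and response_factor must be positive")
    return (analyte_area / standard_area) * standard_fmol / response_factor


def per_million(analyte_fmol: float, total_nucleoside_fmol: float) -> float:
    """Express a modified nucleoside as a level per 10^6 total nucleosides."""
    return analyte_fmol / total_nucleoside_fmol * 1e6


# Hypothetical numbers for illustration only (not data from Liu et al.):
amount = isotope_dilution_fmol(analyte_area=1.2e4, standard_area=6.0e4, standard_fmol=50.0)
level = per_million(amount, total_nucleoside_fmol=3.0e6)  # ~3 nmol of nucleosides digested
print(f"{amount:.1f} fmol, {level:.2f} per 10^6 nucleosides")
```

In practice a calibration curve built from mixtures of light and heavy standards would replace the single response factor used here.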

At the risk of confusing you, it’s worth pointing out that 5hmU in DNA is the precursor of “Base J” (β-D-glucosyl-hydroxymethyluracil), which is formed by glucosylation of 5hmU. Interestingly, J is present in all kinetoplastid flagellates studied, including Trypanosoma and Leishmania, but absent from other eukaryotes, prokaryotes and viruses. J replaces ~0.5% of T in the nuclear DNA of kinetoplastids and is mainly present in the telomeric repeat sequence (GGGTTA)n. Chemical synthesis of J-containing DNA oligonucleotides allowed the identification of a 93 kDa J-binding protein 1 (JBP1) in extracts of T. brucei, Leishmania species and Crithidia fasciculata. It is hypothesized that JBP1 catalyzes the first and rate-limiting step in J biosynthesis, the hydroxylation of T in DNA. For references and very interesting molecular-level details of the conformational dynamics of JBP1 binding to DNA containing J or 5hmC, see recent work by Heidebrecht et al.

What’s next for C—the “wild card” base in DNA?

In a review of this rapidly evolving and curious molecular biology of cytosine, Nabel et al. offer the view that, “[t]aken together, this rich medley of alterations renders cytosine a genomic ‘wild card’, whose dependent functions make the base far more than a static letter in the code of life.” They also offer the following opinions on future directions.

First, there are pressing questions that need to be explored related to whether cytosine is endowed with a unique set of chemical properties that lead to its remarkable methylation, oxidation, and deamination biochemistry. Might there be other epigenetic DNA base modifications and derived biological functions not yet discovered? Given the advances in metabolomics, isotopic labeling, and sensitive instrumentation, perhaps new DNA modifications will be detected and tracked.

Second, several precedents suggest reevaluation of the scope of reactions catalyzed by known DNA cytosine-modifying enzymes. TET enzymes may catalyze other oxidations, and TDG might excise other modified nucleotides. DNMT enzymes are now known to catalyze the addition of aldehyde moieties, not just a methyl group; might they do more?

Third, there should be more bioinformatics-guided searching for novel enzymes that modify DNA, such as a putative decarboxylase for 5caC, as well as more traditional biochemical approaches using DNA containing modified nucleobases.

Finally, and perhaps most importantly, there is a need for novel chemical biology tools to detect site-specific modifications, akin to what has already been done for DNA methylation patterns using, for example, bisulfite sequencing. For 5hmC, Korlach and coworkers have already addressed this need with strand-specific, base-resolution detection of 5hmC in genomic DNA at single-molecule sensitivity, by combining a bioorthogonal, selective chemical labeling method for 5hmC with single-molecule, real-time (SMRT) DNA sequencing.
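
For readers less familiar with bisulfite sequencing, the logic of the read-out is simple: bisulfite deaminates unmodified C to U (read as T after PCR), while 5mC, and also 5hmC, resist conversion and still read as C. Standard bisulfite treatment therefore cannot tell 5mC from 5hmC on its own, which is one reason 5hmC-specific chemistry matters. The toy Python sketch below assumes perfectly aligned, gap-free top-strand reads, complete conversion, and no sequencing errors; the function names are hypothetical, and real aligners and methylation callers handle far more.

```python
def bisulfite_convert(sequence: str, protected: set[int]) -> str:
    """Simulate complete bisulfite conversion followed by PCR on the top strand.

    Unprotected C reads as T; C at positions in `protected` (5mC or 5hmC) stays C.
    """
    return "".join(
        "T" if base == "C" and i not in protected else base
        for i, base in enumerate(sequence)
    )


def call_cpg_methylation(reference: str, read: str) -> dict[int, str]:
    """Call each CpG cytosine in the reference from an aligned, converted read."""
    if len(reference) != len(read):
        raise ValueError("toy caller expects gap-free, equal-length alignments")
    calls = {}
    for i in range(len(reference) - 1):
        if reference[i] == "C" and reference[i + 1] == "G":
            calls[i] = {"C": "protected (5mC/5hmC)", "T": "unmodified"}.get(read[i], "ambiguous")
    return calls


reference = "ACGTTCGACCGA"
read = bisulfite_convert(reference, protected={1})
print(read)
print(call_cpg_methylation(reference, read))
```

Note that the non-CpG cytosine at position 8 is converted but not called, since the caller only reports CpG sites.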

Oh, let’s not forget about RNA!  

Chuan He’s article in Nature Chem. Biol. in 2010 is entitled Grand Challenge Commentary: RNA epigenetics? Therein he discusses examples of RNA modification and demodification that may impact biological regulation. These include RNA base methylation and dioxygenases that use iron, α-ketoglutarate and dioxygen to perform oxidation of modified RNA bases for demethylation or hypermodification. He posits that post-transcriptional RNA modifications can be dynamic and might have functions beyond fine-tuning the structure and function of RNA. Furthermore, understanding these RNA modification pathways and their functions may allow researchers to identify new layers of gene regulation at the RNA level.

I certainly agree. Do you? As always, your comments are welcomed.

Three Takeaways from the 3rd Next-Generation Sequencing Conference

  • Exciting potential of direct sequencing of modified DNA 
  • Small holes with big promise but bigger challenges 
  • Paleogenomics:  sequencing ancient DNA—how old can you go? 

Sometimes small scientific meetings have big impacts on one’s impressions, which was certainly my experience at the 3rd Next-Generation Sequencing (NGS) conference in San Francisco on June 19-21, 2013. Of the many interesting presentations (all speakers and abstracts are listed on the conference website), three completely different topics struck me the most: Pacific Biosciences’ uniquely powerful single-molecule real-time (SMRT) sequencing of modified DNA, sequencing pioneer Prof. David Deamer’s update on the advances and challenges of nanopore sequencing, and the new field of paleogenomics, which involves sequencing ancient DNA. With apologies to all of the other speakers, and admitting a personally biased selection, here are my comments about these three topics.

Pacific Biosciences: direct sequencing of modified DNA


Dr. Jonas Korlach co-invented SMRT technology with Stephen Turner, Ph.D., PacBio Founder and Chief Technology Officer, when the two were graduate students at Cornell University. Dr. Korlach joined PacBio as the company’s eighth employee in 2004 and was appointed Chief Scientific Officer in July 2012.

Pacific Biosciences (PacBio) deserves a lot of credit for overcoming the numerous technical challenges facing commercialization of its SMRT sequencing system, which offers some uniquely powerful capabilities. (To save time and space I won’t describe how this complex system works, but I encourage you to take advantage of the videos and other technical information available on PacBio’s website.) In addition to providing amazingly long read lengths (up to 20 kb) to facilitate genome assembly, SMRT sequencing yields data on the kinetics of nucleotide incorporation. Because the rate at which the polymerase incorporates A, G, C or T opposite a given template base depends on the sequence context within the polymerase’s “footprint,” algorithms that model these rates can also detect modified template bases. In other words, the average rate of incorporation of G opposite C differs from that opposite 5-methylcytosine (5-mC). This kinetic difference allows direct determination of epigenetic methylation patterns in DNA, which was the focus of an excellent presentation by PacBio CSO Jonas Korlach. Direct epigenetic sequencing of 5-mC is completely novel and offers a significant advantage by obviating the need to carry out so-called ‘bisulfite conversion chemistry’ prior to sequencing. Commercial kits are available for bisulfite conversion, but they require extra time, can be very tricky, and consume more sample than may be available, especially for limited amounts of clinical biopsy material.
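
As I understand it, this kinetic signal is usually expressed as the interpulse duration (IPD), the pause between successive fluorescent incorporation pulses, and a modified template base tends to change the IPD at or near its position relative to unmodified control DNA (for example, whole-genome amplified DNA). The Python sketch below shows only the core comparison, a per-position ratio of mean native IPD to mean control IPD. The function names, the fixed cutoff of 2.0, and the IPD values are hypothetical; real analysis pipelines use statistical tests and sequence-context models rather than a single threshold.

```python
from statistics import mean


def ipd_ratios(native: dict[int, list[float]], control: dict[int, list[float]]) -> dict[int, float]:
    """Per-position ratio of mean native IPD to mean IPD in unmodified control DNA."""
    ratios = {}
    for pos, native_ipds in native.items():
        control_ipds = control.get(pos)
        if native_ipds and control_ipds:  # skip positions lacking coverage in either sample
            ratios[pos] = mean(native_ipds) / mean(control_ipds)
    return ratios


def candidate_sites(ratios: dict[int, float], threshold: float = 2.0) -> list[int]:
    """Positions whose IPD ratio meets a (hypothetical) fixed cutoff."""
    return sorted(pos for pos, ratio in ratios.items() if ratio >= threshold)


# Hypothetical IPDs (arbitrary time units) at three template positions.
native = {10: [0.50, 0.60, 0.55], 11: [1.60, 1.80, 1.70], 12: [0.40, 0.50]}
control = {10: [0.50, 0.55, 0.60], 11: [0.60, 0.55, 0.65], 12: [0.45, 0.45]}
ratios = ipd_ratios(native, control)
print(ratios, candidate_sites(ratios))
```

Averaging over many reads (coverage) is what makes modest per-read kinetic differences detectable, which is why the sketch works with lists of IPDs per position.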

I subsequently checked PacBio’s website and found a white paper (PDF) stating that unique kinetic characteristics have been observed for over 25 types of base modifications, such as those shown below.

Molecular structures and abbreviations for modified bases directly identifiable by SMRT sequencing (taken from PacBio white paper).

Especially exciting to me was Dr. Korlach’s brief mention at the end of his talk that SMRT could be used for direct sequencing of phosphorothioate (PS) linkages in DNA. While “man-made” PS modifications in synthetic DNA are well known, naturally occurring PS-DNA is a relatively recent, and quite surprising, discovery that is still being elucidated. A 2013 review of this novel and fascinating type of naturally occurring modified DNA states that “physiological PS modification is widespread in bacteria and occurs in diverse sequence contexts and frequencies [approximately 300 – 3,000 PS per 10^6 nucleotides] in different bacterial genomes, implying a significant impact on bacteria.” Bacterial PS-DNA has been shown to be introduced by a post-replicative biochemical pathway associated with a cluster of five genes, and it is implicated in site-specific restriction and, more recently, in a chemical reducing capacity that protects bacteria against peroxide. PS linkages in DNA can have SP or RP stereochemistry at phosphorus, as shown below; however, all bacterial PS-DNA examined to date occurs in the RP form.

Generalized molecular structure of SP and RP PS-DNA linkages (taken from RS Phosphorothioates Wikipedia).
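
To put the quoted frequency of roughly 300 to 3,000 PS linkages per 10^6 nucleotides in perspective, a quick back-of-the-envelope calculation helps. The Python sketch below assumes a hypothetical 5 Mb bacterial genome and, by default, counts nucleotides on both strands; whether the review’s frequency refers to one strand or both is an assumption on my part, so the sketch exposes it as a parameter.

```python
def ps_links_per_genome(genome_bp: int, ps_per_million_nt: float,
                        count_both_strands: bool = True) -> float:
    """Expected number of PS linkages in a genome at a given frequency per 10^6 nucleotides."""
    nucleotides = genome_bp * (2 if count_both_strands else 1)
    return nucleotides * ps_per_million_nt / 1_000_000


# Hypothetical 5 Mb bacterial genome at the low and high ends of the quoted range.
for frequency in (300, 3_000):
    print(frequency, ps_links_per_genome(5_000_000, frequency))
```

Even at the low end, that is thousands of modified backbone positions per genome, each a potential target for direct detection by SMRT sequencing.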

I later contacted Dr. Korlach to get more information about PS sequencing by SMRT, and he referred me to a video (~17 minutes) and a conference abstract. In response to my question about whether SMRT sequencing could differentiate SP from RP stereochemistry, he replied that he and his collaborators have looked at this possibility, but he couldn’t comment at this time because the work was ongoing and would be published in the future.

While awaiting publication of those findings, it’s interesting—I think—to speculate about other applications of SMRT direct sequencing of modified DNA. One intriguing possibility is determining the extent of, and genomic loci for, 5-fluoro-2′-deoxyuridine incorporation into DNA that heretofore has only been studied using indirect methods to decipher mechanisms of action of various 5-fluoropyrimidine anticancer agents.

What other possible applications of SMRT direct sequencing of modified DNA can you suggest?  (Please include in the comments section.)

Nanopore sequencing:  small holes with big promise but bigger challenges

When I presented a Church, Deamer, Branton et al. patent that broadly describes nanopore sequencing of DNA (see below) to my former marketing colleagues at Applied Biosystems Inc. (ABI) in 1998, they enthusiastically asked “how soon can we sell a nanopore sequencer?” After I told them the patent was prophetic and had no actual data, they disappointedly said “too bad, let us know when it’s ready.” Well, it’s now 15 years later, and many folks like me are still waiting for that commercialization date, despite hundreds of publications on many different variations of the basic concept.

Consequently, I attended Prof. David Deamer’s presentation with the hope of learning when some type of nanopore sequencer would finally be introduced by any one of several companies in this space—notably Oxford Nanopore Technologies (ONT), whose stellar Technology Advisory Board includes Prof. Deamer.


David W. Deamer is a Research Professor in the Department of Chemistry & Biochemistry at UC Santa Cruz, where his primary research area concerns the manner in which linear macromolecules traverse nanoscopic channels.

Prof. Deamer presented an excellent update, starting with a stylized version of his original lab notebook sketch of the technology (see below). He then discussed some of the incremental progress, and the many remaining challenges, for nanopore sequencing (check out reviews by Dunbar et al. and others by searching “nanopore sequencing” on PubMed). He concluded with a description of recent results obtained by a group led by Prof. Mark Akeson, his long-time colleague and collaborator at UCSC. Among various innovations, a processive DNA polymerase is used to control translocation by ratcheting the DNA through the pore. Although the sequencing results presented were limited to only ~10 bases in a model oligonucleotide, a well-known and rather critical attendee, whom I’ll keep anonymous, said during Q&A that “these were the most promising data I’ve seen so far.” That attendee then asked about ONT’s timeline for commercialization; Prof. Deamer replied that he doesn’t speak for the company, but thinks that something might be introduced in another 6 months or so.


Depiction of nanopore sequencing method described in Church, Deamer, Branton et al. patent US 5,795,782.

At the risk of sounding like a pessimist, and based on my past experience that timelines for developing complex automated systems always take much longer than desired, I’d be very surprised if that “something” is launched by the end of 2013. I hope I’m wrong, so I’ll be on the lookout just in case.

In the meantime, while we all await such an event, there are several thought-provoking nanopore sequencing-related topics worth reading about.

Paleogenomics:  sequencing ancient DNA—how old can you go?

Relative to evolutionary time-spans, the study of paleogenetics is not old—going back to 1963 and Linus Pauling; however, very, very old (aka “ancient”) DNA is now “sequenceable” using modern NGS technologies. Just how old is “ancient” and what is the projected age-limit for sequenceable DNA were two questions I had in mind at the outset of the presentation by Prof. Eske Willerslev, who has been a pioneer in this field.


Prof. Eske Willerslev is a Danish evolutionary biologist at Copenhagen University and leader of the Ancient DNA and Evolution Group. He has received the Genius Award (Geniusprisen) of Danish Science journalists for his combination of groundbreaking research with an aggressive media strategy. Before becoming a scientist he lived for several years as a trapper in Siberia with his twin brother, anthropologist Rane Willerslev.

Prof. Willerslev’s presentation was rapidly delivered and jam-packed with snippets of results from numerous studies; in other words, it was impossible for me to take notes from which to reconstruct a synopsis after the fact using the cited publications. I did, however, get the following answers to my two probing questions.

Just how old is ‘ancient’ DNA?  

Prof. Willerslev said that a draft genome from a horse bone found at Thistle Creek, Canada, dated to ~700 thousand years before present (~700 kyr BP), represents the oldest full genome sequence determined so far, by almost an order of magnitude. This stunning (to me) achievement was published in Nature online several days after the conference and has received considerable attention because of the significance of its findings for “recalibrating Equus evolution.” As stated in the abstract of this publication, “[f]or comparison, we sequenced the genome of a Late Pleistocene horse (43 kyr BP), and modern genomes of five domestic horse breeds (Equus ferus caballus), a Przewalski’s horse (E. f. przewalskii) [pictured below] and a donkey (E. asinus). Our analyses suggest that the Equus lineage giving rise to all contemporary horses, zebras and donkeys originated 4.0–4.5 million years before present (Myr BP), twice the conventionally accepted time to the most recent common ancestor of the genus Equus.”


Przewalski’s horse at Khustain Nuruu National Park in Mongolia. These horses are smallish and stocky in comparison to domesticated horses, with shorter legs that are often faintly striped, typical of primitive markings. The Przewalski’s horse has 66 chromosomes, compared to 64 in all other horse species. All Przewalski horses in the world are descended from 9 of the 31 horses in captivity in 1945. These 9 horses were mostly descended from ~15 captured around 1900. The total number of these horses by the early 1990s was over 1,500.

Interestingly, the aforementioned Nature publication reports using a combination of Illumina and Helicos sequencing, with the latter’s single-molecule sequencing capabilities providing an “advantageous complement” to the former’s data, as previously described. Since Helicos is now defunct, it will be interesting to see if such methodological complementarity can instead be provided by PacBio’s single-molecule sequencing.

How old can you go?  

As for “how old can you go” and still get sequenceable DNA, Prof. Willerslev said at the conference that “1 or 2 million years old should be possible.” A subsequent article by Millar & Lambert in Nature News & Views entitled “Towards a million-year-old genome” confirmed this and noted, as expected, that degradation of DNA into ever shorter fragments begins rapidly after death through the action of the body’s own enzymes, and then of enzymes from microorganisms. The overall rate of DNA decay is also influenced by environmental conditions such as pH and, of course, temperature, as shown in the following graph, which was entitled ‘survival of the coldest.’



Plot of the rate of DNA decay vs. temperature for estimated half-lives of 30- and 100-base-pair (bp) DNA fragments. The estimated ages and temperatures of material used to recover the genomes of a Neanderthal (N), a woolly mammoth (M) and the horse fossil discovered at Thistle Creek, Canada (H) are shown [C. D. Millar & D. M. Lambert, Nature, Vol. 499, pp. 34-35 (2013)].
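
Why quote half-lives for specific fragment lengths? Under the simplest first-order picture (an assumption for illustration, not necessarily the model behind the plot), strand breaks occur randomly and independently at a constant per-bond rate, so a fragment of L bp survives only if none of its bonds has broken. Its survival then decays exponentially with L times age, and its half-life scales as 1/L, which is why shorter fragments have longer estimated half-lives. In the Python sketch below the per-bond rate is purely hypothetical; in reality it depends strongly on temperature and other conditions.

```python
import math


def intact_fraction(length_bp: int, age_years: float, break_rate: float) -> float:
    """Probability that a stretch of `length_bp` survives with no break after `age_years`.

    Assumes random, independent strand breaks at a constant per-bond rate
    `break_rate` (per year), i.e. first-order kinetics.
    """
    return math.exp(-break_rate * length_bp * age_years)


def half_life_years(length_bp: int, break_rate: float) -> float:
    """Time at which half of the fragments of this length remain intact."""
    return math.log(2) / (break_rate * length_bp)


rate = 1e-8  # hypothetical per-bond rate per year; in reality strongly temperature-dependent
for length in (30, 100):
    t_half = half_life_years(length, rate)
    print(f"{length} bp: half-life ~{t_half:,.0f} yr; "
          f"intact after 100 kyr: {intact_fraction(length, 1e5, rate):.3f}")
```

Whatever the actual rate, the ratio of half-lives for 30-bp and 100-bp fragments under this model is fixed at 100/30, so lowering the temperature (and hence the rate) stretches both curves together.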

In closing this post, I encourage you to get an appreciation of the impressive technical depth and scientific breadth of this conference by looking at the list of presentation titles and abstracts, if you haven’t done that already. NGS has truly revolutionized multiple and diverse fields of basic science and enabled a seemingly never-ending series of new and improved applications. Among these are studies of metagenomics and microbiomes, which will be the subject of this blog in the near future.

As always, your comments are welcomed.