Hachimoji DNA and RNA: A Genetic System with Eight Building Blocks

  • Researchers Seek to Expand A, G, C, and T Genetic Coding to Additional Nucleobase “Letters”
  • Steven A. Benner Led Expansion from Four to Six Letter-Coding in 2015
  • In 2019, Coding Uses Eight (“Hachi”) Letters (“Moji”)

According to principles attributed to the early writings of Francis Crick in 1958, the Central Dogma of Molecular Biology states that H-bonding between A/T and C/G base pairs underlies the storage of genetic information. This information is stored in “Watson-Crick” DNA for replication, transcribed into RNA, and finally decoded into protein. Exploring the expansion of such H-bonding (shown here) to include synthetic analogs of these four natural nucleobases has been of interest for theoretical and evolutionary reasons, and could have utility for many hybridization-based applications, as well as for storage of information.

Readers interested in an overview of these subjects can consult a 2017 review in Acc. Chem. Res. by Richards and Georgiadis, titled Toward an Expanded Genome: Structural and Computational Characterization of an Artificially Expanded Genetic Information System (AEGIS). The pioneering work of Steven A. Benner on expanding the genetic code from four to six building-block letters is reviewed therein. This blog will highlight Benner’s 2019 Science publication (Hoshika et al.) on AEGIS, which reports further expansion to eight building-block letters. This recently expanded system is appropriately named “hachimoji” DNA and RNA: Hoshika et al. coined this term by combining the Japanese words for eight (“hachi”) and letters (“moji”).

Hachimoji DNA

At the outset, I should point out that there is a YouTube video lecture by Benner that is well worth watching to fully appreciate the rationale behind investigating AEGIS, and the experimental approaches explored.

Benner’s previously published work (Zhang et al.) on the evolution of a functional six nucleotide genetic code included the two new nucleotides, Z and P, which are shown below for DNA; dR is replaced with R in RNA. The H-bonding in these structures is between oppositely positioned donor (red) and acceptor (blue) atoms. These Z and P nucleotides were shown to undergo enzymatic copying, PCR amplification, and successive transcriptions, first into six-letter RNA, and then back into six-letter DNA.

Taken from Hoshika et al. Science 363, 884-887. Copyright © 2019, American Association for the Advancement of Science, with permission.

Expansion from six letters to eight letters was investigated using two additional nucleotides, S and B, shown here. The nucleotides Z, P, S and B along with A, G, C and T were each incorporated into 94 different 8-mer sequences of hachimoji DNA oligonucleotides by use of otherwise conventional phosphoramidite chemistry for solid-phase chain-assembly. Duplexes of these GACTZPSB-containing hachimoji 8-mers were then used to measure melting temperature (Tm) values under a set of standard conditions. These experimental Tm values were then compared to predicted melting temperatures derived from state-of-the-art thermodynamic parameterization of nearest-neighbor base-pair dimers, as described by Hoshika et al.

Plots of experimental versus predicted free-energy change (ΔG°37) (A) and experimental versus predicted melting temperature (Tm) (B) shown here indicate that, on average, Tm is predicted to within 2.1°C for the 94 GACTZPSB hachimoji duplexes, and ΔG°37 is predicted to within 0.39 kcal/mol. These errors were said to be similar to those observed with nearest-neighbor parameters for standard DNA:DNA duplexes, which was interpreted as meaning that “GACTZPSB hachimoji DNA reproduces, in expanded form, the molecular recognition behavior of standard 4-letter DNA. It is an informational system.”

Taken from Hoshika et al. Science 363, 884-887. Copyright © 2019, American Association for the Advancement of Science, with permission.

High-resolution crystal structures were determined for three different hachimoji duplexes assembled from three self-complementary 16-mer sequences: 5’-CTTATPBTASZATAAG, 5’-CTTAPCBTASGZTAAG, and 5’-CTTATPPSBZZATAAG. These duplexes were crystallized with Moloney murine leukemia virus reverse transcriptase to give a “host-guest” complex with two protein molecules (host) bound to each end of a 16-mer duplex (guest). With interactions between the host and guest limited to the ends, the intervening 10 base pairs were free to adopt a sequence-dependent structure.
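As a small illustration of what “self-complementary” means for an eight-letter alphabet, here is a minimal Python sketch. It assumes the hachimoji pairing rules described in this post (A:T, G:C, Z:P, and S:B); the function names are my own and are not taken from Hoshika et al.

```python
# Hachimoji pairing rules assumed from the text: A:T, G:C, Z:P, S:B.
PAIRS = {
    "A": "T", "T": "A",
    "G": "C", "C": "G",
    "Z": "P", "P": "Z",
    "S": "B", "B": "S",
}


def reverse_complement(seq: str) -> str:
    """Return the strand that would pair with `seq`, written 5' to 3'."""
    return "".join(PAIRS[base] for base in reversed(seq))


def is_self_complementary(seq: str) -> bool:
    """A sequence is self-complementary if it equals its own reverse complement."""
    return seq == reverse_complement(seq)


# The three 16-mers quoted above (hypothetical variable name).
SIXTEEN_MERS = ["CTTATPBTASZATAAG", "CTTAPCBTASGZTAAG", "CTTATPPSBZZATAAG"]

for seq in SIXTEEN_MERS:
    print(seq, is_self_complementary(seq))
```

Under these pairing rules, each of the three 16-mers reads the same as its own reverse complement, which is why two copies of a single strand can form the duplex.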

The hachimoji DNA in all three structures adopted a B-form with 10.2 to 10.4 base pairs per turn, similar to natural B-DNA shown here. The major and minor groove widths for hachimoji DNA were similar to one another and to the DNA duplex 5’-CTTATGGGCCCATAAG, but not to the DNA duplex 5’-CTTATAAATTTATAAG.

Despite these and other differences in structure (i.e. propeller and buckle angles), the structural parameters for the individual pairs and the dinucleotide steps of the hachimoji DNA were said to fall well within the ranges observed for natural 4-letter DNA, consistent with hachimoji DNA being a “mutable information storage system” like natural DNA, according to Hoshika et al. I should interject and state that these researchers use the term “mutable” with reference to Schrödinger, who theorized in 1943 that regularity in size was necessary for nucleobase pairs to fit into what he called an “aperiodic crystal,” which he proposed as necessary for reliable molecular information storage and faithful information transfer.

Hachimoji RNA

T7 RNA polymerase bound to DNA and RNA.

With the information storage and mutability properties shown for hachimoji DNA, Hoshika et al. then asked whether hachimoji DNA information could also be transmitted to give hachimoji RNA. To investigate whether native T7 RNA polymerase (pictured here) is capable of transcribing hachimoji DNA, they started with four model sequences that each contained a single nonstandard hachimoji component, B, P, S, or Z, each followed by a single cytidine. To analyze hachimoji RNA products, they labeled transcripts with [α-32P]cytidine 5′-triphosphate; digestion with ribonuclease T2 then generated the corresponding hachimoji 3′-phosphates. These were resolved in thin-layer chromatography (TLC) systems and compared with synthetic authentic nonstandard 3′-phosphates.

These experiments showed that native T7 RNA polymerase incorporates riboZTP opposite template dP, riboPTP opposite template dZ, and riboBTP opposite template dS. However, incorporation of riboSTP opposite template dB was not seen with native RNA polymerase. This observation was attributed to an absence of electron density in the minor groove from the aminopyridone heterocycle on riboSTP. Polymerases are believed to recognize such density, as it is presented by all other triphosphate substrates.

A, G, C, and U 2′-O-Methyl-Nucleotides.

Hoshika et al. therefore searched for T7 RNA polymerase variants able to transcribe a complete set of hachimoji nucleotides. One variant (Y639F H784A P266L, “FAL”) was especially effective at incorporating riboSTP opposite template dB. Interestingly, FAL was originally developed as a thermostable polymerase to accept 2′-O-methyl triphosphates, pictured right.

High-performance liquid chromatography (HPLC) analysis of its transcripts showed that 1.2 ± 0.4 riboSTP nucleotides were incorporated opposite a single template dB. FAL also incorporated the other nonstandard components of the hachimoji system into transcripts.


The findings reported by Hoshika et al. have been lauded by experts, and they represent a significant advance in synthetic biology with the availability of a further expanded, mutable genetic system built from eight different building blocks: four natural (stars) and four synthetic (circles). By continued investigations, additional synthetic building blocks will perhaps lead to further expansion of genetic coding. Intrigued by this possibility, I found that the Japanese word for ten is “juu,” so “juumoji” DNA and RNA might be next.

In any event, with currently increased information density over natural DNA and predictable duplex stability across all 8ⁿ sequences of length n, Hoshika et al. concluded that hachimoji DNA has potential applications in sequence-based bar-coding and combinatorial tagging, retrievable information storage, and self-assembling nanostructures. I have covered DNA-based information storage and self-assembling DNA nanostructures (aka origami), in some of my previous blogs.
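To put the increased information density in concrete terms, the short Python sketch below compares a four-letter and an eight-letter alphabet. It is only back-of-the-envelope arithmetic under the idealized assumption that every letter is equally usable at every position; the function names are hypothetical.

```python
import math


def sequence_count(alphabet_size: int, length: int) -> int:
    """Number of distinct sequences of a given length."""
    return alphabet_size ** length


def bits_per_letter(alphabet_size: int) -> float:
    """Maximum information per position, in bits."""
    return math.log2(alphabet_size)


n = 8  # length of the 8-mers used in the melting studies
print(sequence_count(4, n))   # four-letter DNA
print(sequence_count(8, n))   # hachimoji DNA
print(bits_per_letter(4), bits_per_letter(8))
```

In this idealized picture, each hachimoji letter carries 3 bits instead of 2, so an 8-mer has 2⁸ = 256 times as many possible sequences as its four-letter counterpart.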

Hoshika et al. also concluded that structural differences among three different hachimoji duplexes are not larger than the differences between various standard DNA duplexes, making this system potentially able to support molecular evolution. Furthermore, the ability to have structural regularity independent of sequence shows the importance of inter-base H-bonding in such mutable informational systems. Thus, in addition to its technical applications, this work expands the scope of the structures that might be encountered in the search for life in the cosmos, which Benner has written about here.

As usual, your comments are welcomed.

CasX Enzymes: A New Family of RNA-Guided Genome Editors

  • Search of Unusual Microbes Yields New CRISPR-Cas Systems
  • Tiny Life Forms Have Smallest Working CRISPR-Cas Systems
  • Novel CasX Structure and Mechanism Characterized by Cryo-Electron Microscopy

In 2012, a Science magazine publication by Doudna, Charpentier, and coworkers described Cas9, the CRISPR-associated (Cas) protein, as a programmable dual-RNA-guided DNA endonuclease in adaptive bacterial immunity. This work has already been cited ~6,500 times, alongside two other Cas9 studies listed in PubMed that year. They have been followed by a steadily increasing number of annual Cas9-related publications, as shown in the chart below. A large part of this growing interest is due to the proven utility of CRISPR-Cas9, and variants thereof, for gene editing, which I have previously blogged about.

Given the broad scientific, clinical, and commercial utility of CRISPR-Cas systems, it is not surprising that there has been considerable effort directed toward either engineering analogs of known Cas enzymes, or discovering new homologs in unexplored organisms. With regard to the latter approach, Doudna, Banfield and coworkers noted in 2017 that the then-available CRISPR-Cas technologies were based solely on systems from isolated, cultured bacteria, leaving untapped the vast majority of enzymes from organisms that have not been cultured.

They added that metagenomics—sequencing DNA extracted directly from natural microbial communities—provides access to the genetic material of a huge array of uncultured organisms. For this reason, through use of metagenomics, the researchers were able to discover two previously unknown CRISPR-Cas systems. These new Cas proteins, named CasX and CasY to designate as yet unknown specifics, are said to be among the most compact systems yet discovered. In February 2019, as a follow-up to these discoveries, Doudna and collaborators published the mechanistic details for CRISPR-CasX in Nature magazine. This will be the focus of this blog, but before that story, here are some introductory comments about metagenomics, a transformative technology in its own right.

Putting Together All of the Pieces

In a review by Chen and Pachter, metagenomics is described as “the application of modern genomics techniques to the study of communities of microbial organisms directly in their natural environments, bypassing the need for isolation and lab cultivation of individual species.” They add that “metagenomics has revolutionized microbiology by shifting focus away from clonal isolates towards the estimated 99% of microbial species that cannot currently be cultivated.”

A typical metagenomics project begins with the construction of a DNA library derived from a minimally processed environmental sample that is usually comprised of multiple different genomes with different copy numbers. The increasing capacity of factory-like sequencing centers has facilitated whole-genome shotgun sequencing and genome assembly of these complex mixtures. At the risk of oversimplification, this to me is conceptually akin to simultaneously putting together correctly all of the pieces of multiple different jigsaw puzzles.

There are many technical variations for these sequencing and bioinformatic procedures, but at a high level, these can be categorized as either using extracted DNA per se (metagenomics) or cDNA derived from reverse transcription of extracted RNA (metatranscriptomics). Both of these approaches were used in the aforementioned discovery of CasX and CasY, starting with quite unusual sample sources: (1) acid-mine drainage samples (from the Richmond Mine at Iron Mountain in California); (2) river water and sediment samples (from a site along the Colorado River in Colorado); and (3) cold, CO2-driven geyser water (from Crystal Geyser on the Colorado Plateau in Utah, pictured here). Presumably, these relatively unusual sample sources increased the discovery-probability, as scientists were able to examine the previously unknown organisms present in each sample.

Discovery of CasX and CasY

Using metagenomics, Doudna, Banfield and coworkers found a number of CRISPR-Cas systems, including what they believed to be the first Cas9 in the archaeal domain of life. Archaea constitute a domain of single-celled microorganisms. These microbes are prokaryotes, meaning they have no cell nucleus. Archaeal cells have unique properties that separate them from the other two domains of life, bacteria and eukarya, as depicted here. Archaea are further divided into multiple recognized phyla, but classification is difficult, as most have not been isolated in the laboratory.

This divergent Cas9 protein was found in little-studied nanoarchaea, as part of an active CRISPR-Cas system. Incidentally, nanoarchaea are “nano” indeed, only ~400 nm in diameter (about 5% of the volume of your archetypical 1 μm³ prokaryote, according to one estimate), and Nanoarchaeum equitans harbors a genome that is only 480 kb. Also discovered were two previously unknown Cas proteins unlike all the previous Cas proteins. These were named CasX and CasY, since it was not clear what they actually did. CasX and CasY are among the most compact systems yet discovered, according to these researchers, who concluded that “interrogation of environmental microbial communities combined with in vivo experiments allows us to access an unprecedented diversity of genomes, the content of which will expand the repertoire of microbe-based biotechnologies.”

Cryo-Electron Microscopy (cryo-EM) Characterization of CasX

In February 2019, a follow-up report in Nature by Doudna and collaborators focused on the mechanistic details for CRISPR-CasX. Although RNA-guided DNA binding and cutting proteins have proven to be transformative tools for genome editing across a wide range of cell types and organisms, only two kinds of CRISPR-Cas nucleases, Cas9 (depicted here) and Cas12a (aka Cpf1), provide the foundation for this revolutionary technology.

The only conserved part of CasX, the RuvC domain, shares less than 16% identity with RuvC domains in either Cas9 or Cas12a. This evolutionary ambiguity in CasX hinted that this enzyme may have a structure and molecular mechanism distinct from that of other CRISPR-Cas enzymes. These structural and mechanistic questions were investigated by use of cryo-EM, a specialized method recently catapulted into widespread view by the co-awarding of the 2017 Nobel Prize in Chemistry to its three pioneers.

As discussed in an introductory YouTube video on cryo-EM, scientists traditionally used X-ray crystallography to obtain biomolecular structures, which requires growing suitable crystals that are oftentimes extremely difficult or not possible to obtain. However, as seen here, freezing a thin layer of a solution of the sample for cryo-EM enables the technique to handle structures for which crystallography is not a viable option. In addition, cryo-EM can visualize much larger structures than crystallography can—100-fold larger according to one cryo-EM expert. By way of example, a 1.8-Å-resolution structure of 334-kDa glutamate dehydrogenase, and 3.6-Å-resolution structure for 11,200-kDa Dengue virus have been reported.

Scientist preparing samples for cryo-EM under liquid nitrogen temperature

Doudna and collaborators took advantage of cryo-EM to obtain eight molecular structures of CasX in different states, which interested readers can view by consulting the 2019 Nature publication (unfortunately, copyright restrictions prevent reproduction here). The researchers’ verbal description of what was found highlights the following structural elements:

“An unanticipated quaternary structure in which the RNA scaffold dominates the architecture and organization of the enzyme. Phylogenetic, biochemical and structural data show that CasX contains domains distinct from—but analogous to—those found in Cas9 and Cas12a, as well as novel RNA and protein folds; thus establishing the CasX enzyme family as the third CRISPR-Cas platform that is effective for genetic manipulation. Finally, distinct conformational states observed for CasX suggest an ordered non-target- and target-strand cleavage mechanism that may explain how CRISPR–Cas enzymes with a single active site, such as Cas12a, achieve double-stranded DNA (dsDNA) cleavage. The small size of CasX (<1,000 amino acids), its DNA cleavage characteristics, and its derivation from non-pathogenic microorganisms offer important advantages over other CRISPR–Cas genome-editing enzymes.”


On the basis of their functional and structural data, Doudna and collaborators propose a model of CasX activation and DNA cleavage that includes the following steps: (1) guide RNA binding-induced CasX structural stabilization and DNA search; (2) non-target-strand binding-assisted DNA unwinding, R-loop formation and non-target-strand loading into the RuvC active site; (3) RNA-DNA hybrid duplex bending with the aid of the proposed target-strand loading (TSL) domain to position the target DNA strand for cleavage; and (4) product release after the cleavage of both DNA strands.

They added that two distinct target DNA-bound states indicate that CasX coordinates sequential dsDNA cleavage by its single RuvC nuclease, using the zinc-finger-containing TSL domain. Also, the TSL domain appears to confer a convergent mechanism of acute target-strand DNA bending that is central to all type V single-nuclease CRISPR-Cas enzymes.

Looking forward, they speculated that “[t]he compact size, dominant RNA content and minimal trans-cleavage activity of CasX differentiate this enzyme family from Cas9 and Cas12a, and provide opportunities for therapeutic delivery and safety that may offer important advantages relative to existing genome-editing technologies.”

In my opinion, it will likely take some time and considerable experimentation by the scientific community to assess whether any of these potential advantages offered by CasX will actually pan out and lead to widespread adoption. In the meantime, mRNA-encoding Cas9 has firmly established its utility and enjoys extensive adoption, as exemplified by many diverse applications that I found among the search results for “TriLink AND Cas9” in Google Scholar.

As usual, your comments are welcomed.

A Comprehensive Landscape of Transcription Errors in Cells

  • Circular Sequencing (CirSeq) Finds Transcription Errors in RNA
  • CirSeq Found >100-Fold More Errors in RNA Compared to Replication Errors in DNA
  • CirSeq Can Elucidate How “Molecular Noise” Affects Cellular Function

Biological reactions are remarkably precise. For example, enzymatic proteins have the amazing ability not only to bind selectively to the correct substrates from among complex mixtures of countless molecules, but also to do so at the right time and location. This precision is especially important in the context of DNA replication (DNA → DNA), transcription (DNA → RNA), and translation (RNA → protein), as depicted here. These fundamental processes involving nucleic acids are collectively referred to as the Central Dogma of Molecular Biology, a set of principles attributed to the early writings of Francis Crick in 1958.

Together, these three processes preserve the integrity of our genome and ensure the faithful expression of our genetic code. However, all chemical and biochemical reactions are imperfect, which is to say that untoward reactions necessarily occur, even if only very infrequently. As a result, numerous studies have investigated the mechanisms that control the fidelity of DNA replication and translation; however, technical limitations have greatly handicapped efforts to investigate the fidelity of transcription. Unlike genetic mutations in DNA, transcription errors in RNA are transient, and are not stably inherited from cell to cell as is DNA. This transient nature of RNA errors makes them difficult to detect.

Conceptually, single-cell single-molecule sequencing of fragmented mRNA could be employed to analyze transcription errors; however, this approach faces multiple technical barriers. Chief among these is that the HeliScope, which was the first commercially available instrument reported for direct RNA sequencing (DRS), is no longer provided by Helicos Biosciences, as the company went out of business in 2012. In the DRS method, as shown elsewhere, polyadenylated and 3′-blocked RNA is captured on surfaces containing covalently bound poly(dT) oligonucleotide with the 3′ end facing “up.” Subsequent cycles of unblocking and extension with reversible terminators are, however, plagued by very short reads (~25 bases) and high error rates (3–5%), based on reported cDNA data.

Nanopore-based DRS is now possible by methods that can be read here. However, based on comments reported by Garalde et al., this nanopore approach also exhibits unacceptably high error rates.

An alternative way to measure the fidelity of transcription involves reverse-transcribing RNA into complementary DNA (cDNA), followed by conventional sequencing of cDNA. However, a crucial drawback of this strategy is that reverse transcriptase enzymes, mainly derived from viruses such as the Retroviridae depicted below, are expected to make one error every ~10,000 to 30,000 bases. Since RNA polymerases are expected to make one error every ~300,000 bases, a standard cDNA library will always be dominated by reverse transcription errors that mask the errors made by RNA polymerases.
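The arithmetic behind this masking problem can be sketched in a few lines of Python. The rates are the approximate figures quoted above; the function name, and the simplifying assumption that the two error sources are independent and simply add, are my own.

```python
# Approximate per-base error rates quoted in the text.
RT_ERROR_RATE_WORST = 1 / 10_000    # reverse transcriptase, upper estimate
RT_ERROR_RATE_BEST = 1 / 30_000     # reverse transcriptase, lower estimate
RNAP_ERROR_RATE = 1 / 300_000       # RNA polymerase


def fraction_from_transcription(rt_rate: float, rnap_rate: float) -> float:
    """Fraction of observed cDNA mismatches that are true transcription errors,
    assuming both error sources are independent and rare (a simplification)."""
    return rnap_rate / (rnap_rate + rt_rate)


for label, rt_rate in [("worst-case RT", RT_ERROR_RATE_WORST),
                       ("best-case RT", RT_ERROR_RATE_BEST)]:
    fold = rt_rate / RNAP_ERROR_RATE
    frac = fraction_from_transcription(rt_rate, RNAP_ERROR_RATE)
    print(f"{label}: RT errors are {fold:.0f}-fold more frequent; "
          f"{frac:.1%} of mismatches are true transcription errors")
```

With these numbers, reverse transcription errors outnumber transcription errors by roughly 10- to 30-fold, so only about 3% to 9% of the mismatches in a standard cDNA library would reflect what the RNA polymerase actually did.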

One solution to this problem is to reverse-transcribe the same mRNA molecule multiple times. For example, if multiple cDNA copies were made of a single mRNA molecule, then a true transcription error would be present at the same location in every cDNA copy of this molecule, whereas a reverse transcriptase error would appear in only one of these copies. This is the basic idea behind the “circle-sequencing” (CirSeq) assay reported by Acevedo and Andino in 2014. The assay’s name derives from the key step in the CirSeq protocol: circularization of mRNA.

As depicted here, transcription errors are identified by producing mRNA fragments, circularizing these with a ligase, and reverse transcribing the RNA circles into cDNAs for a DNA polymerase-mediated rolling-circle reaction. The resultant linear cDNA molecules are comprised of tandem repeats of the original RNA fragments. During this step, artifactual mutations may arise in the cDNA. The cDNA is then processed to generate a library, amplified, and sequenced. During this process, further artifacts may arise. However, because these artifacts are only present in one copy of the tandem repeats, they can be distinguished from true transcription errors, which are present in all tandem repeats.
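The logic of separating artifacts from true transcription errors can be illustrated with a toy Python sketch. This is not the published CirSeq pipeline: it assumes the repeat length is known, that the read starts exactly at a repeat boundary, and that a reference sequence of the fragment is available. All names are hypothetical.

```python
def split_repeats(read: str, unit_len: int) -> list[str]:
    """Cut a tandem-repeat read into consecutive full-length copies."""
    return [read[i:i + unit_len]
            for i in range(0, len(read) - unit_len + 1, unit_len)]


def classify_positions(read: str, reference: str, min_repeats: int = 3):
    """Compare every copy of the fragment against the reference.

    A mismatch seen in all copies is reported as a candidate transcription
    error; a mismatch seen in only some copies is reported as an artifact.
    """
    unit_len = len(reference)
    repeats = split_repeats(read, unit_len)
    if len(repeats) < min_repeats:
        raise ValueError("too few tandem repeats to call a consensus")

    errors, artifacts = [], []
    for pos in range(unit_len):
        column = [rep[pos] for rep in repeats]
        if all(base == reference[pos] for base in column):
            continue  # every copy matches the reference
        if len(set(column)) == 1:
            errors.append((pos, reference[pos], column[0]))
        else:
            artifacts.append(pos)
    return errors, artifacts


# Toy data: position 3 differs in every copy, position 6 in the last copy only.
reference = "ACGTTGCA"
read = "ACGATGCA" + "ACGATGCA" + "ACGATGGA"
print(classify_positions(read, reference))
```

In this toy read, the T-to-A change at position 3 appears in all three copies and is kept as a candidate transcription error, while the change at position 6 appears in a single copy and is set aside as an artifact.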

Adapted from Kuznetsova et al. Nucleic Acids Res. 45, 5487–5500 (2017). Open Access.

Polio virus

In 2014, Acevedo et al. reported use of CirSeq to characterize mutations in proteins derived from transcription errors in poliovirus, which is depicted here. These researchers stated that their study “provides the first single-nucleotide fitness landscape of an evolving RNA virus and establishes a general experimental platform for studying the genetic changes underlying the evolution of virus populations.” The importance of this 2014 publication in the prestigious journal Nature is evidenced by more than 200 citations in Google Scholar over a period of less than five years.

In 2017, Gout et al. reported numerous modifications to the original CirSeq assay that streamlined the protocol and increased its sensitivity, and they designed a customized bioinformatic pipeline to identify transcription errors. Methodological details for these improvements go far beyond the scope of this blog, so interested readers will need to consult the full report by Gout et al., as this blog will only address key findings.

Key Findings Using CirSeq

S. cerevisiae

Gout et al. screened >8.5 billion bases of the entire transcriptome of Saccharomyces cerevisiae (S. cerevisiae) and found >200,000 transcription errors in eight unique cell lines. Previous efforts have detected only ~100 transcription errors in eukaryotic cells, i.e. organisms consisting of a cell or cells in which the genetic material is DNA in the form of chromosomes contained within a distinct nucleus. Consequently, these results reported by Gout et al. represent the first comprehensive analysis of the fidelity of transcription in a eukaryotic organism.

Importantly, the errors detected by Gout et al. were distributed across the entire transcriptome of S. cerevisiae, indicating that the CirSeq approach provides a genome-wide view of transcriptional mutagenesis in yeast.

Errors were found along the entire length of transcripts, indicating that they affect every aspect of RNA functionality, including the location of the start and stop codon, the stability of secondary structures, and the information that is encoded in the primary sequence. Thus, transcription errors affect every aspect of protein structure and function, including residues for post-translational modifications, catalysis, substrate binding, and structural integrity.

Gout et al. found that, on average, the yeast transcriptome contains ~4.0 errors per million base pairs, which demonstrated that transcription errors occur >100-fold more frequently than DNA replication errors. However, these errors are not distributed equally over the transcriptome. Molecules of mRNA contain the fewest errors (3.9 × 10⁻⁶ per base pair), and are synthesized by RNA polymerase II (RNAPII), which is a 550-kDa complex of 12 subunits required for binding to upstream gene promoters to start transcription, as depicted here.

Space-filling model of RNAPII. The structure of yeast RNAPII was solved by Stanford University Prof. Roger Kornberg, who was awarded the Nobel Prize in Chemistry in 2006 for his studies of the process by which genetic information from DNA is copied to RNA.

In terms of increasing error rate, ribosomal RNA molecules synthesized by RNAPI (4.3 × 10⁻⁶ per base pair), mitochondrial RNA (9.3 × 10⁻⁶ per base pair), and RNA molecules associated with “housekeeping” genes synthesized by RNAPIII (1.7 × 10⁻⁵ per base pair) closely follow RNAPII-derived RNA. Gout et al. said that these results suggest that each RNA polymerase has its own unique error rate, as has been observed for DNA polymerases.
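For readers who like to see the comparison as numbers, the following Python sketch ranks the four error rates quoted above and expresses each relative to RNAPII. The dictionary keys are my own shorthand labels, not terms from Gout et al.

```python
# Error rates per base pair as quoted in the text (labels are shorthand).
ERROR_RATES = {
    "RNAPII (mRNA)": 3.9e-6,
    "RNAPI (rRNA)": 4.3e-6,
    "mitochondrial RNA": 9.3e-6,
    "RNAPIII": 1.7e-5,
}

BASELINE = ERROR_RATES["RNAPII (mRNA)"]

# Sort from lowest to highest error rate.
ranked = sorted(ERROR_RATES.items(), key=lambda item: item[1])

# Fold difference of each class relative to RNAPII-derived mRNA.
fold_vs_rnapii = {name: rate / BASELINE for name, rate in ranked}

for name, fold in fold_vs_rnapii.items():
    print(f"{name}: {ERROR_RATES[name]:.1e} per base pair "
          f"({fold:.1f}x RNAPII)")
```

On these figures, RNAPI-derived rRNA is only about 10% more error-prone than mRNA, whereas mitochondrial RNA and RNAPIII transcripts carry roughly 2.4-fold and 4.4-fold more errors, respectively.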

On the other hand, within a class of transcripts, the error rate was remarkably constant. For example, the error rate of transcripts synthesized by RNAPII is independent of the expression level of a gene, its distance from an origin of replication, or the position of a base along the length of the gene. In addition, Gout et al. found that bases that are known to be subject to RNA modifications did not display an increased error rate, although they did detect a significant decrease in the coverage of these bases, indicating that they are not efficiently reverse-transcribed and are thus underrepresented.

Related Findings Using Single-Molecule Real-Time (SMRT) Sequencing

These latter observations on modified bases in RNA serve as my transition into discussing recent studies by Potapov et al., who employed Pacific Biosciences (PacBio) SMRT sequencing, which I have previously blogged about. Potapov et al. used SMRT to measure the fidelity of incorporation and replication of modified ribonucleotides such as N6-methyladenosine (m6A), 5-methylcytidine (m5C), 5-hydroxymethylcytidine (hm5C), pseudouridine (Ψ), and inosine (I).

Taken from Potapov et al. Nucleic Acids Res. 46, 5753-5763 (2018). Open Access.

As depicted here, T7 RNA polymerase was used to synthesize base-modified RNA from nucleotide triphosphate pools wherein A, G, C or U was replaced with a corresponding nucleotide analog selected from m6A, m5C, hm5C, Ψ, I, 5-methyluridine (m5U), or 5-hydroxymethyluridine (hm5U). I am pleased to say that all of these reagents were obtained as modified nucleotides from TriLink. After synthesis of modified RNA, first and second strand cDNA was synthesized by reverse transcription, and the resultant double-stranded DNA product was converted into a circular template for SMRT sequencing.

Interested readers should consult Potapov et al. for an explanation of how RNA polymerase and reverse transcriptase error rates were determined from bioinformatic analysis. Qualitatively, for the C5 position of uracil, a relatively small methyl group had minimal effect on RNA polymerase incorporation and reverse transcriptase replication fidelity. Increasing the size of the methyl group by adding a hydroxyl group increased first strand errors. Pseudouridine, which contains a secondary amine at the equivalent C5 position in uracil, did not affect reverse transcriptase fidelity, but instead produced substitution errors more frequently during RNA synthesis by T7 RNA polymerase. Misincorporation errors can have implications for pseudouridine-modified RNA-based therapeutics.


Potapov et al. concluded that, with the methodology they describe, it will be possible to define the transcriptional component of nongenetic mutations for the first time and to understand how this “molecular noise” affects cellular function. These investigators believe that their “experiments open up a new field of mutagenesis to widespread experimentation,” and add that one of the most challenging aspects of this field will be to define the impact of transcription errors on cellular health.

According to Potapov et al., the data suggest that transcription errors are particularly detrimental to cellular proteostasis, which is a portmanteau of the words protein and homeostasis, and refers to the concept that there are competing and integrated biological pathways within cells that control the biogenesis, folding, trafficking and degradation of proteins present within and outside the cell.

For example, according to these researchers, in patients that suffer from nonfamilial cases of Alzheimer’s disease, transcription errors can generate toxic versions of the amyloid precursor protein, whereas similar errors generate mutated versions of the ubiquitin-B protein. In both cases, these errors occur on tracts of GA dinucleotide repeats that are present in the coding regions of the affected genes. These observations suggest that transcription errors can directly contribute to human pathology if they occur repeatedly at the same location.

In addition to these highly specific transcription errors, Potapov et al. note that it has long been suspected that a much larger population of errors may exist. These errors have evaded detection because they occur randomly throughout the genome. The investigators believe that their experiments now confirm this suspicion and describe the “landscape of these errors” in detail. To me, this landscape of errors is visually akin to the digital picture of a landscape with erroneous pixels shown here.

Potapov et al. conclude the following:

“Because transcription errors are ubiquitous throughout the genome and can affect any gene at any location, we suspect that the molecular noise created by these errors could be substantial. An important challenge in the future will be to connect these errors directly to the changes in cellular function and monitor their effect on cellular health. We anticipate that these experiments will ultimately lead to the discovery of a wide range of unexpected phenomena, including new mutagens, new mutational mechanisms, and new disease processes that could help us understand how the environment and our lifestyle choices affect our overall health, as well as our predisposition to diseases that are caused by protein aggregation.”

I fully agree with this concluding perspective for the future, and look forward to learning much more about how this “molecular noise” affects cellular function.

As usual, I welcome your comments.

Tagging Modified mRNA with TAG (Transglycosylation at Guanosine)

  • Modified mRNA (Mod-RNA) for Therapeutics Drives Development of New Technologies
  • Among These Is Tagging Mod-RNA for Visualization In Vivo
  • Tagging Mod-RNA with TAG Earns Neal Devaraj the 2018 Blavatnik National Award in Chemistry

Back in 2013, I wrote a blog titled Modified mRNA Mania, which drew attention to the then nascent development of an entirely new field of therapeutics, based on the use of biosynthetic modified mRNA (mod-RNA). I also wrote a more recent blog about this still trending field, titled Deluge of mRNA Delivery Publications, featuring a chart of supporting publication data to emphasize the exponentially growing efforts to improve delivery.

Prof. Neal Devaraj. Credit Neal Devaraj. With permission

The present blog in this ongoing series on mod-RNA focuses on in vivo visualization of mod-RNA by use of a very clever enzyme-mediated labeling method. Importantly, this labeling procedure can be applied to any biosynthetic mod-RNA of interest. The method was invented by Neal Devaraj, an Associate Professor of Chemistry and Biochemistry at the University of California, San Diego. Because of his work, Prof. Devaraj recently received the 2018 Blavatnik National Award in Chemistry, which you can read about at the end of this blog.

Backstory for Tagging Mod-RNA

In vivo expression of therapeutic proteins encoded in exogenous mRNA provides several advantages compared to delivery of the corresponding complementary DNA. Delivering mRNA is easier because it only needs to enter the cytoplasm, rather than the nucleus, to be functional (depicted here in scheme B). This avoids complications from transcription regulation machinery. Additionally, mRNA does not permanently alter the genome like genomically integrated DNA does (depicted here in scheme A), therefore avoiding permanent and potentially lethal changes.

Taken from Avci-Adali et al. J. Biol. Eng. 2014, 8, 8. Copyright © BioMed Central Ltd. With permission.

In order to overcome the intrinsically unstable and transient nature of natural RNA, the incorporation of modified nucleotides has been—and continues to be—explored to harness RNA as a therapeutic agent. Multiple studies have focused on providing an increase in serum stability, a less active immune response, and an increase in translational capacity, as reviewed elsewhere.

5-moUTP lithium salt. Taken from TriLink BioTechnologies

Many types of modified nucleobases can be incorporated into mRNA transcripts by substituting natural nucleotide triphosphates with one or more modified nucleotide triphosphates during enzymatic synthesis with an RNA polymerase. TriLink BioTechnologies R&D group has synthesized and screened numerous modified nucleotide triphosphates, and 5-methoxyuridine (5moU) has been shown to be especially useful. Interested readers can learn more about mod-RNA at this link to a video presentation in 2018 by Dr. Anton McCaffrey, Senior Director of Emerging Science and Innovation at TriLink.

In order to fully exploit the therapeutic potential of mod-RNAs, novel strategies must be developed for the safe and effective “decoration” or tagging of in vitro transcribed RNA with functional moieties, or with fluorophores to allow for visualization. A convenient bioconjugation method that is capable of appending targeting molecules or fluorophores to therapeutic mod-RNAs would enable new modalities of highly specific uptake of RNA through endocytic pathways, as well as visualization.

Tagging Mod-RNA with TAG

To address the need for a specific and convenient bioconjugation method for mod-RNA, Devaraj and his students envisioned decorating mod-RNA by use of an RNA modifying enzyme. In their 2018 publication in Molecular Pharmaceutics, it was noted that more than 100 RNA post-transcriptional modifications have been reported to date, with a large majority of these modifications occurring on transfer RNAs (tRNAs). Among these, bacterial tRNA guanine transglycosylases (TGTs) have been extensively studied. As depicted here, during the transglycosylation reaction, the N−C glycosidic linkage of a key guanine at the wobble position of the anticodon loop is cleaved, and a 7-deazaguanine derivative named preQ1 (R = H) is substituted.

Taken from Devaraj and coworkers J. Am. Chem. Soc. 2015, 137, 40, 12756-12759. Copyright © American Chemical Society. With permission

Importantly, the RNA-bound TGT crystal structure previously reported by others suggested to Devaraj that there might be enough room to chemically modify the natural preQ1 (R = H) substrate and thus repurpose the enzyme to covalently append a larger R group, such as a fluorophore or affinity tag. This concept was initially reduced to practice in a preliminary communication, wherein the minimal 17-nucleotide hairpin sequence shown above, which was named TAG, was genetically engineered into the 3′-UTR of a full mRNA transcript coding for the red fluorescent protein mCherry (mCherry-TAG). When treated with bacterial TGT and a preQ1 derivative with R = dye, fluorescence-based gel analysis confirmed the formation of the expected mCherry-TAG-dye conjugate.

In a recent Molecular Pharmaceutics publication, Devaraj and his students extended this E. coli TGT activity to recognize TAG in mCherry mod-RNA composed of the chemically modified nucleobases 5-methylcytosine (5mC) or pseudouracil (Ψ), as well as doubly substituted 5mC + Ψ mod-RNA. The approach depicted here is a two-step process involving RNA-TAG covalent conjugation followed by tetrazine bioorthogonal labeling.

Taken from Devaraj and coworkers Mol. Pharmaceutics 2018, 15, 3, 737-742. Copyright © American Chemical Society. With permission

In vitro transcription (IVT) reactions for mCherry-TAG mod-mRNA transcripts were performed with both a partial (25%) and a complete (100%) replacement of the natural bases with 5mC and Ψ by use of the corresponding modified triphosphates, which I’m pleased to say were obtained from TriLink. A cap analog was added to the 5’ end and, after IVT, the 3’ end was polyadenylated to furnish mature mRNAs capable of being translated into mCherry-TAG upon transfection into mammalian cells.

Although mRNA-TAG has been established to efficiently and directly incorporate small molecule moieties, Devaraj and coworkers state that they envisioned a two-step labeling approach of mod-mRNA such that the degree of labeling could be more concretely proven. Additionally, this methodology would also allow a diverse array of targeting molecules, irrespective of size or class, to be easily conjugated and evaluated without the need for multiple syntheses of preQ1 derivatives or the need to quantify each substrate’s efficiency of incorporation.

As depicted here, appending a bioorthogonal tetrazine moiety to mod-mRNA transcripts (red) can be followed by coupling a trans-cyclooctene (TCO)-functionalized fluorescent probe or affinity agent (blue) using previously established, robust tetrazine-ligation chemistry, which is reviewed elsewhere.

Taken from Chaudhuri et al. Bioconjugate Chem. 2017, 28, 4, 918-922. Copyright © American Chemical Society. With permission

The mCherry mod-mRNAs were treated with the TGT enzyme at 37 °C for 4 h to first access tetrazine-labeled mCherry mod-mRNAs. The purified tetrazine-conjugated mod-RNA was subsequently conjugated to a TCO-functionalized sulfo-Cy5 fluorescent probe via tetrazine ligation by incubating for 4 h at 37 °C. The purified RNAs were subjected to polyacrylamide gel electrophoresis and imaged. As shown here, the degree of labeling (DOL) was determined to be ~95% for unmodified transcript, ~84% for 5mC transcript, ~66% for Ψ modified transcript, and ~61% for both Ψ and 5mC modified transcript.
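As a quick way to compare these numbers, the short Python sketch below expresses each degree of labeling relative to the unmodified transcript. The DOL values are the approximate percentages quoted above; the normalization itself is my own illustrative arithmetic and is not an analysis reported by Devaraj and coworkers.

```python
# Approximate degree of labeling (DOL, %) as quoted in the text above.
dol_percent = {
    "unmodified": 95,
    "5mC": 84,
    "Ψ": 66,
    "5mC + Ψ": 61,
}

def relative_to_unmodified(dol):
    """Express each DOL as a fraction of the unmodified-transcript DOL."""
    reference = dol["unmodified"]
    return {name: round(value / reference, 2) for name, value in dol.items()}

relative_dol = relative_to_unmodified(dol_percent)

for name, fraction in relative_dol.items():
    print(f"{name}: {fraction:.2f} of unmodified labeling")
```

On this rough scale, the 5mC transcript retains most of the labeling seen for unmodified mRNA, whereas the Ψ-containing transcripts retain roughly two-thirds.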

Taken from Devaraj and coworkers Mol. Pharmaceutics 2018, 15, 3, 737-742. Copyright © American Chemical Society. With permission

Devaraj and coworkers hypothesize that the slightly lowered efficiency in labeling of the all-5mC mod-mRNAs in comparison to unmodified mRNA could be due to variations in the secondary structure of the RNA-TAG hairpin, which may hamper enzyme substrate recognition. More interestingly, and with reference to the TAG hairpin shown above, they further speculate that the decrease in DOL for the Ψ-containing mod-mRNA transcripts most likely arises from disruption of key interactions between the TGT enzyme and the uridine residues flanking the exchanged guanine when those uridines are replaced by the modified base Ψ.


Devaraj and coworkers concluded that they have developed a relatively simple methodology to covalently label mod-mRNAs with a modular two-step approach that can incorporate small molecules such as imaging agents, affinity handles, targeting agents, and drug conjugates using RNA-TAG. They also concluded that this approach provides the possibility of diverse decoration of therapeutically relevant mod-RNAs, without the limitation of length characteristic of synthetic RNA strategies. Finally, they expressed belief that “the RNA-TAG technology could greatly expand the arsenal of therapeutic RNAs by way of conjugation to a variety of functional decorations relevant to further tuning the emerging modality of RNA as a therapeutic class.”

I fully agree with these conclusions, and welcome your comments, as usual.


This past June, The Blavatnik Family Foundation and the New York Academy of Sciences announced the 2018 Laureates of the Blavatnik National Awards for Young Scientists, who will each receive $250,000: the largest unrestricted scientific prize offered to America’s most promising faculty-level scientific researchers 42 years of age and younger. Nominated by 146 research institutions across 42 states, the 286 nominees were narrowed to a pool of 31 Blavatnik National Finalists. From this pool of Finalists, a distinguished scientific jury chose three outstanding Laureates, one in each of the Awards’ scientific disciplinary categories: Life Sciences, Physical Sciences & Engineering, and Chemistry.

Life Sciences: Janelle Ayres, PhD, of the Salk Institute for Biological Studies, for her pioneering research in immunology and the study of how bacteria interact with humans. Dr. Ayres’s work is revolutionizing our understanding of host-pathogen interactions and has the potential to solve one of the greatest current public health threats: anti-microbial resistance.

Physical Sciences & Engineering: Sergei V. Kalinin, PhD, of Oak Ridge National Laboratory, for creating novel techniques to study, measure, and control the functionality of nanomaterials at the atomic and nanoscale levels. Dr. Kalinin’s work manipulating individual atoms has the potential to enable scientists to create new classes of materials by assembling matter, atom-by-atom.

Chemistry: Neal K. Devaraj, PhD, of the University of California, San Diego, for his transformative work on the synthesis of artificial cells and membranes, thus creating an exciting new field of research that aims to address one of the great challenges in synthetic biology. Dr. Devaraj has made several game-changing discoveries, and has pioneered the development of new methods for labeling biological molecules, which have already been adopted by researchers globally.

Here is a link to all of Neal Devaraj’s publications currently indexed in PubMed.

3’-End Labeling of RNA or DNA by a Polymerase Ribozyme

  • Revisiting a Polymerase Ribozyme for 3’-End Labeling Oligos
  • Using a Wide Variety of Modified Nucleotide Triphosphates from TriLink to Demonstrate Versatility of Labeling

Gerald Joyce. Taken from scripps.edu

In September 2016, I wrote a blog featuring a remarkable publication by the Gerald Joyce lab at Scripps Research Institute in La Jolla, CA. The researchers wrote about the in vitro evolution of an RNA catalyst (i.e. ribozyme) that had RNA polymerase activity and could amplify RNA. This purely RNA-based synthetic chemistry, in the complete absence of any proteins, provided further evidence for the feasibility of an “RNA world,” a concept first discussed by Walter Gilbert in 1986, who hypothesized the existence of a prebiotic era billions of years ago during which life began without DNA or proteins.

This blog post once again spotlights the Joyce lab, but in the context of applying this novel polymerase ribozyme as a means to carry out 3’-end labeling of RNA or DNA with 50 modified nucleotides. I’m pleased to add that many of the requisite modified nucleotide triphosphates were obtained from TriLink! Interested readers can consult this June 2018 publication in Nucleic Acids Research for more details if they wish to supplement the brief overview that will be given here.


RNA polymerase ribozymes are in vitro evolved RNA molecules that extend an RNA primer on a complementary RNA template using NTP substrates. Currently, the most advanced RNA polymerase ribozyme is the ‘24-3’ polymerase, which was reported in 2016 by Horning & Joyce to have an extension rate of ~1 nucleotide (nt) per minute, and can operate on most template sequences. Using specially designed templates, the 24-3 polymerase can generally be limited to the addition of only a single modified nucleotide, thus enabling efficient 3’-end labeling of a target RNA or DNA using various NTP and dNTP analogs shown here in red.

Taken from Joyce and coworkers Nucleic Acids Research 2018

The highly structured 24-3 polymerase ribozyme, which is depicted here in 2D, contains 180 nt. The ribozyme also has a short “tag” sequence (5’ GUCAUUG 3’) at the 5’ end of the polymerase that is complementary to a sequence (3’ CAGUAAC 5’) at the 5’ end of the template. Besides this feature, the template sequence is not constrained. The primer, which corresponds to the target nucleic acid, binds to the template through Watson–Crick pairing and is extended by the polymerase to achieve 3’-end labeling.
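The complementarity between the tag and the template sequence quoted above is easy to verify. The following Python sketch does so with a small reverse-complement helper; the function and variable names are mine, and only the two sequences come from the text.

```python
# Watson-Crick pairing rules for RNA.
PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}

def reverse_complement(rna):
    """Return the reverse complement of an RNA sequence written 5' to 3'."""
    return "".join(PAIR[base] for base in reversed(rna))

tag_5to3 = "GUCAUUG"            # tag on the polymerase, written 5' to 3'
template_site_3to5 = "CAGUAAC"  # template sequence, written 3' to 5' as in the text

# Flip the template sequence so both strands are written 5' to 3'.
template_site_5to3 = template_site_3to5[::-1]

is_complementary = reverse_complement(tag_5to3) == template_site_5to3
print(is_complementary)
```

Because the two strands pair in antiparallel fashion, the template sequence is reversed before comparison.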

Although the 24-3 polymerase ribozyme can add multiple successive NTPs to the 3’ end of a template-bound primer, the reaction can mostly be restricted to the addition of a single residue by choosing an appropriate template and providing only one of the four nucleobase substrates. By way of example and as shown here, four templates were constructed, each with a different templating nucleotide (red) at the first position of primer extension, followed by several non-complementary nucleotides. Together, this set of templates enables the testing of triphosphate analogs containing each of the four nucleobases.
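The logic of this template design can be sketched in a few lines of Python. This is an idealized model that assumes strict Watson-Crick pairing only (no wobble pairs or misincorporation), and the downstream template sequences are hypothetical placeholders rather than the ones used by Joyce and coworkers.

```python
# Strict Watson-Crick pairing: templating nucleotide -> incoming nucleotide.
PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}

def additions_with_single_ntp(template_downstream, ntp_base):
    """Count successive additions when only one NTP is supplied.

    `template_downstream` lists the templating nucleotides in the order
    the polymerase encounters them, starting at the first position of
    primer extension.
    """
    count = 0
    for templating in template_downstream:
        if PAIR[templating] != ntp_base:
            break  # the supplied NTP cannot pair here, so extension stops
        count += 1
    return count

# Hypothetical templates: one templating nucleotide for the supplied NTP,
# followed by several nucleotides that do not pair with it.
templates = {"A": "UCCC", "G": "CAAA", "C": "GAAA", "U": "ACCC"}

single_additions = {
    base: additions_with_single_ntp(downstream, base)
    for base, downstream in templates.items()
}
print(single_additions)
```

By contrast, a template such as "CCAA" supplied with a G substrate would allow two additions in this model, which is why the nucleotides after the first templating position are chosen to be non-complementary.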

Taken from Joyce and coworkers Nucleic Acids Research 2018

Exemplary Results

Although a great variety of functionalized nucleotides can be prepared by chemical synthesis, this study by Joyce and coworkers focuses on commercially available nucleotide triphosphate analogs, such as sugar, nucleobase, and backbone modifications, in order to demonstrate the general utility of the approach. Fifty different analogs were tested in a reaction employing a 0.8 μM RNA (or DNA) primer with the following sequence: 5’ UUGCUACUACACGAC 3’ (or corresponding DNA sequence), together with 1 μM ribozyme and 1 μM RNA template. The reactions were carried out in the presence of 200 mM MgCl2 and 0.5 mM NTP analog at pH 8.3 and 17 °C for 1 h. Yields by PAGE ranged from 11% to 89% overall, with 84% to 89% yield for five of the analogs.

The exemplary results tabulated here highlight the versatility of 24-3 polymerase ribozyme toward incorporation of NTP analogs with very diverse molecular structures that provide different types of functionality.

Exemplary NTP Analogs (TriLink) and Incorporation Yield by the 24-3 Polymerase Ribozyme

NTP analog                   Yield %    NTP analog             Yield %
N6-methyl-2-amino-ATP        49         2’-amino-dATP          85
7-propargylamino-dGTP        89         2’-amino-dGTP          70
biotin-16-aminoallyl-dUTP    28         2’-amino-dCTP          80
Pseudo-UTP                   66         2’-amino-dUTP          50
α-thio-ATP                   70         5-aminoallyl-CTP       67
α-thio-GTP                   86         Cy5-aminoallyl-CTP     47
α-thio-CTP                   84         5-formyl-CTP           85
α-thio-UTP                   11         5-formyl-UTP           58
thieno-GTP                   50         1-borano-dGTP          37
thieno-UTP                   25         1-borano-dCTP          12
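For readers who like to check numbers, the tabulated yields can be summarized with a short Python sketch. The values are transcribed from the table above, which lists only 20 of the 50 analogs tested, so the summary applies to this subset only; the variable names are mine.

```python
# Incorporation yields (%) transcribed from the table above.
yields = {
    "N6-methyl-2-amino-ATP": 49, "7-propargylamino-dGTP": 89,
    "biotin-16-aminoallyl-dUTP": 28, "Pseudo-UTP": 66,
    "α-thio-ATP": 70, "α-thio-GTP": 86, "α-thio-CTP": 84,
    "α-thio-UTP": 11, "thieno-GTP": 50, "thieno-UTP": 25,
    "2'-amino-dATP": 85, "2'-amino-dGTP": 70, "2'-amino-dCTP": 80,
    "2'-amino-dUTP": 50, "5-aminoallyl-CTP": 67,
    "Cy5-aminoallyl-CTP": 47, "5-formyl-CTP": 85, "5-formyl-UTP": 58,
    "1-borano-dGTP": 37, "1-borano-dCTP": 12,
}

lowest = min(yields, key=yields.get)
highest = max(yields, key=yields.get)

# Analogs in the 84% to 89% range mentioned in the text.
top_analogs = sorted(name for name, y in yields.items() if 84 <= y <= 89)

print(f"lowest:  {lowest} ({yields[lowest]}%)")
print(f"highest: {highest} ({yields[highest]}%)")
print(f"84-89%:  {', '.join(top_analogs)}")
```

Within this subset, the range and the number of analogs in the top bracket agree with the figures quoted in the preceding section.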

For instance, N6-methyl-2-amino-ATP is a member of the diaminopurines that are discussed elsewhere, while pseudo-UTP shown below is an isomer of UTP that is now widely used in modified mRNAs.

Pseudo-UTP. Taken from TriLink BioTechnologies

In a TriLink white paper by Paul and Yee titled PCR incorporation of modified dNTPs: the substrate properties of biotinylated dNTPs, it is noted that the high affinity of streptavidin for the biotin ligand is one of the strongest and most widely utilized interactions in biology. The strength and specificity of this interaction has been exploited in many biological applications, including secondary label introduction and affinity isolation. While there are various length linkers that have been employed for attachment of biotin to the nucleotide, the relatively long biotin-16-aminoallyl-2′-dUTP used for incorporation by 24-3 polymerase ribozyme is often preferred.

Modification of the 3’-end by incorporation of 7-propargylamino-dGTP, 2’-amino dNTPs or 5-aminoallyl-CTP provides a reactive primary amine group as a versatile “chemical handle” to attach virtually any type of moiety that is needed for an application, whether that be a detectable label or synthetic peptide. The α-thio-NTPs (aka 1-thio-NTPs) and 1-borano-dNTPs demonstrate that these phosphate modifications are compatible with the 24-3 polymerase ribozyme.

α-Thio-ATP (1-Thio-ATP). Taken from TriLink BioTechnologies // 1-Borano-dCTP. Taken from TriLink BioTechnologies

Thieno-UTP. Taken from TriLink BioTechnologies

5-Formyl-nucleotides provide a reactive formyl (i.e. -CHO) group for conjugation reactions with, for example, hydroxylamine-functionalized labels of the type reported elsewhere. The incorporation of thieno-NTPs is interesting because of the inherent fluorescent properties of this relatively new class of analogs offered by TriLink, which can be read about at this link.

Concluding Comments

According to the aforementioned Joyce publication, the simple, one-step installation of a fluorophore or affinity probe using the 24-3 polymerase ribozyme is likely to have broad application, offering an attractive alternative to 3’-end labeling using a polymerase protein such as poly(A) polymerase or terminal transferase. These polymerase proteins operate in a template-independent manner, and thus result in multiple successive additions, unless the NTP analog itself is a chain terminator.

As usual, your comments are welcomed.

Deluge of mRNA Delivery Publications

  • Strong Interest in mRNA Therapeutics Drives Increased Numbers of Delivery Publications
  • Novel Charge-Altering Releasable Transporters (CARTs) Undergo “Self-Immolation”
  • CARTs Outperform Widely Used Lipofectamine In Vitro and Enable In Vivo Delivery

Devotees of this blog may recall my past post in 2013 titled Modified mRNA Mania, which intentionally used the word “mania” to provoke reading about the trending topic on base-modified mRNA as therapeutic agents. My metrics for this mania were a flurry of scientific publications, patent applications staking out intellectual property, and massive investments by venture capitalists and established pharma companies in mRNA therapeutics startups.

As with antisense, siRNA, and antagomir RNA drugs, efficient delivery is widely recognized as a critical technical challenge to overcome. And, not surprisingly, past lipid-based approaches of various sorts are being reinvestigated for repurposing for mRNA delivery.

The focus of the present blog is a new strategy for mRNA delivery developed by a team of collaborators at Stanford University. Although I’ve chosen to highlight this report by McKinlay et al. in the prestigious Proc. Natl. Acad. Sci., a search of PubMed for publications indexed to “mRNA delivery” in the title and/or abstract for the period 2005 to 2017 gave articles that can be perused at this link. The graph shown below supports my characterization of this level of activity as “deluge”-like in that there are more than 100 publications, mostly in the last few years, with 40 to 50 more during 2018, by my estimate.

Challenges for mRNA Delivery

Simply stated, the key challenge associated with the use of therapeutic mRNA is an inability to efficiently deliver functionally intact mRNA into cells. Like all nucleic acid-based drugs, mRNA is a macromolecular polyanion and thus it does not readily cross nonpolar cellular and tissue barriers. Moreover, it is also susceptible to rapid degradation by nucleases and ideally it should be protected during the delivery process, even though some success has been reported using intradermal injection of “naked” unmodified mRNA. Finally, after cell entry, rapid release of mRNA in the cytosol and appropriate association with the protein synthesis apparatus is required for translation.

Each of these is a potential point of failure for functional mRNA delivery. In addition to the challenges associated with complexation, protection, delivery, and release, an ideal delivery system would also need to be synthetically accessible, readily tuned for optimal efficacy, and safe.

Charge-Altering Releasable Transporters (CARTs)

McKinlay et al. have successfully addressed each of the challenges mentioned above by developing a highly effective mRNA delivery system comprising charge-altering releasable transporters (CARTs). Since a picture is worth a thousand words, I’ve reproduced here the diagram used by McKinlay et al. to describe their multistep approach with CARTs, namely complexation (1), intracellular delivery (2), and cytosolic release (3) of mRNA transcripts, resulting in induction of protein expression (4).

Taken from McKinlay et al. Proc. Natl. Acad. Sci (2017)

Readers interested in the clever chemistry that underlies CARTs should consult the publication by McKinlay et al. for details. In brief, these dynamic materials, specifically the oligo(carbonate-b-α-amino ester)s (1) shown below, function initially as polycations that noncovalently complex, protect, and deliver polyanionic mRNA, and then subsequently lose their cationic charge through a controlled degradation to a neutral small molecule (2). The proposed mechanism for this degradation, which McKinlay et al. refer to as “self-immolative,” is pH-dependent.

Proposed rearrangement mechanism for n-mer oligo(α-amino ester)s 1 through tandem five-membered (5) then six-membered (6) transition states to afford an n-2-mer and diketopiperazine 2. Taken from McKinlay et al. Proc. Natl. Acad. Sci (2017)

As exemplified below, CARTs for cellular uptake were synthesized with hydrophobic blocks (n = 15) and cationic blocks (n = 12) such that 11b in physiological phosphate buffered saline (PBS) at pH 7.4 undergoes degradation to form 11c and small molecule 2.

Taken from McKinlay et al. Proc. Natl. Acad. Sci (2017)

These researchers hypothesize that this charge alteration reduces or eliminates the electrostatic anion-binding ability of the originally cationic material, thereby facilitating endosomal escape and enabling free mRNA release into the cytosol for translation. Readers interested in learning more about the complexities of endosomal escape can consult a (free, via Google) book chapter by Uyechi-O’Brien and Szoka titled Mechanisms for Cationic Lipids published in 2003, and a 2012 review by Nguyen and Szoka rhetorically titled Nucleic Acid Delivery: The Missing Pieces of the Puzzle?

Regardless of the actual mechanistic details for CARTs, McKinlay et al. demonstrate the efficacy of these materials to complex, deliver, and release mRNA in various lines of cultured cells including primary mesenchymal stem cells and in animal models, via both intramuscular (i.m.) injection and intravenous (i.v.) administration, resulting in robust gene expression. I’ll briefly outline these findings in what follows; however, the full paper and its supplemental material should be consulted for details.

Incidentally, I’m pleased to add that these CARTs were used to deliver the following base-modified [5-methylcytidine (5meC) and pseudouridine (Ψ)] reporter mRNAs and dye-labeled mRNA obtained from TriLink BioTechnologies: Enhanced Green Fluorescent Protein (EGFP) mRNA, Firefly Luciferase (Fluc) mRNA, and Cyanine 5 (Cy5)-labeled EGFP mRNA.

Mechanism of Uptake and Release

Using a Cy5-labeled EGFP mRNA, McKinlay et al. determined that the mechanism of cell entry for CART mRNA polyplexes is predominantly endocytic by comparing cellular uptake at 4 °C, a condition known to inhibit endocytotic processes, to normal uptake at 37 °C. Consistent with the expected endocytotic mechanism for ∼250-nm particles, HeLa cells displayed a significant (85%) reduction in Cy5 fluorescence at 4 °C.

Cellular uptake and mRNA translation following treatment with CART/mRNA polyplexes were then directly compared with polyplexes formed with non-immolative oligomers. By delivering a mixture of EGFP mRNA and Cy5-labeled EGFP mRNA, analysis of mRNA internalization and expression can be decoupled and simultaneously quantified: Cy5 fluorescence indicates internalized mRNA, irrespective of localization, and EGFP fluorescence denotes cytosolic release and subsequent expression of mRNA.

TriLink Cy5-labeled EGFP mRNA is transcribed with Cy5-UTP and an analog of UTP at a ratio which results in mRNA that is easily visualized and can still be translated in cell culture. Translation efficiency correlates inversely with Cyanine 5-UTP substitution.

This method was used in conjunction with confocal microscopy to compare cellular uptake and mRNA expression of two oligomers, namely, CART D13:A11 (7) and non-immolative, guanidinium-containing D13:G12 (13). Detection included dansylated transporter, Cy5-mRNA, and tetramethylrhodamine (TRITC)-Dextran4400, a stain for endosomal compartments. When cells were imaged 4 h after treatment with CART 7/Cy5-mRNA complexes, diffuse fluorescence was observed for both the Cy5 and dansyl fluorophores, indicating that those materials successfully escaped the endosome and dissociated from the polyplexes (i).

Confocal microscopy of HeLa cells treated with Cy5-mRNA complexes using CART 7 or non-immolative oligomer 13 after 4 h. Cells were cotreated with TRITC-Dextran4400. Scale bar, 10 μm. Taken from McKinlay et al. Proc. Natl. Acad. Sci (2017)

The two observed puncta in the dansyl signal (ii) were attributed to some intracellular aggregation of the dansyl-labeled lipidated oligocarbonate blocks, resulting from self-immolative degradation of the cationic segments of CART 7. Diffuse fluorescence from (TRITC)-Dextran4400 was also observed and attributed to endosomal rupture and release of the entrapped dextran.

However, when cells are treated with non-immolative 13/Cy5-mRNA complexes, both the Cy5 and dansyl fluorescence remain punctate and colocalized (iii). These signals also strongly overlap with punctate TRITC-Dextran4400, indicative of endosomal entrapment.

Taken together, according to McKinlay et al., these data strongly suggest that the charge altering behavior of CART 7 enables endosomal rupture and mRNA release, contributing to the high performance of these materials for mRNA delivery.

Applications and Animal Experiments

Oligo(carbonate-b-α-amino ester) D13:A11 7 was evaluated in applications to explore the versatility of CART-mediated mRNA delivery. EGFP mRNA expression following delivery by CART 7 was assayed in a panel of cell lines and compared to widely used Lipofectamine 2000 (Lipo). In HeLa cells, murine macrophage (J774), human embryonic kidney (HEK-293), CHO, and human hepatocellular carcinoma (HepG2) cells, the percentage of cells expressing EGFP using CART 7 was >90%, whereas treatment with Lipo induced expression in only 22–55% of the cells. Importantly, in addition to these various immortalized cell lines, mRNA expression was also observed in primary CD1 mouse-derived mesenchymal stem cells (MSCs) with high transfection efficiency.

In vivo bioluminescence imaging (BLI) enables localization and quantification of expression following mRNA delivery in living animals. To assess the efficacy of CART/mRNA complexes following local (i.m.) or systemic (i.v.) routes of administration, CART 7-complexed Fluc mRNA (7.5 μg) in PBS (75 μL) was given to anesthetized BALB/c mice in the right thigh muscle. As a direct control, naked mRNA was similarly injected in the opposite flank. D-luciferin was systemically administered i.p. at 15 min before imaging for each time point, and luciferase expression was evaluated over 48 h, starting at 1 h after the administration of mRNA complexes.

As shown here, when Fluc mRNA was delivered with polyplexes derived from 7 into the muscle, high levels of luciferase activity were observed at the site of injection. This expression peaked at 4 h and was still observable after 24 h but barely so after 48 h (see publication for percentages). In contrast, i.m. injection of naked mRNA afforded only low levels of luciferase expression, as measured by photon flux, in all five mice (see publication for percentages).

Representative BLI images following i.m. injection of naked mRNA (left flank) or CART/mRNA complexes (right flank). Taken from McKinlay et al. Proc. Natl. Acad. Sci (2017)

Following i.v. injections, the localization of mRNA polyplexes in tissues along the reticuloendothelial system pictured here provides many opportunities for inducing immunotherapeutic responses. According to McKinlay et al., spleen localization is “particularly exciting for future studies involving mRNA-based immunotherapy due to large numbers of dendritic and immune cells in that tissue.” Liver localization was also apparent in these animals, and expression in this tissue “may have applicability for treatment of hereditary monogenic hepatic diseases requiring protein augmentation or replacement such as hereditary tyrosinemia type I, Crigler–Najjar syndrome type 1, alpha-1-antitrypsin deficiency, Wilson disease, and hemophilia A and B, or acquired liver diseases such as viral hepatitis A–E and hepatocellular carcinoma.”

Overview of the reticuloendothelial system. ©Frazier et al. (1996)

Future Perspectives

Rather than paraphrase the future perspectives envisaged by McKinlay et al., here are those views, which to me seem warranted by the promising results summarized above:

“The effectiveness of mRNA delivery using these CARTs represents a strategy for mRNA delivery that results in functional protein expression in both cells and animals. The success of these materials will enable widespread exploration into their utilization for vaccination, protein replacement therapy, and genome editing, while augmenting our mechanistic understanding of the molecular requirements for mRNA delivery.”

As usual, your comments are welcomed.



Spotlight on TriLink Product Applications

  • Nearly 500 Publications in 2017 Cite Use of TriLink Products
  • Jerry Spotlights 20 Citing Oligos, Nucleotides, mRNA and Aptamers
  • 10 of These 20 Spotlighted Items Show Global Reach of TriLink Products

While thinking about possible topics to blog about, it occurred to me that researching recent publications on the applications of TriLink products would likely lead to many options. Using Google Scholar to do just that, I was given nearly 500 items, which is indeed plenty. However, choosing which to feature was neither an easy nor objective task. Having said that, and with sincere apologies to publications not spotlighted here, my “faves” and comments are given below, listed arbitrarily (not ranked) in four product categories: oligonucleotides, nucleotides, mRNA, and aptamers.

Taken from depositphotos.com

For convenience, each publication title can be clicked on to access the original article. Links to the cited TriLink products are also provided, alongside links to other adjunct information. Several trending “hot topics” and previous blogs are also noted.


Taken from researchgate.net


8-oxo-dGTP; taken from TriLink BioTechnologies // dPTP; taken from TriLink BioTechnologies


Modified mRNA for new therapeutic approaches continues to be an amazingly hot area of R&D, which I dubbed “modified mRNA mania” in a previous blog. Interested readers can peruse this link to ~300 items found in my Google Scholar search for TriLink and mRNA publications in 2017.

pseudo-UTP; taken from TriLink BioTechnologies // 2-thio-UTP; taken from TriLink BioTechnologies


2’-F-dCTP; taken from TriLink BioTechnologies // 2’-F-dUTP; taken from TriLink BioTechnologies

Global Reach

A pleasantly surprising aspect of the selected-product search results given above is the worldwide distribution of researchers using TriLink products. This global reach, if you will, is evident from the following countries outside of the USA, which I made a point of mentioning:

The Netherlands, India, Austria, Switzerland, Turkey, Germany, Italy, Belgium, Republic of Korea, and Denmark.

All of the publications listed above were selected solely on the type of TriLink product used. Given the relatively small “sample size” of these selected publications, which are only 20-of-500, finding investigators in 10 countries outside of the USA is a compelling testimonial for the TriLink global reach.

World Science Day

Truth be told, when I was searching for a fitting image to visually convey the concept of “global science,” I came across the fact that the United Nations Educational, Scientific, and Cultural Organization (UNESCO) has designated November 10 as World Science Day, with an emphasis on peace and development. The stated intention is to highlight “the important role of science in society and the need to engage the wider public in debates on emerging scientific issues. It also underlines the importance and relevance of science in our daily lives.”

Taken from monitor.co.ug

According to UNESCO, “[t]he theme for 2018 is ‘Science, a Human Right’, in celebration of the 70th anniversary of the Universal Declaration of Human Rights (art. 27), and of the Recommendation on Science and Scientific Researchers. Recalling that everyone has a right to participate in and benefit from science, it will serve to spark a global discussion on ways to improve access to science and to the benefits of science for sustainable development.”

To me, this is a long-term objective that is indeed critical for the betterment of future generations.

As usual, your comments are welcomed.




Kool’s Cool Chemistry

  • Eric T. Kool Has a Long String of Widely Cited Papers on Innovative Nucleic Acid Chemistry
  • Kool’s Latest Contribution Involves Reversible Chemistry for PhotoCloaking RNA
  • Kool’s Application of This to Aptamers is Very Cool, In My Opinion

Eric T. Kool (taken from Stanford.edu)

Readers of my blog know that my writing style tends toward alliteration, which is evident in this blog’s title referring to Prof. Eric T. Kool and his chemistry, with cool used in the slang sense, i.e. awesome, swell, nifty, etc.

At the outset, I should say that Stanford University-based Kool has a long string of very ingenious publications on diverse aspects of nucleic acids, which attract many readers, as evidenced by ~1,400 citations per year according to Google Scholar metrics. Kool has received numerous awards and honors that are listed at The Kool Lab website, together with his lab’s research interests.

An appreciation of the diversity of Kool’s scientific contributions can be gleaned by later perusing his ~270 publications at this PubMed link, but for now I’ll be focusing on his most recent publication titled RNA Control by Photoreversible Acylation. This article requires—unfortunately—a paid subscription to J. Amer. Chem. Soc. or purchasing access to a pdf, so I’ve tried my best to give you herein a synopsis of what Kool has achieved in this study, and why it’s so cool.

Caged Nucleic Acids

One of the ways to externally initiate (i.e. turn on) nucleic acid biochemical or biological function in vitro or in vivo is to use a “caging” strategy. The term caging generally refers to installation of one or more removable groups on a molecule in such a manner as to render the molecule inactive. For nucleic acids (and proteins, etc.) this has typically been achieved with photoremovable groups, such that irradiation with UV light removes the caging group and restores functional activity.

Taken from Deiters and coworkers ChemBioChem 2008

An illustrative example of this, published by Deiters and coworkers in 2008, is shown here for controlling gene silencing in mammalian cells with light-activated antisense agents. In this case, phosphorothioate-modified antisense agents were rendered inactive by installation of NPOM groups on nucleobases to prevent hybridization to a target mRNA. Brief irradiation with light at 365 nm removes the NPOM caging groups, enables sequence-specific binding to mRNA, and thus, blocks translation and/or leads to RNase H-catalyzed mRNA degradation.

A similar approach has been applied to functional studies of RNA. However, obtaining such photocaged DNA or RNA has usually required synthesis of chemically modified phosphoramidite monomer building blocks for solid-phase synthesis of the desired oligonucleotides. This limits utility to investigators who are able to deal with organic synthesis or have budgets to purchase relatively costly photocaged monomers or custom-synthesized photocaged oligonucleotides. Applying photocaging to long RNA is even more problematic.

Kool’s Cool Chemistry

To address these obstacles, Kool set out to develop a general method that can be applied post-synthetically to synthetic and native RNAs regardless of length, and moreover could be easily carried out by non-chemists. Based on studies (cited earlier) by others showing that 2’-OH groups of RNA can be selectively acylated in aqueous buffers with activated acyl compounds, Kool envisaged the strategy depicted here. An appropriately designed “PhotoCloaking” agent (PCA) is reacted with 2’-OH groups post-synthetically to block structure and interactions, and the resulting acyl groups are rendered photo-responsive by including photocleavable bonds.

In his publication, Kool reasoned that “addition of several such blocking groups to the RNA (‘cloaking’) would in principle cover and protect it from folding and interacting with other molecules. Subsequently, the acylation could be reversed by exposing RNA to light, removing the blocking groups and switching on activity.” The PCA design strategy combined a reactive acyl group with an o-nitroveratryl photo-labile group, as shown here for PCAs 1 and 2 having unmethylated and methylated veratryl groups, respectively, to control the efficiency of photocleavage. Also included was a dimethylaminoethyl group to enhance solubility, and (after testing) 2-chloroimidazole was chosen as providing the ideal level of reactivity.

A short 12-mer synthetic RNA oligo was used to evaluate 2’-OH acylation in water under mild conditions, in conjunction with denaturing polyacrylamide gel electrophoresis (PAGE) and mass spectrometry (MS), which indicated polyacylation with up to five groups (aka labels). Next, uncloaking of RNA by photoremoval of the PCA labels in water by irradiation with 365 nm light was confirmed by PAGE and MS. A 16-mer synthetic RNA oligo was similarly polyacylated and shown by native gel analysis and UV to have diminished ability to hybridize to complementary DNA. The reactivity of a previously reported synthetic hammerhead ribozyme was similarly disabled by PhotoCloaking and subsequently restored by exposure to UV light.

Kool then hypothesized that this postsynthetic polyacylation strategy would offer a convenient and simple method for preparing a photocontrolled aptamer of any length. Aptamers are a class of compounds that I have touted as useful in previous blogs. Kool’s test for this was especially interesting, in my opinion, because he used a 150-nt RNA transcript called Broccoli. This oddly named RNA is an aptamer that folds into a compact tertiary structure which binds DFHBI dye, giving rise to a strong increase in fluorescence, and thus serves as an RNA mimic of the well-known green fluorescent protein.

Kool had to use the more reactive PCA 2 to prepare a suitably cloaked version of Broccoli, which exhibited very dim fluorescence compared to uncloaked (i.e. untreated) Broccoli. When the cloaked aptamer was exposed to UV light and incubated with DFHBI, fluorescence was completely restored to the level of the untreated sample. The capability to use this PhotoCloaking strategy to switch on RNA function in cells was assessed by transfecting a 237-nt cloaked Broccoli-dimer RNA construct into a human (HeLa) cell line and then exposing these cells to light. As can be seen from the results shown here, the cloaking and uncloaking worked just as intended.

Epifluorescence microscopy of HeLa cells transfected with untreated or cloaked 237-nt Broccoli. Taken from Kool and coworkers J Am Chem Soc 2018

Kool concluded that “a great advantage of this postsynthetic labeling is that it allows for one-step photoprotection of long, biologically relevant RNAs that are difficult or impossible to synthesize via solid-state oligonucleotide synthesis. Since PhotoCloaking can be achieved without specialized equipment, the new method could potentially find widespread use for study of the biology of RNAs.”

Very cool indeed!

As usual, your comments are welcomed.

Transfer RNA (tRNA) Fragments Are Connected to Diseases

  • Specifically Formed tRNA Fragments (tRFs) can Repress Expression by RNAi 
  • Specific tRFs are Associated with Cancer and Other Diseases
  • Chemical Modifications in tRFs Pose a Challenge for Sequencing 

Researching new, trending topics for Zone in with Zon rewards me in several ways, including learning about important subject matter that I only vaguely knew about, or had been completely unaware of. The present blog is about tRNA fragments (tRFs), which was totally new subject matter for me that I found to be very interesting and worth sharing here.

But before getting to biological formation and functions of tRFs, I want to mention what led me to this intriguing class of RNA molecules. In a nutshell, TriLink’s R&D team decided to “brainstorm” on how its expertise in chemically modified RNA might be leveraged into new product offerings beyond its current lines of modified oligo RNA and modified messenger RNA (mRNA). Since tRNAs were long known to have numerous types of chemical modifications, as detailed elsewhere, TriLink’s R&D started to think about tRFs for reasons outlined below.

Biogenesis of tRFs

Formation of tRNA is a complex process. Initially, tRNA is transcribed in the form of a precursor (pre-tRNA) containing 5’-leader and 3’-trailer sequences, and in some cases introns in the anticodon loop. Pre-tRNAs then undergo various types of RNA processing steps to ultimately form mature tRNAs. During tRNA maturation, the 5’-leader is removed by RNase P, the 3’-trailer is removed by RNase Z, and following 3’-trailer removal, the 3’-end of all human tRNAs is modified by enzymatic addition of the universal CCA triplet, as depicted here.

Pre-tRNA (left) and mature tRNA (right); adapted from Anderson & Ivanov FEBS Lett (2014)
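For readers who think in code, the maturation steps just described can be mimicked with simple string operations. This is only a toy sketch under stated assumptions: the function name, the sequences, and the leader and trailer lengths are all hypothetical, introns are ignored, and nothing here models real enzyme specificity.

```python
def mature_trna(pre_trna: str, leader_len: int, trailer_len: int) -> str:
    """Toy model of tRNA maturation for an intron-free precursor.

    leader_len and trailer_len are hypothetical inputs; real processing
    sites are determined by the enzymes, not supplied by the user.
    """
    if leader_len < 0 or trailer_len < 0:
        raise ValueError("lengths must be non-negative")
    if leader_len + trailer_len >= len(pre_trna):
        raise ValueError("leader and trailer cannot cover the whole precursor")

    # Step 1: remove the leader at the start (the RNase P step).
    without_leader = pre_trna[leader_len:]
    # Step 2: remove the trailer at the end (the RNase Z step).
    body = without_leader[: len(without_leader) - trailer_len]
    # Step 3: add the universal CCA triplet to the newly exposed end.
    return body + "CCA"


# Made-up sequences, purely for illustration.
leader, body, trailer = "GGAUC", "GCAUUGGUGG", "UUUU"
print(mature_trna(leader + body + trailer, len(leader), len(trailer)))
```

The order of the three steps follows the description above: the CCA triplet is only added once the trailer has been removed.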

Also depicted here are specific types of enzymatic cleavage reactions of mature tRNA by ribonucleases Dicer and angiogenin (ANG) that lead to formation of 5’-tRFs and 3’-CCA tRFs, as well as 5’-halves and 3’-halves. These tRFs derived from mature tRNAs, as well as tRFs from pre-tRNAs that will not be discussed here, have now been extensively characterized by high-throughput short RNA sequencing methods. Among new advances in this sequencing methodology, TriLink’s recent PLOS One publication of its innovative CleanTag™ sample prep procedure has already been viewed an impressive ~6,000 times since appearing online only ~14 months ago as of this writing.

Mature tRNA (adapted from Anderson & Ivanov)

It should be noted that tRFs are not restricted to humans but have been shown to exist in multiple organisms. Two online tools are available for those wishing to learn more about tRFs: the framework for the interactive exploration of mitochondrial and nuclear tRNA fragments (MINTbase) and the relational database of Transfer RNA related Fragments (tRFdb). MINTbase also provides a scheme for the naming of tRFs called tRF-license plates that is genome independent. A recent publication by Kim et al. is a good lead reference for various functions of tRFs, some of which include the following.

Possible Roles of tRFs in Human Diseases

In a review of this subject, Anderson & Ivanov emphasize that, while production of tRFs has been observed in several types of human diseases, it remains to be determined whether these tRFs contribute to disease pathogenesis. Landmark findings regarding functions of tRFs were published by a team including Andrew Fire (2006 Nobel Laureate for RNA interference, RNAi) in a paper titled Human tRNA-derived small RNAs in the global regulation of RNA silencing, which provided compelling evidence that human tRFs can enter RNAi pathways. These findings by Fire & coworkers are now recognized as revealing a previously unknown nexus of RNAi translational repression pathways involving tRFs and microRNAs (miRNAs), depicted here.

Schematic representation of the biogenesis of miRNAs and tRFs associated with Argonaute (AGO) proteins. Taken from Shigematsu & Kirino Gene Regul Syst Bio (2015)

tRFs and Cancer: In 2009, Lee et al. reported that a specific tRF, designated as tRF-1001, is highly expressed in a wide range of cancer cell lines but much less in tissues, and its expression in cell lines was tightly correlated with cell proliferation. Furthermore, siRNA-mediated knockdown of tRF-1001 impaired cell proliferation. Since that discovery, various research groups have similarly found specific tRFs associated with different types of cancer, as recently detailed by Croce & coworkers, who concluded the following:

“We found that tRNA-derived small RNAs (tsRNAs) [i.e. tRFs in this blog] are dysregulated in many cancers and that their expression is modulated during cancer development and staging. Indeed, activation of oncogenes and inactivation of tumor suppressors lead to a dysregulation of specific tRFs, and tRFs-knock out cells display a specific change in gene-expression profile. Thus, tRFs could be key effectors in cancer-related pathways. These results indicate active crosstalk between tRFs and oncogenes and suggest that tRFs could be useful [bio]markers for diagnosis or targets for therapy. Additionally, [overexpression of two specific tRFs] affect cell growth in lung cancer cell lines, further confirming the involvement of tRFs in cancer pathogenesis.”

Biomarkers in blood, which I’ve blogged about previously, are a “hot topic” in disease diagnostics because they offer a more general, less invasive and safer means of patient sample access compared to traditional tumor biopsies.

tRFs and Pathological Stress Injuries: Stress-related cellular damage is central to disease pathogenesis that can be induced by hypoxia, nutrient deprivation, oxidative conditions and metabolic imbalance. Dhahbi et al. sequenced short RNAs from mouse serum and identified abundant 5′-halves derived from a small subset of tRNAs, implying that these tRFs are produced by tRNA type-specific biogenesis. A survey of somatic tissues revealed that these tRFs are concentrated within blood cells and hematopoietic tissues, with very little in other tissues, suggesting that they may be produced by blood cells. Serum levels of specific subtypes of these 5′ tRNA halves change markedly with age, either up or down, and these changes were prevented by calorie restriction.

Taken from Mishima et al. J Am Soc Nephrol (2014)

In a study by Mishima et al., it was shown in vivo that oxidative stress leads to conformational changes in tRNA that allow ANG-mediated production of tRFs. This stress-induced conformational change allows 1-methyladenosine nucleoside (m1A), a modification important for stabilizing the L-shaped structure of tRNA, to be recognized by an m1A-specific antibody, as depicted here. This antibody was used to show that renal injury and cisplatin-mediated nephrotoxicity (which both induce tissue damage via oxidative stress) generate tRFs. Similar results were obtained using m1A-based immunohistochemistry to directly visualize damaged areas of kidneys, brain and liver. Mishima et al. further demonstrated that these tRFs avoid degradation in the blood because they are associated with circulating exosomes, which are extracellular vesicles packed with proteins and nucleic acids.

tRFs and Neurodegenerative Diseases: As detailed in the above-mentioned review by Anderson & Ivanov, ANG mutants possessing reduced ribonuclease activity were reported in 2006 to be implicated in the pathogenesis of Amyotrophic Lateral Sclerosis (ALS; aka Lou Gehrig’s disease), which is a fatal neurodegenerative disease that I have blogged about. In 2012, a subset of ALS-associated ANG mutants was also found in Parkinson’s Disease (PD) patients. Recombinant ANG is neuroprotective for cultured motor neurons, and administration of ANG to a standard mouse model for ALS significantly improves both life span and motor function.

Concluding Comments on Analysis of tRFs

Although I started this blog by referring to the fact that mature tRNAs are extensively modified by a wide variety of nucleobase and ribose chemical modifications, these modifications were not further mentioned. That is because sample prep for short RNA sequencing uses reverse transcription to form cDNA that is then PCR amplified before sequencing, and it is widely acknowledged (e.g. Cozen et al.) that certain chemical modifications in RNA can interfere with reverse transcription. Thus, aside from reported use of demethylases to first remove interfering methyl groups from m1A, N1-methylguanosine, N3-methylcytosine, and N2,N2-dimethylguanosine, sequenced tRFs exclude many tRFs having chemical modifications that prevent reverse transcription.

Recognizing the need for alternative methods of determining structures of chemically modified tRFs, Limbach & Paulines have recently proposed the possibility of developing mass spectrometric (aka mass spec) approaches in a publication provocatively titled Going global: the new era of mapping modifications in RNA. I think this is a great idea, and hope that the mass spec community will soon address this challenge.

As usual, your comments are welcomed.


After I wrote this blog, Eng et al., who investigated the mosquito Aedes aegypti (the primary vector of human arboviral diseases caused by dengue, chikungunya and Zika viruses), reported the following:

Aedes aegypti mosquito. Taken from wcvb.com

“[A] single tRF derived from the precursor sequences of a tRNA-Gly was differentially expressed between males and females, developmental transitions and also upon blood feeding by females of two laboratory strains that vary in midgut susceptibility to dengue virus infection. The multifaceted functional implications of this specific tRF suggest that biogenesis of small regulatory molecules from a tRNA can have wide ranging effects on key aspects of Ae. aegypti vector biology.”

Click here to read my past blogs about Zika virus.

Long Noncoding RNA (lncRNA) Revisited

  • Publications Dealing with lncRNAs Show Exponential Growth
  • Evidence for Involvement of lncRNAs in Cancer is Increasing
  • Value of lncRNAs as Biomarkers Has Been Validated

Several years ago, I posted a blog about long noncoding RNAs (lncRNAs), which are defined as non-protein coding transcripts in the range of ~200 nt to ~100 kb long. Interest in lncRNA (and in other types of noncoding RNA such as microRNA (miRNA) and short interfering RNA (siRNA)) is fueled in large part by a collective scientific desire to uncover and understand the existence and function of all forms of RNA dark matter, so named by analogy to dark matter in cosmology. The lncRNA component of RNA dark matter is certainly generated from transcription of noncoding (formerly “junk”) DNA, but much has yet to be elucidated about function.

As depicted below, lncRNA (red) may act as (a) decoys to release proteins from chromatin, (b) scaffolds for grouping protein complexes, (c) guides to recruit proteins or (d) transcriptional enhancers by bending chromatin. Not shown is lncRNA acting as an antagonist for other regulatory noncoding RNAs, namely miRNA, which can be studied by next-generation sequencing methods such as TriLink’s CleanTag approach.

Taken from Bohla et al. Dis Markers (2017)

Numerical support for upward trending interest in lncRNA is provided by the chart shown here for the number of annual publications on lncRNA. This chart was produced by using data I found in PubMed for 2000-2016, which clearly show a relatively flat rate of ~150 papers per year from 2000-2007, and then an exponential increase to ~2,220 papers in 2016.
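To put a rough number on that upturn, the two endpoints quoted above (~150 papers per year through 2007 and ~2,220 papers in 2016) can be converted into an average annual growth rate. This is a back-of-the-envelope sketch only: it assumes smooth exponential growth starting from 2007, the function name is my own, and only the two counts come from the PubMed data mentioned above.

```python
import math


def annual_growth(start_count: float, end_count: float, years: int):
    """Return (average annual growth rate, doubling time in years),
    assuming smooth exponential growth between the two endpoints."""
    rate = (end_count / start_count) ** (1 / years) - 1
    doubling_time = math.log(2) / math.log(1 + rate)
    return rate, doubling_time


# Endpoints taken from the text; treating 2007 as the start is an assumption.
rate, doubling_time = annual_growth(150, 2220, 2016 - 2007)
print(f"Average growth: {rate:.1%} per year")
print(f"Doubling time: {doubling_time:.1f} years")
```

Under these assumptions, the numbers work out to roughly 35% growth per year, i.e. the annual count of lncRNA papers doubling a little more often than every two and a half years.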

Although it’s not possible to say for sure what catalyzed this marked upturn in lncRNA publications, searching the 2005 literature in Google Scholar led to finding the following top-5 cited publications, which from titles alone could be likely scientific co-catalysts:

In any case, when I did keyword searches of the ~13,000 publications on lncRNA, roughly one-third (~4,700) were related to cancer, and many (~1,800) dealt with biomarkers primarily for (~1,400) cancer, but also including cardiovascular diseases, diabetes, epilepsy, general anxiety disorder, inflammatory bowel diseases, etc. Given that cancer is a major medical problem for all countries to deal with, and knowing that early detection of cancer by finding better biomarkers is critically important, this blog revisits lncRNA in the context of cancer and biomarkers.

State-of-the-Art Technologies to Explore lncRNA

At the risk of oversimplification, advances in RNA sequencing (RNA-Seq), enabled largely by high-throughput instrumentation from Illumina, PacBio, and Thermo Scientific, have revolutionized the field of molecular biology by revealing that up to ~75% of the human genome is actively transcribed, and that most of this transcriptome consists of lncRNA. Bioinformatic analyses, which are way beyond my expertise, have played a key role in sorting out lncRNA from mRNA and other RNA species. Interested readers can consult Cobos et al. as a recent lead reference to learn more about these bioinformatic methods.

Following are links to a constellation of additional experimental techniques that are available for exploring lncRNA, and can be perused in detail later:

Although these methods are employed to shed light on lncRNA cellular localization, structure, interaction networks and functions, interested readers should consult a review by Salehi et al. and research paper by Goyal et al. for discussions of the advantages and disadvantages of these techniques. For example, Goyal et al. note that many lncRNA are derived from bidirectional promoters, or overlap with promoters, or bodies of sense or antisense genes. In a genome-wide analysis, they found only 38% of 15,929 lncRNA loci are safely amenable to CRISPR applications, while almost two-thirds of lncRNA loci are at risk to inadvertently deregulate neighboring genes. For several representative lncRNAs, it was found that CRISPR—but not RNAi by siRNA or antisense oligos—also affects their respective neighboring genes.
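For a sense of scale, the percentages quoted from Goyal et al. can be turned into approximate locus counts. This is simple arithmetic on the two figures given above (38% of 15,929 loci); the function name is my own and the rounding is only illustrative.

```python
TOTAL_LOCI = 15_929      # lncRNA loci analyzed, as quoted above
SAFE_FRACTION = 0.38     # fraction reported as safely amenable to CRISPR


def split_loci(total: int, safe_fraction: float):
    """Return (approx. safe loci, approx. at-risk loci) for a given fraction."""
    safe = round(total * safe_fraction)
    return safe, total - safe


safe, at_risk = split_loci(TOTAL_LOCI, SAFE_FRACTION)
print(f"Safely targetable: ~{safe:,} loci")
print(f"At risk of affecting neighbors: ~{at_risk:,} loci ({at_risk / TOTAL_LOCI:.0%})")
```

In other words, roughly 6,000 loci fall in the "safe" group and nearly 10,000 do not, which is consistent with the "almost two-thirds" wording above.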

In closing this section on methods, readers who follow my blogs know that I’m a big fan of nanopore sequencing, about which I’ve commented in several previous posts. Oxford Nanopore Technologies has recently announced advances in its nanopore technologies that now allow sequencing of an RNA strand directly, rather than analyzing the products of reverse transcription and PCR reactions. My scientific “crystal ball” sees use of nanopore sequencing of lncRNAs in the not too distant future.

Long Noncoding RNA in Cancer and as Biomarkers

According to Bohla et al., lncRNAs are now known to function as regulatory factors for numerous, important cellular processes, such as growth, differentiation, and cell death. In addition, lncRNAs are involved in controlling alternative splicing, regulation of gene expression at the posttranscriptional level, chromatin modification, inflammatory pathologies, and—when deregulated—various types of cancer. Lnc2Cancer is a manually curated, interactive database of cancer-associated lncRNAs with experimental support that provides a high-quality and integrated resource for exploring lncRNA deregulation in various human cancers. In my opinion, Lnc2Cancer is definitely worth perusing.

The figure shown here indicates various types of cancers (black) for which lncRNAs (names of which are in green, blue or red) have been implicated. Searching PubMed or Google Scholar using any of the lncRNA names will provide a host of publications to peruse.

Taken from Vitiello et al. Cellular Oncology (2015)

For example, searching PubMed for the term “HOTAIR” gives ~475 publications, in chronological order starting with the most recent. By contrast, searching Google Scholar for the terms “HOTAIR lncRNA” gives ~6,100 articles ranked by “relevance,” which is explained elsewhere as being heavily influenced by citation frequency. Readers interested in more details can consult Lin & Yang, who review the mechanisms by which lncRNAs regulate cellular responses to extracellular signals, and discuss the clinical potential of lncRNAs as diagnostic indicators, stratification markers and therapeutic targets of combinatorial treatments.

As was mentioned in the introduction, there are numerous publications aimed at using these and other cancer-associated lncRNAs as biomarkers. Bohla et al. note that one of the main advantages making lncRNAs suitable as cancer diagnostic and prognostic biomarkers is their high stability while circulating in body fluids, especially when enclosed in exosomes or apoptotic bodies. Despite abundant quantities of ribonucleases in different body fluids, lncRNAs protected in exosomes or apoptotic bodies can be detected in whole blood, plasma, urine, saliva and gastric juice. These lncRNAs as biomarkers are obtainable by non- or minimally invasive methods, which are well tolerated by patients compared to conventional biopsies.

Taken from liquid-biopsy.gene-quantification.info

Challenges for use of lncRNAs as biomarkers include development of convenient, low cost yet robust isolation methods, and accurate quantitation of relatively low copy numbers, which heretofore has relied on an amplification step, such as enzymatic conversion into cDNA followed by PCR. However, single-molecule detection approaches have evolved to obviate the need for amplification.

For example, NanoString Technologies now offers the nCounter® lncRNA Assay for validation of lncRNA discoveries, which can then be followed by use for biomarker quantification. As depicted below in the left panel, single molecules of lncRNAs (red) can be detected at the same time as mRNAs (green and blue), if so desired, using sequence-specific probes each having a fluorescence-based “barcode” identifier. The right panel depicts extension of this approach to identify lncRNA-protein interactions by inclusion of antibody precipitation. This digital-counting assay allows researchers to select up to 800 lncRNAs for analysis in a single multiplexed reaction, which is quite impressive, in my opinion.

Taken from nanostringxt.com

Taken from Cesano J Immunother Cancer (2015)

Readers interested in more details for application of nCounter® for analysis of biomarkers are referred to a recent publication by Permuth et al. dealing with pancreatic ductal adenocarcinoma (PDAC), which is an aggressive disease that lacks effective biomarkers for early detection. Briefly, these researchers hypothesized that circulating lncRNAs may act as diagnostic markers, and used nCounter® technology to measure the abundance of 28 candidate lncRNAs in pre-operative plasma from a cohort of pathologically-confirmed PDAC cases of various grades of severity and non-diseased controls. Results showed that two lncRNAs aided in differentiating PDAC from controls, and an 8-lncRNA signature had greater accuracy than standard clinical and radiologic features in distinguishing ‘aggressive/malignant’ PDAC that warrant surgical removal from ‘indolent/benign’ PDAC. In my opinion, these findings seem very promising for use with PDAC and, by conceptual extension, to other cancers.

Temozolomide (TMZ). Taken from sigmaaldrich.com

As a final example of lncRNA biomarkers for cancer, MALAT1 has been recently reported by Chen et al. to be a prognostic factor in glioblastoma multiforme (GBM) and to induce chemoresistance to temozolomide (TMZ; pictured above). The significance of these findings is that GBM is the most malignant brain tumor with limited therapeutic options, and that TMZ is first-line chemotherapy for GBM. These researchers first used deep-sequencing and bioinformatic methods to identify lncRNAs showing different expression levels in TMZ-resistant and non-resistant patients. RT-qPCR was then performed in tissues and serum samples, and lncRNA MALAT1 was shown to discriminate between responding and non-responding patients.

Closing Comments

If you’re interested in knowing much more about lncRNAs, then lncRNABlog.com is a great website to visit and subscribe to if you want to keep current on all manner of lncRNA research and industry news. This interactive blog, which posts abstracts and images from the latest lncRNA publications, lets readers post comments, so you can join in the “conversation” or simply follow what others are thinking about these articles.

Also provided are links to a host of different online tools for lncRNA research and development, as well as “what’s happening” in terms of upcoming lncRNA events or conferences. Those of you currently seeking a new position may find the job postings to be helpful.

Finally, there are plenty of pop-up advertisements and commercial banners, but these are also informative about lncRNA products and services that are available.

As usual, your comments are welcomed.