RNA Epigenetics – Part 3

  • 2′-O-Methylation Sites in Human mRNA Sequenced with Base Precision
  • N3-Methylcytidine Discovered in Human mRNA
  • “The More You Know, the More You Know You Don’t Know”—Aristotle

A few years ago, I posted a blog titled In Search of RNA Epigenetics: A Grand Challenge, which commented on new discoveries of how chemically modified bases in mRNA are enzymatically added and removed as mechanisms of cellular regulation. Since these processes occur post-transcriptionally, i.e. after mRNA is polymerized from a DNA template, they are by definition epigenetic. The Greek prefix epi- (“over, outside of, around”) in epigenetics implies features that are “on top of” or “in addition to” the traditional genetic basis for inheritance.

That earlier blog (Part 1) featured N6-methyladenosine (m6A) and 5-methylcytidine (m5C), and quoted an expert’s comment that biological functions of pseudouridine (Ψ), 2′-O-methyl (2′-OMe), and potentially other modifications in mRNA and various non-coding RNAs need to be elucidated. I subsequently blogged about profiling Ψ in mRNA in Part 2. In this post, Part 3, I will share recent developments on 2′-OMe and introduce N3-methylcytidine (m3C) as the most recently discovered modification in mammalian mRNA.

Taken from Liu & He J Biol Chem (2017)

Mapping 2′-O-Methylation Sites in Human mRNA with Base Precision

This section heading is taken from the title of a July 2017 publication in Nature Methods by an international team that included Chuan He, whom I featured in Part 1 of this series. These investigators noted that existing methods for locating 2′-OMe sites are underpowered for detection in relatively rare RNA molecules, such as mRNA, or at low abundance within a given RNA molecule.

To address these challenges, they developed a conceptually distinct approach based on the different chemical properties of nucleosides with 2′-OH and 2′-OMe, as well as combining enrichment with detection of a positive signal (rather than the lack of signal) to produce a suitably sensitive sequencing method. This method, named Nm-seq, leverages oxidative cleavage of ribose 2′,3′-vicinal diols by periodate to expose, enrich and map 2′-OMe sites in the transcriptome without bias and with single-nucleotide precision, as depicted below.

Taken from tud.ttu.ee

Stepwise details for this biochemical methodology can be found below in the Addendum, so let’s skip to what was found using Nm-seq:

  • Total levels of 2′-OMe in HeLa mRNA were measured by LC-MS/MS, and the 2′-OMe/2′-OH molar ratios obtained ranged from 0.012% for A to 0.15% for U, with G and C between these values. The analysis covered 7,412 sites, typically in less structured regions.
  • The majority of 2′-OMe sites (95.7%) occurred in 2,398 RefSeq annotated genes, 95.9% of which were protein coding, with 77% of these 2′-OMe-modified genes harboring only one 2′-OMe site.
  • Most of the sites (70.3%) occurred in coding sequences, and the rest occurred in 5′ and 3′ untranslated regions (3% and 10.6%, respectively) as well as in introns (16.2%), suggesting that 2’-OMe is installed co-transcriptionally in the nucleus.
  • While 2’-OMe was found in all three codon positions, the 1st position had more methylation than expected by chance, and the 3rd had less, which is consistent with a recent study that showed a codon-position-dependent effect of 2’-OMe on translation.
  • Nm-seq applied to mRNA purified from HEK293 cells recapitulated the features observed in HeLa mRNA.

In my opinion, the biggest mystery in these findings is why nature evolved to include only one 2’-OMe modification in most of the mRNAs. It’s also worth pondering whether synthetic modified mRNAs, about which I’ve blogged, would function better when having 2’-OMe in such sites.

Discovery of N3-methylcytidine (m3C) in Mammalian mRNA

In September 2017, an international team reported compelling evidence for the discovery of m3C in mRNA from mice and humans, which previously had only been found in tRNAs. This discovery by Xu et al. is well worth reading for full appreciation as a tour de force of experimental methodology; however, a brief synopsis of key steps is as follows.

The team investigated three mammalian methyltransferases (METTL2, METTL6 and METTL8), the genes for which were each knocked out in mice and in human cell lines using CRISPR/Cas9 (about which I’ve previously blogged multiple times). Using liquid chromatography triple quadrupole mass spectrometry (LC-MS/MS), they quantified m3C in tRNA fractions in brain and liver tissues from wild-type mice and knockout mutants. The results showed that METTL2 and METTL6 deficiencies led to 35% and 12% reductions in m3C levels in tRNA, respectively. In contrast, METTL8 deficiency did not produce a significant change in tRNA m3C levels. Instead, Xu et al. provide definitive proof, through stringent purification of mRNA and quantification by LC-MS/MS, that METTL8 acts on mRNA.

Specifically, tRNA was first removed from the total RNA sample by size-exclusion chromatography. The rest of the RNA fraction was subsequently subjected to poly(A) enrichment and rRNA depletion. mRNA extracted from mutant mice with a deficiency of METTL8 showed lower levels of m3C, with no noticeable changes observed for the m3C levels in tRNA.

Commenting on this work by Xu et al., experts Liu & He pointed out that there are “several immediate angles to explore.” Specifically, the locations of the METTL8-installed m3C modifications remain unknown. Furthermore, the functional roles of METTL8 and the m3C modifications, and whether these vary under different stress conditions or during cellular signaling remain to be elucidated. They add that “[i]dentifying and deciphering the roles of RNA modifications is more than just a biochemical treasure hunt: Defects of certain RNA-modifying enzymes are known to be associated with human diseases. Moving beyond the abundant RNAs to…mRNA and long noncoding RNA, coupled with the discoveries of chemical modifications such as…m3C methylations, is opening new directions in understanding RNA modification-mediated RNA processing and gene expression regulation.”

Taken from brittanica.com

In my humble opinion, the number of chemical modifications in mRNA has indeed increased quickly during the past decade, which adds to our factual data for RNA epigenetics while continuing to challenge our understanding of this exciting field through further experimentation. In this regard, I’m reminded of the following quote attributed to the ancient Greek philosopher and scientist Aristotle:

“The more you know, the more you know you don’t know.” 

Addendum:  Stepwise details for Nm-seq methodology

Nm-seq first exposes internal 2’-OMe (“Nm”) sites in RNA fragments (Step I below) by iterative oxidation–elimination–dephosphorylation (OED) cycles that remove unmodified 2′-OH nucleotides (one per cycle) in the 3′-to-5′ direction. Vicinal diols in these nucleotides are readily oxidized by sodium periodate to yield a dialdehyde intermediate that undergoes spontaneous β-elimination under mildly basic conditions. With the removal of a nucleoside, the resulting 3′-monophosphate is enzymatically dephosphorylated to allow another cycle to take place. Once a 2’-OMe is encountered, the progressive shortening process comes to a halt, as lack of vicinal diols prevents oxidation (Step II below).

The net result of the iterative exposure process is an enrichment of fragments ending with 2′-OMe, although those ending with 2′-OH still constitute the majority of 3′ termini. A final round of oxidation–elimination (OE) reaction, excluding dephosphorylation, is then performed to generate two types of 3′ ends that differ in their ligation compatibility (Step III below). While unmodified 2′-OH ends produce an unligatable 3′-monophosphate, 2′-OMe ends are resistant to oxidation and retain their ligatable 3′-OH group. In this way, fragments ending with 2′-OMe are preferentially ligated to the 3′ adaptor (not shown below) and further enriched by PCR amplification.
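The exposure-and-enrichment logic described above can be mimicked with a toy simulation. This is only a minimal sketch under stated assumptions, not the published protocol: the `Fragment` class, the function names, and the example sequence are all hypothetical, every reaction is treated as 100% efficient, and the 2′-OMe position is chosen purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class Fragment:
    """Toy RNA fragment (hypothetical model, not from the Nm-seq paper)."""
    bases: str                # sequence written 5'->3'
    nm_positions: frozenset   # 0-based indices carrying 2'-O-methyl
    three_prime: str = "OH"   # "OH" = ligatable, "P" = 3'-monophosphate


def ends_with_nm(frag: Fragment) -> bool:
    """True if the 3'-terminal nucleotide is 2'-O-methylated."""
    return bool(frag.bases) and (len(frag.bases) - 1) in frag.nm_positions


def oed_cycle(frag: Fragment) -> Fragment:
    """One oxidation-elimination-dephosphorylation cycle (assumed 100% efficient)."""
    if not frag.bases or ends_with_nm(frag):
        return frag  # no 2',3'-vicinal diol, so periodate cannot oxidize the end
    # The terminal nucleoside is lost; dephosphorylation restores a 3'-OH.
    return Fragment(frag.bases[:-1], frag.nm_positions, "OH")


def final_oe(frag: Fragment) -> Fragment:
    """Final oxidation-elimination WITHOUT dephosphorylation."""
    if not frag.bases or ends_with_nm(frag):
        return frag  # 2'-OMe end keeps its ligatable 3'-OH
    # Unmodified end is removed and leaves an unligatable 3'-monophosphate.
    return Fragment(frag.bases[:-1], frag.nm_positions, "P")


def nm_seq_sketch(frag: Fragment, cycles: int) -> Fragment:
    """Run `cycles` OED cycles followed by the final OE step."""
    for _ in range(cycles):
        frag = oed_cycle(frag)
    return final_oe(frag)


if __name__ == "__main__":
    modified = Fragment("ACGUACGU", frozenset({4}))   # 2'-OMe on the second A
    unmodified = Fragment("ACGUACGU", frozenset())
    for frag in (modified, unmodified):
        out = nm_seq_sketch(frag, cycles=5)
        print(out.bases, out.three_prime, "ligatable" if out.three_prime == "OH" else "blocked")
```

In this toy model the modified fragment stalls at the 2′-OMe nucleotide and keeps a 3′-OH, whereas the unmodified fragment keeps shortening and ends with a 3′-monophosphate, which is the difference the adaptor ligation step exploits.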

Interested readers should consult the Nm-seq publication by He & coworkers for further details on how this cyclic biochemistry, ligation, and amplification lead to libraries for sequencing and decipherable patterns indicating 2’-OMe positions.

Taken from He & coworkers Nature Methods (2017)

Legionnaires’ Disease Outbreak in New York

  • First Identified as a New Pathogen 40 Years Ago, Legionella Persists
  • Legionella’s Life Cycle Involves “Biological Sanctuaries”
  • qPCR Proven to Outperform Antibody-Based Detection of Legionella

When I read about an outbreak of Legionnaires’ disease (LD) in New York City, baseball legend Yogi Berra’s famous quote, “It’s déjà vu all over again,” immediately came to mind, along with the irony of Berra having played for the New York Yankees. So, if you’re much younger than me, you’ll likely not know why “It’s déjà vu all over again” applies, and you may wonder who Berra was. You can read about him later elsewhere, but for now you should read on to learn about Legionnaires’ disease and why déjà vu is apropos.

History of LD

Notable positive events during 1976 in the United States included our Bicentennial Celebration, unveiling by NASA of the first space shuttle (the Enterprise), establishment of Apple Computer Company by Steve Jobs and Steve Wozniak, and Silly Love Songs by Paul McCartney and Wings ascending to #1 on the charts. While many of these events were the beginning of fabulous things to come, one proved to be the beginning of something catastrophic. American Legionnaires who gathered in Philadelphia, Pennsylvania for the Bicentennial were struck with a mysterious epidemic of fatal respiratory disease.

Taken from networks.org

Sadly, 182 members of the Pennsylvania American Legion were affected, and 29 individuals died after they returned from the convention in Philadelphia. The epidemiological and microbiological studies continued for months before scientists began to understand what had happened. Much of the basic framework of our knowledge of Legionnaires’ disease, as the epidemic came to be known, was developed by a team from the CDC and the Pennsylvania Department of Health, as detailed elsewhere.

Taken from case1study.wikispaces.com

The cause of the disease remained a mystery until 1977, when an investigative team led by J. E. McDade and C. C. Shepard (of the Leprosy and Rickettsia Branch, Virology Division, Bureau of Laboratories, CDC) reported on the isolation of a Gram-negative bacillus found in patient samples. As is often done when naming pathogens after their sources, the genus of this rod-shaped bacterium was aptly named Legionella. Legionella includes the species L. pneumophila, which caused the pneumonia-like illness medically named legionellosis, but commonly referred to as LD.

2017 LD Outbreak Hits New York City—Again

In June of this year, forty years after the first characterization of Legionella, its lethal infectivity recurred in an outbreak on the Upper East Side of the Manhattan borough of New York City, leaving one person dead and six other people sickened. According to a newspaper account, this outbreak occurred within 11 days, and may have been triggered by contact with contaminated water, as has happened in other cases.

While this incident affected relatively few people compared to other previous outbreaks, including one in the Bronx borough of New York City in 2015 that killed 15 people and sickened more than 70, it’s a scary reminder of the persistence of Legionella in the environment. In this regard, it has been reported that 200 to 400 cases of the illness are recorded each year in New York, despite the monitoring of 6,000 water systems wherein Legionella can flourish in warm conditions. This environmental factor provides a segue into what genomic sequencing has revealed about Legionella.

Genomics-Based Insights on Legionella

The bacterial pathogen L. pneumophila is found ubiquitously in fresh water environments where it replicates within protozoan hosts. When inhaled by humans it can replicate within alveolar macrophages and cause the severe pneumonia associated with Legionnaires’ disease. As detailed elsewhere, recent advances in genome sequencing have had a major impact on understanding of the pathogenesis, evolution and genomic diversity of Legionella.

A lipopolysaccharide cell wall and several outer membrane proteins are essential virulence factors. Central to the pathogenesis of L. pneumophila is its Type IV secretion system, which translocates over 270 effector proteins into the host cell, thus allowing this bacterium to manipulate host cell functions to its advantage and assuring intracellular survival and replication.

Within aquatic media, as depicted below, Legionella exist as part of biofilms, which provide a protective environment (or biological sanctuary, if you will) wherein the bacteria exhibit a marked increase in resistance to biocidal compounds and chlorination. Aside from the resultant difficulty of purging water systems to be free of Legionella, these bacteria can invade and multiply within protozoa (which are ubiquitous and include amoebae), thus providing yet another biological sanctuary. Protozoa are present in all aquatic or moist environments, and can be found in even the most inhospitable parts of the biosphere, thus providing further protection to Legionella.

Taken from Comas Nature Genetics (2016)

The actual infectious particle is not known but may include excreted legionellae-filled vesicles, intact legionellae-filled amoebae or free legionellae that have lysed their host cell. Transmission to humans occurs via mechanical means, such as air-conditioning units, taps and showerheads, as well as others listed by the World Health Organization (WHO).

Infection in humans occurs by inhalation of the infectious particle and establishment of infection in the lungs. After ingestion by macrophages, L. pneumophila has been found to inhibit acidification and maturation of its phagosome. Following a 6–10 hour lag period, the bacteria replicate for 10–14 hours until macrophage lysis releases dozens of L. pneumophila progeny.

It’s worth noting that, according to WHO, there is no direct human-to-human transmission of Legionella, which in my opinion is why incidence of LD remains relatively low.

Gardening Can Be Bad for Your Health—No Joke.

Unfortunately, there are other ways of contracting LD besides exposure to tainted water. At the risk of sounding flippant, gardening can be seriously bad for your health because LD can be contracted by breathing in aerosolized Legionella from contaminated soil. This is especially true in New Zealand, which has the highest incidence of LD in the world, according to a recent publication, with L. longbeachae being the most clinically relevant species. This infectious agent is predominantly found in soil and composted plant material. Most cases occur over spring and summer, and the people at greatest risk are those involved in gardening activities.

Taken from lawrieco.com.au

Some agricultural experts advocate smelling soil to assess its quality, stating that “[t]he smell of a soil can often reveal its state of health, sweet or offensive or plain bland,” and adding “the smell does not actually come from the dirt itself, but from soil microbes that inhabit a healthy soil environment. Sweet smelling soil has good levels of organic carbon which is vital to supporting the world of billions of beneficial bacteria and fungi in every cup of healthy soil.”

These soil sniffing experts, however, fail to consider the presence of pathogenic organisms including L. longbeachae. I, for one, will carefully avoid purposefully smelling any soil when gardening, and will instead be sure to wear a good mask capable of filtering out aerosolized Legionella, as you should too!

Nucleic Acid-Based Detection of Legionella

Rapid and effective diagnosis of LD is extremely important so that timely and appropriate therapy can be provided, thereby lowering the morbidity and mortality rates and reducing the health and economic costs associated with this disease. Surprisingly, diagnosis is reportedly established solely by time-consuming microbiological tests. Luckily, it looks like testing procedures could soon change for the better, thanks to PCR and NGS.

Taken from corisbio.com

Earlier this year, Christovam et al. assessed the accuracy of various detection tests in patients suspected of being infected with Legionella and in patients with laboratory-confirmed LD. Investigators analyzed urinary Legionella antigen detection, direct fluorescent antibody (DFA) staining, serological testing and PCR vs. culture analysis (the reference standard). The sensitivity and specificity for PCR were 83% and 90%, respectively, whereas DFA sensitivity and specificity were 67% and 100%, respectively. Moreover, PCR had high sensitivity and specificity for early diagnosis of LD.
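For readers who want to see how such figures are derived, sensitivity and specificity come straight from a 2×2 comparison against the reference standard (culture, in this case). The following is a minimal sketch; the counts are hypothetical numbers chosen only so that the arithmetic reproduces 83% and 90%, not data from Christovam et al., and the 10% prevalence used for the predictive value is likewise an assumption for illustration.

```python
def sensitivity(tp: int, fn: int) -> float:
    """Fraction of culture-positive samples that the test also calls positive."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """Fraction of culture-negative samples that the test also calls negative."""
    return tn / (tn + fp)


def positive_predictive_value(sens: float, spec: float, prevalence: float) -> float:
    """Probability that a positive test is a true positive (Bayes' rule)."""
    true_pos = sens * prevalence
    false_pos = (1.0 - spec) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


# Hypothetical counts (NOT from the study): 100 culture-positive and
# 100 culture-negative samples tested by PCR.
tp, fn = 83, 17
tn, fp = 90, 10

pcr_sens = sensitivity(tp, fn)
pcr_spec = specificity(tn, fp)

# Assumed prevalence of 10% among tested patients, for illustration only.
pcr_ppv = positive_predictive_value(pcr_sens, pcr_spec, prevalence=0.10)

print(f"sensitivity = {pcr_sens:.2f}, specificity = {pcr_spec:.2f}, PPV = {pcr_ppv:.2f}")
```

Under these assumed numbers, fewer than half of the positive PCR calls would be true positives, which illustrates why a test with 100% specificity such as DFA can still be attractive despite its lower sensitivity.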

Taken from letsfixit.co.uk

While the study results reported by Christovam et al. seem promising, less definitive results have also been reported. Krøjgaard et al. compared culture and qPCR assays for the detection of Legionella in 84 samples from shower hoses and taps in a residential area before and after two decontaminations. Detection by qPCR was suitable for monitoring changes in the concentration of Legionella, but precise determination of bacterial numbers was difficult. Risk assessment by qPCR only on samples without any background information regarding treatment, timing, etc. was said to be “dubious.” However, the rapid detection of high concentrations of Legionella by qPCR was said to be valuable as an indicator of risk, although such results may be false positives compared to culture results. Detection of a low number of bacteria by qPCR was said to be a strong indication for the absence of risk.

Not surprisingly, powerful next-generation sequencing (NGS) is emerging as a better method for genus-specific, sensitive and quantitative determination of Legionella. In 2017, Pereira et al. reported findings from a study using NGS to differentiate 20 pathogenic strains of Legionella in fresh water systems. A genome standard and a mock community consisting of six different Legionella species demonstrated that the reported NGS approach was quantitative and specific at the level of individual species, including L. pneumophila. Comparison of quantification by real-time PCR showed consistency with the NGS data, thus indicating that NGS “provides a new molecular surveillance tool to monitor all Legionella species in qualitative and quantitative terms if a spiked-in genome standard is used to calibrate the method.”
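The idea of calibrating read counts against a spiked-in genome standard can be illustrated with simple proportional scaling. This is a sketch of the general principle only, assuming reads scale linearly with genome copies; it is not the calibration procedure used by Pereira et al., and the function name, species labels, and all counts below are hypothetical.

```python
def calibrate_with_spike_in(read_counts: dict, spike_name: str, spike_copies: float) -> dict:
    """Convert per-species read counts to estimated genome copies.

    Assumes (hypothetically) that reads are proportional to genome copies,
    so the known spike-in amount fixes the copies-per-read conversion factor.
    """
    spike_reads = read_counts[spike_name]
    if spike_reads <= 0:
        raise ValueError("spike-in standard produced no reads; cannot calibrate")
    copies_per_read = spike_copies / spike_reads
    return {
        name: reads * copies_per_read
        for name, reads in read_counts.items()
        if name != spike_name
    }


# Hypothetical read counts from one water sample (illustrative only).
reads = {
    "spike-in standard": 1000,
    "L. pneumophila": 2500,
    "L. longbeachae": 500,
}

# Assume 10,000 genome copies of the standard were added to the sample.
estimates = calibrate_with_spike_in(reads, "spike-in standard", spike_copies=10_000)
for species, copies in estimates.items():
    print(f"{species}: ~{copies:.0f} genome copies")
```

Without the standard, the read counts would only give relative abundances; the spike-in is what turns them into absolute estimates that can be compared across samples.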

Concluding Comments

Aside from providing a brief introduction and update on LD, my additional intent was to alert readers—without undue alarm—to the myriad circumstances in which Legionella can infect humans. According to the aforementioned list provided by WHO, the most common form of transmission of Legionella is inhalation of contaminated aerosols produced in conjunction with water sprays, jets or mists. Infection can also occur by aspiration of contaminated water or ice, particularly in susceptible hospital patients.

Researching transmission of Legionella in Google Scholar led me to find additional information (see links below) that you may find useful or interesting.

Thankfully, as I’ve said before, Legionella is not transmitted human-to-human. The scary aspect of Legionella, however, is that it’s continually mutating, which raises the specter of emergence of a strain that can spread within a human population. Let’s hope that this doesn’t happen and/or that modified mRNA vaccines can be quickly produced to combat that possibility.

As usual, your comments are welcomed.


After I finished this blog, Reuters reported on October 9, 2017 that Michigan’s top medical official, Dr. Eden Wells, will be charged with involuntary manslaughter for her role in the city of Flint’s water crisis, which was linked to an outbreak of LD that caused at least 12 deaths. Dr. Wells would become the sixth current or former official to face involuntary manslaughter charges related to this crisis, which principally involved lead contamination in the city’s water supply.


Your Meals, Wines and Much More are Personalized to Your DNA

  • Personalized DNA Sequencing for Lifestyle Guidance is all the Buzz
  • Vita Mojo Provides Meals to Match Your DNA Codes
  • Vinome Promises to Find Wines for Your Unique Palate
  • SlumberType Claims to Analyse Your DNA and Help You Sleep Better

Among the most significant trends in nucleic acids-based R&D these days is personalized medicine, which uses a person’s DNA sequence (or RNA expression profile) to guide the selection of the best available therapy for that person. This approach is opposite to the traditional strategy for drug development leading to “one treatment regimen for all patients.” Thus, as depicted below, major medical facilities now offer patients personalized cancer therapy based on molecular profiling that features analysis of each patient’s DNA and/or RNA markers.

Taken from pct.mdanderson.org

Scientific studies supporting advantages of nucleic acids-based personalized therapies tailored or customized for each individual are quite compelling, and definitely on the rise, based on my PubMed search results. Given this situation for medical therapies, you might wonder whether personalized nucleic acids-based strategies can be extended to other aspects of human biology, perhaps even to what each of us should eat or drink. Well, such wondering has already been done by others, and has recently led to genetic analyses moving from what I describe as “medicine to mainstream,” as you’ll now read.

Your Meals Personalised to Your DNA

Taken from foodsyoucan.co.uk

A chain of cafes in London is now catering to your body’s every whim—right down to its genetic makeup. Vita Mojo is the first in the world to create meals based on a customer’s DNA profile. It’s an avant-garde part of a huge trend for wellness and healthy eating, an industry worth $3.72 trillion—yes, trillion—across the world, according to a report by the Global Wellness Institute.

Vita Mojo customers first arrange to have their DNA analyzed, and then receive a profile of DNA markers which indicate food groups they should avoid and food groups to eat more of. The gene testing service provided by DNAFit, which was founded in 2013, costs $199 and works like this:

  • You receive a DNA collection kit from DNAFit to provide a saliva sample that is mailed to a company called Helix, which uses Illumina’s Exome+ assay to sequence all 22,000 protein-coding genes, as well as generating additional relevant genomic information.
  • DNAFit interprets your genetic data in terms of fitness or nutrition insights, to help you discover more about yourself and make better informed decisions about your wellness.
  • DNAFit provides you with actionable information about your genetics related to:
    • carbohydrate and fat response
    • lactose tolerance
    • genetic detoxification
    • anti-oxidant and omega-3 needs
    • vitamin B and vitamin D needs
    • alcohol and caffeine response

Most importantly, you also receive a 12-week personalised meal plan, recipes and shopping list. I assume this is the information used by Vita Mojo to provide you with your personalized meals. Vita Mojo, which is backed by the $7 billion French catering company Elior, is reportedly in the midst of raising more capital by crowd-funding, about which I have previously blogged. Helix, located in San Carlos, California, lists Illumina among its investors in a deal reported elsewhere. Helix also lists numerous corporate partners, which segues into the next section featuring one of these partners with the clever name Vinome, derived from vino (wine) and genome.

Your Wine Personalized to Your DNA

Taken from trendwatching.com

Truth be told, I first came across Vinome not via its partnership with Helix but in a Tweet that caught my attention because I enjoy wine, and follow new applications of sequencing. I was most curious about how drinking wine and sequencing were now being linked. In any case, the Tweet about Vinome led me to do some research about this new startup company that offers to personalize your wine drinking experience through sequencing your DNA, and is appropriately located in Healdsburg, which is in the heart of California’s premier wine country.

Here’s how it works:

  • Vinome costs $109.99, which includes $80 for the Helix test (sequencing of a saliva sample), and $29.99 for your Vinome Profile
  • Vinome analyzes 10 genetic markers related to smell and taste
  • The company then combines these DNA markers with your stated taste preferences to reveal your “vinome,” which is defined by the company as “your unique wine preference profile”
  • You can then join the Vinome wine club or shop its wine store to receive “boutique bottles specifically catered to your vinome results”

Vinome states that over 500 volunteers had their DNA sequenced, and then participated in blinded tasting of a spectrum of wines, answering questions about how much they liked the wines and what flavors they could taste. This data was then compared to DNA genetic markers reported in the scientific literature as important for taste and smell. The participants also answered a detailed questionnaire about their taste preferences for various foods and beverages. Vinome then developed an algorithm that combines this genetic data with taste preference information to deliver its personalized wine recommendations. Vinome works with about 50 boutique wineries and offers bottles generally priced from $18 to $50.

Not everyone, however, has hopped on the consumer sequencing bandwagon. One media piece quotes a university professor and researcher as saying that “[i]t’s just completely silly. Their motto of ‘A little science and a lot of fun’ would be more accurately put as ‘No science and a lot of fun.’” Personally, I wouldn’t go so far as to say no science, but I do agree that much more correlative genetic and tasting data needs to be obtained to substantiate Vinome’s claims. Interested readers can later consult an entertaining (to me) account in Business Insider written by Lydia Ramsey and titled I took a DNA test that claims to reveal the best wine for you — here’s the verdict.

Your Lifestyle Personalized to Your DNA

While researching the above stories, I happened to receive an email advertisement about Exploragen, which was described as “a new DNA lifestyle company to deliver useful DNA-based apps directly to consumers.” It went on to say that the apps from this California startup, which also utilizes the Helix platform, represent “the first online marketplace of DNA-powered consumer products [to] monitor sleep patterns and caffeine metabolism, optimize fitness, personalize cosmetics and much more.”

Taken from livescience.com

If you check out Exploragen’s website, as I did, you’ll find that the latter statements are somewhat misleading because the only app currently available is called SlumberType. This app is stated to “improve your nights and change how you feel during the day” by discovering how your DNA influences your sleep habits such as the quality of sleep and how long it takes you to fall asleep and stay asleep. SlumberType also promises to help you ‘find out how your sleep DNA relates to your diet, productivity, exercise, and caffeine consumption’.

I found some details about how SlumberType works buried in the FAQs on the website. SlumberType checks genetic variants that have been shown to be associated with sleep traits, including how long it takes you to fall asleep, how long you stay asleep, the quality of your sleep, and your genetic similarity to self-reported “morning” or “evening” persons. The SlumberType app is said to make it “simple for you to record your sleep/wake times, your morning and evening mood—plus other factors you choose—with just a few taps.” One important stated caveat is that genetic associations used by this product were originally discovered in European populations and may or may not be applicable to people from a different background.

Visitors to the website are encouraged to stay in touch as more apps are supposedly coming soon. While all of this sounds interesting, and potentially useful for some persons, I’m not convinced there will be enough early adopters to sustain Exploragen’s business model, but that’s just my humble opinion.

Closing Comments

Vinome’s use of genetic markers related to smell and taste led me to research this topic, and in so doing I found a lengthy scientific review article by Reed & Knaapila. This article is well worth a quick read to get a sense—pun intended—of what’s known about genetic markers and our senses.

In a nutshell, it’s a very complicated story because of the complexities and differences in sensory perceptions among individuals. To this point, I found the following hypothetical analysis by Reed & Knaapila to be a good example of how taste and smell genotypes may contribute to different perceptions of the same food (in this case a ham and cheese sandwich containing bread, onion, tomato, watercress, cheese, and ham).

Taken from Reed & Knaapila Prog Mol Biol Transl Sci (2012)

In this hypothetical case, sucrose in the onion will be detected by sweet receptors on the tongue, TAS1R3; glutamate in the tomato (perceived as a savory or umami taste) is sensed by the umami receptor, TAS1R3; bitterness of watercress is due to isothiocyanates detected by bitter receptors, TAS2R38; isovaleric acid in cheese has a “sweaty” odor detected by an olfactory receptor, OR11H7; ham contains androstenone having an odor called boar taint detected by OR7D4.

People with two positive alleles (+/+) perceive the compounds better than people with two negative alleles (−/−). Person 1 can taste the pleasant sweetness of the onion and the umami of the tomato but does not perceive the bitterness of the watercress or the unpleasant odors of the cheese or ham. Thus, Person 1 likes the ham sandwich more than Person 2.
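The reasoning in this hypothetical case can be written out as a toy scoring exercise. This is only an illustrative sketch: the compound-to-receptor pairs come from the description above, but the additive scoring rule, the pleasant/unpleasant weights, and Person 2’s genotypes (assumed here to be the mirror image of Person 1’s) are my own assumptions and are not taken from Reed & Knaapila.

```python
# (ingredient, compound, receptor gene, valence): +1 = pleasant, -1 = unpleasant.
# Receptor assignments follow the text; valences are an assumed simplification.
SANDWICH = [
    ("onion", "sucrose", "TAS1R3", +1),
    ("tomato", "glutamate", "TAS1R3", +1),
    ("watercress", "isothiocyanates", "TAS2R38", -1),
    ("cheese", "isovaleric acid", "OR11H7", -1),
    ("ham", "androstenone", "OR7D4", -1),
]


def perception(genotype: str) -> float:
    """Fraction of '+' alleles: '+/+' -> 1.0, '+/-' -> 0.5, '-/-' -> 0.0.

    Assumes a purely additive allele effect, which is a hypothetical model.
    """
    alleles = genotype.split("/")
    return alleles.count("+") / len(alleles)


def liking_score(genotypes: dict) -> float:
    """Sum of valence times perception strength over all sandwich components."""
    return sum(valence * perception(genotypes[gene]) for _, _, gene, valence in SANDWICH)


# Person 1 as described in the text; Person 2 is ASSUMED to be the opposite.
person1 = {"TAS1R3": "+/+", "TAS2R38": "-/-", "OR11H7": "-/-", "OR7D4": "-/-"}
person2 = {"TAS1R3": "-/-", "TAS2R38": "+/+", "OR11H7": "+/+", "OR7D4": "+/+"}

print("Person 1:", liking_score(person1))
print("Person 2:", liking_score(person2))
```

Under these assumptions Person 1 scores higher than Person 2, mirroring the conclusion above, although real food preference is far less tidy, as the authors themselves go on to stress.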

Importantly, in my opinion, Reed & Knaapila note that “[p]eople eat what they like, but they also eat for many other reasons. Simple explanations of the links between sensory perception and food intake are misguided: Just as people do not choose art or music based solely on how well they can hear or see, we do not choose food based solely on the reactions of the tongue or nose. Although genetic differences determine what we can taste and smell (and at what concentration), our taste is ultimately determined by our experiences, learning, and culture, in an artistic sense, as well as in our likes and dislikes of food and drink.”

Bon appétit and à votre santé!

As usual, your comments are welcomed.



Phoenix in Mythology and Sequencing

  • Like a Phoenix, Helicos Sequencing is Being Reborn
  • Direct Genomics in China to Launch the Genocare Clinical Sequencer
  • SeqLL in the USA to Launch Benchtop tSMS Sequencer

A phoenix as depicted by F.J. Bertuch (1747–1822). Taken from Wikipedia.org

In ancient Greek mythology, a phoenix is a bird that is cyclically regenerated or reborn by arising from the ashes of its predecessor, which dies in a show of flames and combustion. In contrast to a phoenix, modern biotech methods generally “die” in utility by being displaced with faster, better, and/or cheaper methods rather than undergoing “rebirth” in the context of a new application. However, a method developed by a company named Helicos (scarily close to Helios, who is associated with the phoenix) may prove to be a rare exception. Perhaps this is destiny, but I digress…

Helicos Sequencing

Successful Sanger sequencing of a human genome in the early 2000s spawned numerous efforts to develop faster, better, and/or cheaper methodology to enable genomic analysis on a routine basis. Among the early contenders was Helicos BioSciences, which was founded in 2003 by several principals including the then (and still) uber-famous Stephen Quake.

Helicos sequencing technology, which is depicted below and outlined elsewhere, was especially attractive because it was “true” single-molecule sequencing (i.e. sample prep did not require prior PCR or other amplification, thus greatly simplifying the workflow). Moreover, the technology uniquely allowed direct RNA sequencing, thus obviating the need to first convert RNA into cDNA.

Main steps for primer(P)-based, single-color (Cy3 dye) Helicos sequencing, in this example using two passes. Taken from Harris et al. Science (2008)

3’-Unblocked reversible terminator. Taken from Chen et al. (2013)

Details for how this sequencing-by-synthesis occurs can be read in various proof-of-concept publications. However, it’s worth noting here that the 3’-unblocked reversible terminator nucleotide triphosphate monomers have a cleavable linker attached to a detectable dye. Helicos referred to these as “Virtual Terminator” nucleotides since they are efficiently incorporated by a polymerase yet block incorporation of a second nucleotide on a homopolymer template.

So, with these methodological advantages going for it, why did Helicos file for bankruptcy in 2012? Press coverage at that time stated ‘rough financial sledding and tough competition from rival next-generation sequencing companies.’ In my humble opinion, this lack of commercial success was primarily due to the HeliScope Genetic Analysis System (pictured below) being way too big (think upright freezer-refrigerator), far too expensive ($1,350,000), and its ~35-base reads too short on performance—pun intended.

Two Phoenix-Like Versions of Helicos Sequencing

Fast forwarding about five years from the 2012 bankruptcy filing by Helicos brings us to recent reports of two independent efforts to bring back Helicos sequencing in commercially viable formats and contexts (think phoenix rising from the ashes).

Jiankui He and the GenoCare sequencer (credit Xinjie Tian). Taken from Cyranoski Nature Biotechnology (2016)

The first of these is led by Jiankui He, Founder/CEO of Direct Genomics in Shenzhen, Guangdong, China, as well as Associate Professor at South University of Science and Technology of China in Shenzhen. He, coincidentally, was a postdoc with Helicos cofounder Stephen Quake, who is reported to lead the scientific advisory board for this new company.

The company’s website homepage states the following:

“Direct Genomics is providing physicians with the first single molecule sequencer built exclusively for the clinic. The technology simplifies genome sequencing by reading individual DNA and RNA molecules directly from patient’s blood or tissue samples, which delivers significant improvement in cost and speed. Together with clinicians, Direct Genomics is making genetics an affordable part of everyday patient care.”

Perusal of scant technical information on the company’s website suggests to me that a smaller sized, TIRF-optics-enabled instrument running Helicos-type sequencing has been developed. A story about Direct Genomics by David Cyranoski in Nature Biotechnology states that $100 “clinical sequencing” is being targeted, with a blood-draw to report turnaround time of 20 hours. A very recent publication I found provides details for resequencing the Escherichia coli genome by the Direct Genomics platform named GenoCare.

The company’s website lists the following clinical applications:

  • Non-invasive prenatal testing (NIPT)
  • Tumor diagnosis
  • Early-stage cancer prediction
  • Pre-implantation genetic diagnosis (PGD)

The second Phoenix-like rebirth of Helicos sequencing has been developed by SeqLL, which was co-founded in 2013 by William St. Laurent and Daniel Jones, who previously held various technical positions at Helicos. Statements and a video on SeqLL’s website indicate to me that the sequencing technology is essentially that originally developed and patented by Helicos, which is still trademarked as True Single Molecule Sequencing (tSMS™).

William St. Laurent / Daniel Jones. Taken from seqll.com

SeqLL has been operating as a tSMS™ service provider, but in October 2016 announced the launch of the tSMS™ System Early Access Program giving researchers access to its new benchtop system “designed to deliver unparalleled quantitative RNA and specialty DNA sequencing results to both academic and industry research partners.” I should add that a big, strong bench is needed given that the physical specs are 30 x 30 x 60 inches and 1,000 pounds! Nevertheless, SeqLL recently announced an SBIR grant for improving its direct RNA sequencing technology, which I think could prove to be a driver for adoption.

In conclusion, I think it’s very interesting to see Helicos sequencing coming back to life, if you will, in not one but two different commercial contexts, both of which will hopefully be successful. This is despite current ‘tough competition from rival next-generation sequencing companies,’ as observed in the 2012 bankruptcy story about Helicos mentioned above. First and foremost among that competition is Oxford Nanopore, which I’ve blogged about previously, and which offers single-molecule sequencing that seems to me to be faster, better, and cheaper for both DNA and RNA, directly.

As usual, your comments are welcomed.


After this blog was written, it was reported in GenomeWeb that Direct Genomics plans to deliver 50 instruments this year to SinoTech Genomics, a startup based in Shanghai that offers both clinical and research sequencing services. Direct Genomics CEO Jiankui He is quoted as saying that ‘SinoTech Genomics [is] committed to ultimately purchasing 700 GenoCare platforms,’ and that Direct Genomics ‘has the capacity of producing around 1,000 GenoCare instruments per year,’ which would be very impressive based on past operational experience with manufacturing Sanger sequencers at ABI.

The piece went on to report that Direct Genomics also ‘aims to launch GenoCare in the US in September.’ Regarding what’s inside the box, so to speak, ABI veteran Bill Efcavitch, who previously served as chief technology officer of Helicos, is quoted as saying that ‘the main difference between the former Helicos technology and the GenoCare platform is in the hardware. It’s completely different engineering.’ He added, however, that it still uses Helicos’ virtual terminator chemistry.

Ocean ‘Dandruff’ DNA to Better Study Marine Biology

  • DNA Barcoding for all Organisms has Numerous Applications
  • DNA Barcodes from Water Samples Greatly Aid Marine Biologists
  • Aquatic Environmental DNA (eDNA) Proves to be Informative ‘Dandruff’

Human DNA identity analysis is now commonplace methodology that’s frequently featured in newspaper stories, TV crime series, or “whodunit” movies. The same principle (i.e. using a characteristic DNA pattern or signature) applies to identification of all animals, birds, insects, and microbes. Actually, DNA barcoding extends to any organism, whether it is alive or has been dead for hundreds of thousands of years (so long as it’s preserved by fossilization).

Taken from gajitz.com

Marine biologists face a serious challenge in accounting for very diverse forms of marine life that exist in a mind-bogglingly huge volume of water. Consequently, it’s not surprising that analysis of water-borne, marine DNA barcodes, as proxies for going out and counting fish, is rapidly trending in utility and importance. Known formally as environmental DNA (eDNA), the aquatic version has been humorously referred to as ocean ‘dandruff’ by Christopher Jerde of the University of Nevada in Reno (which, ironically, is landlocked and distant from any ocean). But I digress. Before diving further (pun intended) into ocean dandruff, let’s briefly review the background of DNA barcoding.

DNA Barcodes 101

Prof. Paul Hebert. Taken from uoguelph.ca

In 2003, Prof. Paul Hebert and coworkers in the Department of Zoology at the University of Guelph in Canada published a seminal study titled Barcoding animal life: cytochrome c oxidase subunit 1 (CO1) divergences among closely related species that fundamentally changed the field of taxonomy. In a nutshell, Hebert’s team showed it was feasible to classify millions of species based only on DNA sequence of the mitochondrial gene CO1. In the intervening, relatively short amount of time, there have been thousands of publications dealing with applications and extensions of this concept, which is now recognized to be very powerful and promising albeit with some limitations.

Typically, DNA barcodes are identified by sequencing after PCR amplification of one or more specific genetic loci such as CO1. Following proof that a DNA barcode can differentiate the species of interest, single- or multiplex quantitative PCR (qPCR) can be used to enumerate relative amounts of sample from the field.

The advent of high-throughput sequencing technologies applicable to complex mixtures of individually tagged samples then gave rise to “metabarcoding,” about which interested readers can consult many publications for specific topics.

Craig Venter steers his research yacht, Sorcerer II, under the Sydney Harbour Bridge in his quest to collect microbes from the world’s waters. Photo: Dallas Kilponen. Taken from smh.com.au

BTW, among the many pioneering scientific ventures by uber-famous Craig Venter is his Global Ocean Sampling Expedition aboard his research yacht, Sorcerer II. The expedition is a quest to unlock the secrets of the oceans by sampling, sequencing and metabarcoding DNA of all (or most) microorganisms living in these waters.

Lest you think this was a well-intended but unproductive journey—some say junket—by Venter and coworkers, here’s a link to peruse 16 resultant publications that I found by searching PubMed. To watch and listen to Venter talk about this work, you can click here for an educational and entertaining—as usual with Venter—TED Talk on Sampling the Ocean’s DNA that’s had over 550,000 views!

Ocean ‘Dandruff’

Now that we’ve covered the basics of DNA barcoding and metabarcoding, let’s turn back to ocean dandruff. Dandruff, simply put, is dead skin cells. Using dandruff as an intended witty metaphor for ocean eDNA is a bit misleading as marine eDNA is comprised of a complex mixture of cellular matter from scales, feces, decomposing tissue, etc. of fish and all other present or past sea creatures. Consequently, the design and specificity of primers for PCR is of paramount importance for obtaining—let alone interpreting—DNA barcodes based on fragment size or sequence.

As reported by Miya et al., monitoring the occurrence of fish species-specific eDNA PCR fragments (~70–300 bp) has traditionally used conventional electrophoretic gel separation and detection. More recently, qPCR using fluorogenic probes has been employed owing to the method’s sensitivity, specificity and potential to quantify the target DNA. For example, it has been possible to accurately estimate the biomass of common carp in a natural freshwater lagoon from qPCR-measured eDNA concentrations, using the relationship between eDNA concentration and biomass established in aquaria and experimental ponds.
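To make the carp example a bit more concrete, here is a minimal Python sketch of the general idea: fit a straight line relating eDNA concentration to known biomass in controlled settings, then invert it for a field sample. The numbers, units, and function names are all hypothetical placeholders of mine, and a simple linear fit is an assumption rather than the model used in the study.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def estimate_biomass(edna_conc, slope, intercept):
    """Invert the calibration line to get biomass from an eDNA reading."""
    return (edna_conc - intercept) / slope


# Hypothetical calibration points (NOT data from the study):
# known fish biomass in tanks/ponds (g) vs. qPCR eDNA concentration (copies/mL).
calib_biomass_g = [100.0, 200.0, 400.0, 800.0]
calib_edna = [50.0, 100.0, 200.0, 400.0]

slope, intercept = fit_line(calib_biomass_g, calib_edna)

# Hypothetical field sample reading of 150 copies/mL.
field_estimate = estimate_biomass(150.0, slope, intercept)
print(round(field_estimate, 1))  # prints 300.0 for these made-up numbers
```

In practice the calibration would also need to account for water volume, eDNA degradation, and replicate variability, none of which are modeled here.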

Miya et al. also describe the development of a set of PCR primers for metabarcoding mitochondrial DNA of 880 species of fish. They sampled eDNA from four tanks with known species compositions, prepared dual-indexed libraries and performed paired-end sequencing. Out of the 180 marine fish species contained in the four tanks, they detected 168 species (93.3%) distributed across 59 families and 123 genera. That’s quite an impressive accomplishment.

Ocean Dandruff Case Studies

Since there are so many fish-related applications of DNA barcodes, I’ve selected several recent examples that are indicative of the utility of ocean ‘dandruff’—and are quite interesting, in my opinion. The first case in point exemplifies how eDNA can be used to deal with rare and endangered species, which are either very hard to find or can be dangerously distressed by catching to obtain samples.

Green Sturgeon—Bergman et al. report that a decline in abundance of North American Green Sturgeon located in California’s Central Valley led to its listing as Threatened under the Federal Endangered Species Act in 2006. While visual surveys of spawning by these Green Sturgeon are effective at monitoring fish densities in concentrated pool habitats, results do not scale well (pun intended). By contrast, eDNA provides a relatively quick, inexpensive tool to efficiently identify and monitor Green Sturgeon DNA.

Taken from mthsecology.wikispaces.com

These investigators concluded that follow-on work based on this first-ever eDNA study of Green Sturgeon has the potential to provide better knowledge of the spatial extent of Green Sturgeon spawning that could help identify previously unknown spawning habitats and discover factors influencing habitat usage, guiding future conservation efforts.

Monterey Bay—The second case study, by Port et al., involves taking stock of the marine mammals and fish in Monterey Bay using eDNA and, importantly, comparing the results obtained to those from traditional dive surveys.

In brief, this team of researchers from several universities and the Monterey Bay Aquarium Research Institute found that eDNA assessments picked up almost all the organisms scuba divers spied underwater—plus many more that human eyes missed. Here’s some detail on how they did this.

At each scuba survey location as well as at sites offshore, ~1 gallon of water was sampled several feet above the bottom. Four types of habitats were sampled: sea grass beds, Monterey Bay’s unique “Kelp Forest,” sandy areas and rocky reefs. Onshore, in a “clean” (DNA-free) lab, these water samples were filtered to collect cells containing eDNA for storage at −80 °C until eDNA extraction at a university clean lab. A vertebrate‐specific primer set targeting a small region of the mitochondrial DNA 12S rRNA gene was used for PCR followed by gel purification.

Researchers collecting water in Monterey Bay for eDNA analysis. Courtesy Jesse Port. Taken from mercurynews.com

After quantification, pooled amplicons (each having a sample index sequence) were paired-end sequenced on the Illumina MiSeq platform using a 20% PhiX spike‐in control to improve the quality of low‐diversity samples. The conclusions are worth quoting because—in my opinion—the findings represent a new era in marine biology based on nucleic acid analysis:

“We find spatial concordance between individual species’ eDNA and visual survey trends, and that eDNA is able to distinguish vertebrate community assemblages from habitats separated by as little as ~60 meters. eDNA reliably detected vertebrates with low false‐negative error rates (1/12 taxa) when compared to the surveys, and revealed cryptic species known to occupy the habitats but overlooked by visual methods. This study also presents an explicit accounting of false negatives and positives in metabarcoding data, which illustrate the influence of gene marker selection, replication, contamination, biases impacting eDNA count data and ecology of target species on eDNA detection rates in an open ecosystem.”

Restated more simply, eDNA analysis of the water picked up 11 of the 12 fish and marine mammals that the divers observed, and—importantly—identified 18 additional animals the divers missed! The efficiency and improvement offered by eDNA analysis compared to traditional seek-and-count methods has been echoed in an editorial I found by Hoffmann et al. titled, tongue-in-cheek, Aquatic biodiversity assessment for the lazy.
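For readers who like to see the bookkeeping, the comparison between a dive survey and an eDNA survey boils down to simple set arithmetic. The Python sketch below uses placeholder taxon names (hypothetical, not the actual Monterey Bay species list) arranged to reproduce the counts quoted above: 12 taxa seen by divers, 11 of them also found by eDNA, and 18 found by eDNA only.

```python
def compare_surveys(visual_taxa, edna_taxa):
    """Summarize the overlap between a visual survey and an eDNA survey."""
    visual, edna = set(visual_taxa), set(edna_taxa)
    missed = visual - edna  # seen by divers but not detected by eDNA
    return {
        "shared": visual & edna,
        "missed_by_edna": missed,
        "edna_only": edna - visual,
        "false_negative_rate": len(missed) / len(visual),
    }


# Placeholder taxon labels (hypothetical), chosen only to match the quoted counts.
visual = [f"diver_taxon_{i:02d}" for i in range(1, 13)]
edna = visual[:11] + [f"edna_only_taxon_{i:02d}" for i in range(1, 19)]

summary = compare_surveys(visual, edna)
print(len(summary["shared"]), len(summary["missed_by_edna"]), len(summary["edna_only"]))
print(round(summary["false_negative_rate"], 3))
```

Note that this treats the dive survey as the reference; the published analysis also accounts for false positives and contamination, which this sketch does not attempt.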

Invasive Gobies—The third and final case study deals with detection of invasive, non-native fish to assess whether eDNA can provide a better advanced warning system for detecting these unwanted creatures and implementing eradication steps.

Gobies are invasive fish that have colonized fresh and brackish waters in Europe and North America. One of them, the round goby (Neogobius melanostomus), pictured below, is among the worst invaders in Europe. Current methods to detect the presence of these gobies are labor-intensive and not very sensitive. Consequently, populations are usually detected only when they have reached high densities, by which time management or containment efforts are futile.

Taken from animal.memozee.com

To improve monitoring, Swiss and Canadian collaborators developed an assay based on the detection of eDNA in river water, without detecting any native fish species, which is obviously an important assay criterion. The eDNA assay requires less time, equipment, manpower, skills, and financial resources than conventional monitoring methods such as electrofishing, angling or diving. Samples can be taken by novices and the assay can be performed by any molecular biologist on a conventional PCR machine. Therefore, this assay enables environment managers to map invaded areas independently of fishermen’s reports and fish community monitoring.

I could go on and on with examples of utility and the many advantages provided by eDNA for marine biology, but I’m sure you get the picture. I hope that you agree with me that eDNA analysis is a very valuable type of trending nucleic acid-based methodology.

As usual, your thoughts or comments are welcomed.





Curiously Circular RNA (circRNA) Gets Curiouser

  • circRNA Molecules Have, Oddly, No Beginning or End
  • circRNA Are Now Recognized as Regulators of Gene Expression 
  • A Flurry of New Findings Indicate circRNA Are Also Templates for Synthesis of Proteins Having As Yet Unknown Functions

Electron micrograph of ~3,000-nt circRNA. Taken from Matsumoto et al. PNAS (1990).

About a year ago, my blog titled Curiously Circular RNA pointed out that circular RNA (circRNA) in animals are odd molecules in that, unlike the vast majority of other RNA in animals, circRNA have no structural beginning (5’) or end (3’). This very curious feature has, not surprisingly, stimulated considerable scientific interest in knowing more about these molecules, which were serendipitously discovered some 30 years ago.

Application of next-generation sequencing has revealed that circRNA are actually relatively abundant and evolutionarily conserved, which implicates biological importance rather than inconsequential mistakes during RNA splicing mechanisms. Some circRNA have been shown to have function—circRNA can hybridize to complementary microRNA (miRNA), and thus serve as a kind of ‘sponge’ that influences miRNA-based gene expression. Evidence for circRNA involvement in gene expression continues to grow, as there are now >700 items on “circRNA [and] sponges” in Google Scholar.

Very recently published lines of research (that I’ll outline in what follows) implicate circRNA as coding templates for proteins, which heretofore has been exclusively associated with messenger RNA (mRNA). Current dogma holds that translation of mRNA into protein requires recognition of the 7-methylguanylated (m7G) 5’-cap structure to start ribosome binding, while the 3’-poly(A) tail protects the mRNA molecule from enzymatic degradation and aids in stopping translation, as depicted below.

Taken from Shoemaker & Green Nature Structural & Molecular Biology (2012).

Start and stop structural elements characteristic of mRNA are obviously not present in circRNA, which are literally just circles of RNA. Consequently, finding proteins encoded by circRNA has stirred up controversy about whether such proteins are a new and fundamentally important aspect of genetics or just inconsequential biochemical mistakes.

Translation of circRNA in Fly Head Neurons

Fruit fly. Taken from turbosquid.com

Researchers at The Hebrew University of Jerusalem in Israel, in collaboration with a team at the Max Delbrück Center for Molecular Medicine in Berlin, Germany, recently reported in Molecular Cell the first compelling evidence that a subset of circRNA is translated in vivo. The study by Kadener & coworkers was carried out using the common fruit fly (Drosophila melanogaster), which is known to have a number of features that lend themselves to investigations of circRNA: (1) >2,500 fruit fly circular RNAs have been rigorously annotated, (2) these mostly derive from back-splicing (pictured below) of protein-coding genes, (3) hundreds of them are conserved across multiple Drosophila species, and (4) they exhibit commonalities with mammalian circRNA.

Direct back-splicing: a branch point in the 5’ intron attacks the splice donor of the 3’ intron. The 3’ splice donor then completes the back-splice by attacking the 5’ splice acceptor forming a circRNA. Taken from Jeck & Sharpless Nature Biotechnol (2014).

This study by Kadener & coworkers involves a plethora of technically complex experimental procedures and associated jargon, from which I’ve extracted what I believe to be some key points to share. After annotating the Drosophila circRNA open reading frames (cORFs), which, by definition, have the potential for translation, they searched for evidence of their translation utilizing previously published ribosome footprinting (RFP) data. This led to identification of 37 circRNAs with at least one specific RFP read, referred to as ribo-circRNAs.
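What makes a cORF different from an ordinary ORF is that it can run across the back-splice junction, where the “end” of the sequence is joined to its “beginning.” The following Python sketch illustrates that idea only; it is not the annotation pipeline used by Kadener & coworkers, and the function name, the `min_codons` parameter, and the toy sequence are my own assumptions. It reads codons around the circle from each ATG and flags ORFs that cross the junction.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def circular_orfs(seq, min_codons=3):
    """Find ORFs in a circular sequence.

    Returns (start, length_nt, crosses_junction) tuples, where start is the
    0-based position of the ATG and length_nt includes the stop codon.
    """
    seq = seq.upper().replace("U", "T")
    n = len(seq)
    tripled = seq * 3  # lets us read up to two laps past any start position
    orfs = []
    for start in range(n):
        if tripled[start:start + 3] != "ATG":
            continue
        # Read codon by codon for at most two laps around the circle.
        for pos in range(start, start + 2 * n, 3):
            if tripled[pos:pos + 3] in STOP_CODONS:
                length = pos + 3 - start
                if length // 3 >= min_codons:
                    orfs.append((start, length, pos + 3 > n))
                break
    return orfs


# Toy 12-nt circle (hypothetical): the ATG sits near the linear "end" and the
# in-frame TAG stop is only reached after wrapping past the junction.
print(circular_orfs("ATAGCCCATGGC"))  # prints [(7, 9, True)]
```

ORFs with no in-frame stop within two laps are simply skipped here; a real annotation would have to decide how to treat such potentially endless reading frames.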

Taken from Jeck & Sharpless Nature Biotechnology (2014)

Several representative ribo-circRNAs were then constructed to each have (pictured below) a metallothionein (MT) promoter and V5 tag to facilitate translation and anti-V5 antibody-based detection of the expected protein after transfection into cells.

To determine whether circRNAs are translated in a more relevant tissue, they set up the RFP methodology in fly heads. A genetic locus named mbl that is known to produce a circRNA (circMbl3) at high abundance was selected for targeted mass spectrometry of MBL immunoprecipitated from fly heads. They utilized synthetic peptides to determine characteristic spectra for which to search in the fly head immunoprecipitate and found a consistent and very high confidence hit for a peptide that can only be produced by circMbl3.

Kadener & coworkers extended these fly head findings to mammalian mouse and rat systems, but the most interesting part of this study—in my opinion—dealt with what signals ribosome binding and translation in the absence of the 5’ cap structure present in mRNA. They demonstrated circRNA translation under conditions intended to block normal 5’ cap-dependent translation of mRNA, and concluded that “[untranslated regions] of ribo-circRNAs (cUTRs) allow cap-independent translation [and that] further research is necessary to uncover how these sequences promote translation.”

Remarkably, as you’ll now read, another group of investigators have apparently found how such promotion of circRNA translation can occur.

Translation of circRNA is Driven by N6-Methyladenosine (m6A)

The most abundant modification of RNA in eukaryotes is m6A, which has been recently shown by Li et al. to recruit binding proteins that collectively facilitate the translation of specifically targeted mRNAs—i.e. those “marked” with m6A—through interactions with 40S and 60S ribosome subunit “machinery” that actually carry out translation. Contemporaneously, Yang et al. found that m6A likewise promotes efficient initiation of protein translation from circRNAs in human cells. They discovered that consensus m6A motifs are enriched in circRNAs, and a single m6A site is sufficient to drive translation initiation.
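As a rough illustration of what looking for consensus m6A motifs in a circRNA might involve, here is a short Python sketch that scans a circular sequence for the commonly cited DRACH consensus (D = A/G/U, R = A/G, H = A/C/U). I am assuming DRACH here; the exact motif definition used by Yang et al. may differ, and the function name and toy sequence are hypothetical. A motif match also only marks a candidate site, not a confirmed methylation.

```python
import re

# DRACH consensus written as a regex; the lookahead allows overlapping matches.
DRACH = re.compile(r"(?=([AGU][AG]AC[ACU]))")


def drach_sites(circ_rna):
    """Return (position, motif) for DRACH matches in a circular RNA sequence."""
    seq = circ_rna.upper().replace("T", "U")
    n = len(seq)
    # Append the first 4 nt so motifs spanning the back-splice junction are found.
    extended = seq + seq[:4]
    return [(m.start(), m.group(1)) for m in DRACH.finditer(extended) if m.start() < n]


# Toy 10-nt circle (hypothetical): the only motif spans the junction.
print(drach_sites("ACUCCCCCGG"))  # prints [(8, 'GGACU')]
```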

As depicted below, this m6A-driven translation requires initiation factor eIF4G2 and m6A “reader” YTHDF3. Experiments showed that this translation is enhanced by methyltransferase METTL3/14 and inhibited by demethylase FTO, which enzymatically “add” and “subtract” methyl (Me) groups on specific adenosines (A) in circRNAs, respectively. This translation was also shown to be upregulated upon heat shock, which is a commonly employed method to induce “stress” in cells.

Taken from Yang et al.

Further analyses through polysome profiling, computational prediction and mass spectrometry revealed that m6A-driven translation of circRNAs is widespread, with hundreds of endogenous circRNAs having translation potential. Yang et al. concluded by stating that their “study expands the coding landscape of [the] human transcriptome, and suggests a role of circRNA-derived proteins in cellular responses to environmental stress.”

Zinc Finger Protein in Muscle Cell Development

Finally, and essentially contemporaneously with the two above-mentioned publications, a third independent investigation reported by Legnini et al. demonstrated selective circRNA downregulation using short-interfering RNAs (siRNAs). These reagents for RNA interference (RNAi) were used in an image-based functional genetic screen of 25 circRNA species, conserved between mouse and human, that are differentially expressed during myogenesis (i.e. formation of muscular tissue) in Duchenne muscular dystrophy myoblasts.

This siRNA/RNAi-based functional analysis provided one interesting case related to zinc finger protein 609 (circ-ZNF609)—a reported miRNA sponge—the phenotype of which could be specifically attributed to the circular form and not to the linear mRNA counterpart. Consistent with the circ-ZNF609 sequence having an ORF, they found that a fraction of circ-ZNF609 RNA is loaded onto polysomes and that, upon puromycin treatment, it shifted to lighter fractions, similar to mRNAs. The coding ability of this circRNA was proved through use of artificial constructs expressing circular tagged transcripts, and by CRISPR/Cas9—the trendy gene editing method about which I’ve already commented multiple times.

Despite all this evidence, Legnini et al. stated that they “have no hints on the molecular activity of the proteins derived from circ-ZNF609 and as to whether they contribute to modulate or control the activity of the counterpart deriving from the linear mRNA.”

In thinking about closing comments about this update in circRNA, I decided to emphasize that investigations in the field of RNA continue to reveal complexities that will require many more years of global attention to unravel and understand. In just the past decade or so we’ve learned about gene regulation by miRNA/siRNA, reclassification of “junk DNA” as encoding a myriad of long noncoding RNA (lncRNA), mRNA regulation by base-modifications, and curious circRNAs that are more than sponges, and likely encode hundreds (if not thousands) of proteins whose functions have yet to be elucidated. Amazing!

What are your thoughts about all of this?

Your comments are welcomed.


After writing this blog, Panda et al. at the National Institute on Aging-Intramural Research Program, National Institutes of Health published a paper titled High-purity circular RNA isolation method (RPAD) reveals vast collection of intronic circRNAs. Here’s a snippet of the abstract which adds to the increasingly curious occurrence of circRNAs that begs, if you will, further research aimed at discovering functions of circRNA-derived proteins.

“Here, we describe a novel method for the isolation of highly pure circRNA populations involving RNase R treatment followed by Polyadenylation and poly(A)+ RNA Depletion (RPAD), which removes linear RNA to near completion. High-throughput sequencing of RNA prepared using RPAD from human cervical carcinoma HeLa cells and mouse C2C12 myoblasts led to two surprising discoveries: (i) many exonic circRNA (EcircRNA) isoforms share an identical backsplice sequence but have different body sizes and sequences, and (ii) thousands of novel intronic circular RNAs (IcircRNAs) are expressed in cells. In sum, isolating high-purity circRNAs using the RPAD method can enable quantitative and qualitative analyses of circRNA types and sequence composition, paving the way for the elucidation of circRNA functions.”

Nanopore Sequencing by Synthesis (Seq-by-Syn)

  • Yet Another Notable Achievement Involving George Church, ‘The Most Interesting Scientist in the World’ 
  • Team of 30 Coauthors Reports Seq-by-Syn with DNA Polymerase-Nanopore Protein Construct on an Integrated Chip
  • Challenging Improvements Needed for Commercial Reality

Prof. George M. Church. Taken from evolutionnews.org

Devotees of my blog will know that I’m prone to word play such as calling myself a “huge” fan of “tiny” nanopores for DNA sequencing, about which I’ve previously opined. They will also recall that I’m an admitted scientific admirer of George Church, who I think is The Most Interesting Scientist in the World.

Having said this, it’s not surprising that I closely follow what’s trending in nanopore sequencing, and also make an attempt to read all of Church’s papers as they get published because they are almost invariably quite interesting, involve “big ideas,” and in some new way are very educational, at least for me. Following are my comments about a recently published paper on nanopore sequencing in the venerable Proceedings of the National Academy of Sciences of the United States of America (aka PNAS) wherein Church is the designated corresponding author.


The seminal origins and early history of nanopore sequencing have been recently chronicled and criticized (then clarified) in Nature Biotech in several “To the Editor” items, which collectively provide enlightening insights into who did what when, so to speak. Those of us who are ‘Nanoporati’ (a clever term tweeted by Nick Loman) should definitely read those Nature Biotech items. For now, however, I’ll set the stage, as it were, by echoing a bit of what I’ve posted in the past for nanopores.

Nanopore sequencing of DNA is actually a relatively old (~20 year) idea, posited in patented but prophetic (i.e. no data) methods by Church and other creative visionaries. On the other hand, nanopore sequencing was first reduced to practice commercially not too long ago by Oxford Nanopore Technologies (ONT). The many years of delay between concept and commercialization were due to the need for gradual evolution of lots of “nanopore-ology” and sequencing biochemistry, as well as developing highly sophisticated electronics and complex algorithms for data analysis.

Nanopore Sequencing-by-Scanning (Seq-by-Scan)

Taken from rsc.org

As depicted below, and as can be best seen in a video, ONT’s commercially available MinION Seq-by-Scan system essentially involves threading a strand of DNA through a protein-based nanopore and converting resultant ionic current fluctuations into nucleotide base sequence.

While there are issues with base-calling accuracy, the remarkably small and readily portable MinION provides fast, real-time sequencing results for a wide variety of applications. These include unique or otherwise compelling Point-of-Care analyses, such as pathogen surveillance, which has been achieved in remote geographical locations and even in outer space aboard the International Space Station, as I’ve previously posted.

Nanopore Seq-by-Syn

In contrast to DNA Seq-by-Scan using a nanopore, which is challenged by pore-based differentiation of similarly sized A, G, C, and T bases, DNA Seq-by-Syn has no such limitation as it uses the DNA as a template for base-by-base (i.e. stepwise) detection of enzymatic synthesis of complementary DNA. Various Seq-by-Syn methods and challenges have been discussed elsewhere, and currently available commercial systems include those from Illumina and PacBio. The former employs nucleotides that are reversible terminators equipped with cleavable fluorescent “tags” on each base. The latter detects fluorescently labeled tags on polyphosphates released upon nucleotide incorporation.

The presently featured DNA Seq-by-Syn publication by Stranges et al., which builds upon two earlier reports cited therein, differs from the above approaches by using nanopore-based detection of mass tags rather than fluorescent tags. In principle, mass tags could afford higher accuracy compared to DNA Seq-by-Scan. However, as will now be explained, achieving improved accuracy is far easier said than done.

The general approach taken to demonstrate proof-of-concept for mass-tagged nanopore DNA Seq-by-Syn is depicted below in simplified cartoon form, but involves a true tour de force—in my opinion—of three key technologies. The first is design and synthesis of the nucleotides with appropriate mass tags, which involves very sophisticated chemistry that is best appreciated by reading detailed, extensive supporting information (SI) for Stranges et al. and SI for an earlier publication by Fuller et al. In a nutshell, these nucleotides have 5’-hexaphosphates linked to relatively large mass tags comprised of complex oligonucleotide structures.

Taken from Stranges et al. PNAS 2016

The second area of technical innovation involves attachment of a single molecule of ϕ29 DNA polymerase to each α-hemolysin (αHL) nanopore in such a manner as to retain its enzyme activity and be positioned such that every released mass tag transits through (i.e., is “captured” by) the nanopore, leading to base identification by its current signature. As depicted below in two related representations, each of these heteroheptameric pores is comprised of one modified αHL subunit, to which a peptidyl SpyTag moiety is attached, and six unmodified αHL subunits. This allows attachment of one ϕ29 DNA polymerase molecule, modified with a cognate peptidyl SpyCatcher moiety, at a predetermined, time-averaged distance from the pore.

Taken from Stranges et al. PNAS 2016.

The third key area of innovation deals with insertion of the enzyme-pore conjugate into a lipid bilayer residing on a silanized array (aka chip) of 256 Ag/AgCl electrodes such that there is one functional pore per electrode. Interested readers are encouraged to consult the publication for details, as well as check out related fabrication and methods patents that I found by searching Google Scholar.

Representative Results

The first image shown above depicts what base tag-specific detection would ideally look like if each of the four different bases had a characteristic current-blockage intensity and persistence. In addition, all pores would ideally function similarly. Not surprisingly, given the stochastic nature of single-molecule systems in general, Stranges et al. found less than ideal behavior.

For example, out of 70 single pores obtained, 25 captured two or more tags, whereas only six of those pores showed detectable captures of all four tagged nucleotides. Data obtained for the pore with the most transitions between tag capture levels (i.e. the best results) is shown below, while results for the other five are given in the SI.

Taken from Stranges et al. PNAS 2016

To quote the authors:

“All four characteristic current levels for the tags and transitions between them can be readily distinguished…Homopolymer sequences in the template, and repeated, high-frequency tag capture events of the same nucleotide in the raw sequencing reads were considered a single base for sequence alignment. We recognized 12 clear sequence transitions in a 20-s period. Out of the 12 base transitions observed in the data, 85% match the template strand, showing that this method can produce results that closely align to the template sequence.” 

Interested readers need to consult and carefully read the SI for Stranges et al. regarding the interpretation of the “repeated, high frequency capture events,” such as that exhibited by C in the above current vs. time plot.
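To make the run-collapsing idea in the above quote a bit more concrete, here is a minimal Python sketch. It is not the authors' alignment procedure, which is described in their SI. The function names and the toy sequences are hypothetical, and the simple position-by-position comparison (no gaps allowed) is a simplifying assumption of mine.

```python
from itertools import groupby


def collapse_runs(calls: str) -> str:
    """Collapse each run of identical consecutive base calls into one call.

    This mimics treating homopolymers and repeated captures of the same
    tag as a single base, e.g. "AACCCCCGTTA" becomes "ACGTA".
    """
    return "".join(base for base, _ in groupby(calls))


def fraction_matching(read: str, template: str) -> float:
    """Fraction of positions that agree after collapsing runs in both strings.

    Naive comparison: positions are paired in order with no gaps, and only
    the overlapping length is scored. A real aligner would allow gaps.
    """
    r, t = collapse_runs(read), collapse_runs(template)
    n = min(len(r), len(t))
    if n == 0:
        return 0.0
    matches = sum(a == b for a, b in zip(r, t))
    return matches / n


# Hypothetical toy data, not taken from the paper.
raw_calls = "AACCCCCGTTA"   # repeated C captures, as in a noisy trace
template = "ACCGGTA"        # template containing homopolymers
print(collapse_runs(raw_calls))                 # ACGTA
print(fraction_matching(raw_calls, template))   # 1.0
```

Collapsing runs in both the read and the template means that homopolymer lengths are never tested, which is worth keeping in mind when interpreting any match percentage computed this way.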

All of the above snippets in aggregate suggest to me that, while this huge amount of work has made progress toward one approach to Seq-by-Syn, many improvements will need to be made before achieving a robust system to successfully compete in the commercial sector.

Authorship, Affiliations, and Acknowledgments

The relatively large team of 30 coauthors listed for Stranges et al. includes the following numbers of investigators and affiliations: 1 at Arizona State Univ., 4 at Harvard, 11 at Columbia University, and 14 at Genia Technologies, which is a Santa Clara, CA company that was acquired by Roche in 2014, and is part of Roche Sequencing.

Acknowledgments in Stranges et al. refer to support by Genia and NIH Grant R01 HG007415, which I found was awarded to coauthors George M. Church (Harvard), Jingyue Ju (Columbia), and James J. Russo (Columbia). The end of the abstract of this grant reads as follows:

“The nanopore chips will be enhanced and expanded from the current 260 nanopores to over 125,000 using advanced nanofabrication techniques. We will conduct real-time single molecule Nano-SBS on DNA templates with known sequences to test and optimize the overall system. These research and development efforts will lay the foundation for the production of a commercial single molecule electronic DNA sequencing platform, which will enable routine use of sequencing for medical diagnostics and personalized medicine.”

The conflict of interest statement in Stranges et al. indicates that the technology described therein (called “Nanopore SBS”) has been exclusively licensed by Genia, and that specified coauthors are entitled to royalties through this license. In addition, Church is a member of the Scientific Advisory Board of Genia.

Parting Comments

Long gone are the days when government-funded academic researchers thumbed their noses, if you will, at commercial development. Nowadays almost all academics parlay their government grants into university patents that get licensed to companies, usually with some type of corporate involvement of said academics.

I hasten to add that I’m not implying that NIH-funded academic research being a “seed” for corporate profitability is negative, especially in view of its Small Business Innovation Research (SBIR) program. Rather, I view it as a paradigm shift for the better, as it allows academic creativity to be harnessed into applications that can hopefully greatly benefit society.

In conclusion, and coming back to George Church, who I highlighted in the introduction to this blog, I must say that he might very well be the academic researcher with the longest list of technology transfer, advisory roles, and founded companies—13 to date—according to a public list that is truly mind boggling, at least to me.

As usual, your comments are welcomed.


After I wrote this blog, Roche announced on December 15, 2016 that “it has officially notified Pacific Bioscience (PacBio) of its intention to terminate its [2013] agreement and efforts to develop a sequencing instrument for use in the clinical research and clinical market using their Single Molecule, Real-Time (SMRT®) technology,” about which I have commented previously. The announcement went on to say Roche would instead “focus on internal development efforts” and “actively pursue multiple technologies and commercial strategies.” A GenomeWeb headline was more specific: “Roche Will Focus on Genia’s Nanopore Technology for Dx Market After Ending Deal With PacBio.”

On December 30, 2016 it was reported that the University of California (UC) filed a patent suit against the Chief Technology Officer (CTO) at Genia, and Genia Technologies, claiming the CTO produced key inventions during his time at UC that he later assigned to Genia, but which should have automatically been assigned to UC. Stay tuned…

Frightening Fungus Among Us

  • Clinical Alert for Candida auris (C. auris) Issued by CDC
  • US Concerned About C. auris Misidentification and Drug Resistance
  • Sequencing C. auris DNA in Clinical Samples is Preferred for Identification
Strain of C. auris cultured in a petri dish at CDC. Credit Shawn Lockhart, CDC. Taken from foxnews.com


When I was a kid and didn’t know better, there was a supposedly funny rhyme that “there’s fungus among us.” While this saying is thankfully passé nowadays, the growing number of infections by a formerly obscure but deadly fungus is frightening. This so-called “superbug” is a drug-resistant fungus called Candida auris (C. auris) that’s worth knowing about, and is the fungal focus of this blog.

First, Some Fungus Facts

Fungi are so distinct from plants and animals that they were allotted a biological ‘kingdom’ of their own in classification of life on earth, although that was only relatively recently, i.e. 1969. There are 99,000 known fungi, which exist in a wide diversity of sizes, shapes and complexity that extends from relatively simple unicellular microorganisms, such as yeasts and molds, to much more complex multicellular fungi, such as mushrooms and truffles.

It was previously thought that genomes of all fungi are derived from the genome of the model fungus Saccharomyces cerevisiae, which has been used in winemaking, baking and brewing since ancient times. However, genome sequencing of more than 170 fungal species has revealed that, while the genome size of S. cerevisiae is only ~12 Mb, seven species of fungus have genome sizes larger than 100 Mb. This is attributed to various evolutionary pressure-factors generating transposable elements, short sequence repeats, microsatellites, genome duplication, and noncoding DNA.

Fungal cell walls are made up of intertwined fibers mostly comprised of long chains of chitin, the same tough compound found in the exoskeletons of animals such as spiders, beetles and lobsters. The chitin in fungal cells is entangled with glucans and other wall components, such as proteins, forming a mass that protects the cell membrane behind it and poses a formidable barrier against antifungal drugs.

Taken from Wikipedia.org


In researching whether there are any nucleic acid drugs against fungi, I found one early patent by Isis (now Ionis) Pharmaceuticals for use of antisense phosphorothioate-modified oligonucleotides for the treatment of Candida infections, but virtually no other reports. I suspect that will change in the future as pathogenic fungi and other disease-causing microbes become more resistant to conventional drugs.

Fungal infections of the skin are very common and include athlete’s foot, jock itch, ringworm, and yeast infections. While these can usually be readily treated, infections caused by pathogenic fungi have reportedly risen drastically over the past few decades. Moreover, with the increase in the number of immunocompromised (burn, organ transplant, chemotherapy, HIV) patients, fungal infections have led to alarming mortality rates due to the ever-increasing phenomenon of multidrug resistance.

Segue to a Serious Situation

Emergence of drug-resistant fungi is, in part, the segue to the serious story of the present blog. The other part is the incorrect identification of a certain fungus as a common Candida yeast, which is not only scary but seemingly inexcusable in today’s era of highly accurate PCR-based assays for identifying microorganisms. Here’s the situation in a nutshell.

C. auris infection, which is associated with high mortality and is often resistant to multiple antifungal drugs, was first described in 2009 in Japan but has since been reported in countries throughout the world. Unlike many Candida infections, C. auris is a hospital-acquired infection that is contracted from the environment or staff of a healthcare facility, and it can spread very quickly.

To determine whether C. auris is present in the United States and to prepare for the possibility of transmission, the Centers for Disease Control and Prevention (CDC) issued a clinical alert in June 2016 requesting that C. auris cases be reported.

(A) MALDI-TOF schematic; (B) mass spectra from three C. parapsilosis; and (C) two C. bracarensis isolates. Taken from researchgate


This official alarm bell, if you will, was triggered by the following facts:

  • Many isolates are resistant to all three major classes of antifungal medications, a feature not found in other clinically relevant Candida species.
  • C. auris identification requires specialized methods such as MALDI-TOF mass spectrometry or sequencing the 28S ribosomal DNA, as pictured below.
  • Using common methods, C. auris is often misidentified as other yeasts, which could lead to inappropriate treatments.

The CDC subsequently found that seven cases were identified in Illinois, Maryland, New York and New Jersey. Five of seven isolates were either misidentified initially as C. haemulonii or not identified beyond being Candida. Five of seven isolates were resistant to fluconazole; one of these isolates was resistant to amphotericin B, and another isolate was resistant to echinocandins. While no isolate was resistant to all three classes of antifungal medications, emergence of a new strain of C. auris that is resistant to all three would pose a serious public health issue.

Sequencing 28s ribosomal DNA. Taken from microbiologiaysalud.org


Based on currently available information, the CDC concluded that these cases of C. auris were acquired in the U.S., and several findings suggest that transmission occurred:

  • First, whole-genome sequencing results demonstrate that isolates from patients admitted to the same hospital in New Jersey were nearly identical, as were isolates from patients admitted to the same Illinois hospital.
  • Second, patients were colonized with C. auris on their skin and other body sites weeks to months after their initial infection, which could present opportunities for contamination of the health care environment.
  • Third, C. auris was isolated from samples taken from multiple surfaces in one patient’s health care environment, which further suggests that spread within health care settings is possible.

A related Fox News story adds that C. auris was found on a patient’s mattress, bedside table, bed rail, chair, and windowsill. Yikes!

While the above situation in the U.S. might not seem particularly worrisome to you, the potential for emergence of more infectious C. auris strains with higher lethality should be of concern. That has already reportedly occurred in several Asian countries and South Africa. Obviously, deployment of the best available methods for pathogen identification can, in principle, lessen the likelihood of the emergence and/or spread of C. auris in the U.S. and other countries.

Case for Point-of-Care C. auris Nanopore Sequencing?

Taken from extremtech.com 


Regular readers of my previous blogs know that I’m an enthusiastic fan of the Oxford Nanopore Technologies MinION sequencer, which is proving to be quite useful for characterizing pathogens in very remote regions on Earth, and even on the International Space Station to diagnose astronaut infections! Notwithstanding various current limitations for MinION sequencing of microbes, it seems to me that it would be relatively straightforward to generate MinION data for many available samples of pathogenic fungi and genetically related microbes. Such data could be used to assess the feasibility of using MinION for faster, cheaper, and better unambiguous identification of C. auris in centralized or Point-of-Care applications.

Taken from rnaseq.com


If you think this suggestion is farfetched, think again, after checking out these 2016 publications using MinION:

The 51.4-Mb genome sequence of Calonectria pseudonaviculata for fungal plant pathogen diagnosis was obtained using MinION.

The first report of the ~54 Mb eukaryotic genome sequence of Rhizoctonia solani, an important pathogenic fungal species of maize, was derived using MinION.

Sequence data is generated in ~3.5 hours, and bacteria, viruses and fungi present in the sample of marijuana are classified to subspecies and strain level in a quantitative manner, without prior knowledge of the sample composition.

CDC on C. auris Status and FAQs

In the interest of concluding this blog with the most up-to-date and authoritative information, I consulted the CDC website and found statements and replies to FAQs that are well worth reading at this link.

As a scientist, my overriding question concerns the lack of adoption of improved microbiological methods by hospitals and clinics. The above noted misidentifications of C. auris infections resulting from use of flawed lab analyses seem unacceptable. Although I don’t know all the facts or statistics to generalize, I suspect that there are other incorrect lab analyses due to use of outdated methods. On the other hand, I’m hopeful that, with the FDA’s widely touted Strategic Plan for Moving Regulatory Science into the 21st Century, the section entitled Ensure FDA Readiness to Evaluate Innovative Emerging Technologies (think nanopore sequencing) becomes actionable, sooner rather than later.

Changing established (dare I say entrenched) clinical lab tests is not simple or easy, but if it doesn’t begin it won’t happen, about which I’m quite certain. I can only wonder why development of infectious disease analytical methods and treatments seems to require a crisis. Sadly, I think it boils down to the complexities and socio-political dynamics of who pays.

Frankly, it’s my personal opinion that maybe it’s time Thomas Jefferson’s philosophy about hammering guns into plows is directed to health care.


After writing this blog, I learned that T2 Biosystems has received FDA approval to market in the U.S. the first direct blood test for detection of five yeast pathogens that cause bloodstream infections: Candida albicans and/or Candida tropicalis, Candida parapsilosis, Candida glabrata and/or Candida krusei.

Yeast bloodstream infections are a type of fungal infection that can lead to severe complications and even death if not treated rapidly. Traditional methods of detecting yeast pathogens in the bloodstream can require up to six days, and even more time to identify the specific type of yeast present. The T2Candida Panel and T2Dx Instrument (T2Candida) can identify these five common yeast pathogens from a single blood specimen within 3-5 hours.

T2Candida incorporates technologies that break the yeast cells apart, releasing the DNA for PCR amplification for detection by greatly simplified, miniaturized nuclear magnetic resonance (NMR) technology, as can be seen in this video.

In my opinion, this fascinating new technology is another example of what could be rapidly deployed toward detecting C. auris.

Sequencing Trifecta for Top 10 Innovations of 2015

  • Sequencing Sweeps The Scientist’s Top 3
  • Diverse Array of Research and Diagnostic Products Round Out Top 10
  • I Predict 3 Winners for 2016. What Are Yours?
Taken from the-scientist.com.


Welcome to my first blog of the New Year, 2016! There is a trove of topics in my queue of blogs, and I invite you to check them out every other Tuesday throughout the year. As in the past, this first blog of the year comments on the Top 10 Innovations in 2015 that were picked by a panel of judges and published last month in The Scientist. As a side note, you can also peruse TriLink’s top products of 2015 and predictions for 2016 by clicking here.

When you read about these winners, you’ll find that 1st, 2nd and 3rd place all involve sequencing, which is a trifecta in the language of parimutuel betting on horse races. This outcome was kind of a sure thing (to continue my analogy to betting), given that sequencing products were also in the top spots in the previous year’s picks. This preeminence of sequencing will likely continue, as I’ll explain at the end of this blog with my win, place and show bets for next year.

Taken from wikipedia.org.



Big, Bigger, Biggest—Genomics Projects Go Democratic

  • 1,000 Genomes Project is Big
  • 10,000 Genomes Project is Bigger
  • 100,000 Genomes Project is Biggest—so Far
  • Will 1,000,000 Genomes be Next?

This blog on genomics projects going democratic has—rest assured—nothing to do with US presidential election politics that are already receiving (too much) 24/7 coverage—but rather genomics going from singular to pluralistic. Let me frame this revolutionary change another way to clarify: the much heralded sequencing of “the human genome” (singular) announced in 2001—by competing public and private initiatives—used mixtures of DNA from multiple donors, i.e. “the genome” was actually “the genomes,” all of which are different—in some way. These differences are what make each of us genetically unique. Consequently—and enabled by ever faster and cheaper DNA sequencing—there are increasingly large projects aimed at identifying these genetic variations (aka genotypes or polymorphisms) for association with health or disease status (aka phenotypes). To me, this fundamentally important trending science is definitely blogworthy.

Populations are comprised of genetically unique individuals. Taken from my Wakulla.com.

