Genotyping of Transcriptomes: A Breakthrough Technology in Single-Cell Analysis

Erik Nelson and Anne-Marie Silvi
July 3, 2019

Though a patient’s cancer is treated as a single disease, it is made up of a diverse collection of cells that are related but may have distinct characteristics. When mutations occur in cells and are passed down to daughter cells, new subclonal populations are created in the bulk cancer population. Each subclonal population has a distinct mutational profile. In order to fully understand each patient’s cancer, we require the ability to profile the properties of each individual cell.

Fulfilling a Technological Need

In recent years, advances in sequencing and computational approaches have allowed scientists to understand cancer at a progressively deeper level. Efforts are ongoing to profile multiple molecular characteristics of cancer cells by combining assays that measure various properties of the cell. A popular approach is to assess the mutations present as well as the gene expression pattern. Profiling any characteristics in the bulk cancer population provides valuable information, but recent developments have allowed for the profiling of individual cells – a much richer source of information than profiling of the whole population of cells. However, there are still challenges in designing approaches powerful enough to profile complex systems such as those seen in blood cancer patients. For instance, since mutations that occur in protein coding regions get passed on to the mRNA (which is translated into mutant proteins that help give the cancer cell its identity), investigators should be able to infer these mutations by sequencing just the mRNA. However, many current techniques only read the sequences close to the end of the mRNA transcript, so they also require sequencing of the genomic DNA in order to identify both the mRNA identities and the mutations in the genome.

Dan Landau, Assistant Professor of Medicine at Weill Cornell Medicine and Core Member of the New York Genome Center, leads a team that has developed an approach called Genotyping of Transcriptomes (GoT) that addresses these challenges. Co-supervised by Peter Smibert of the Technology Innovation Lab at the New York Genome Center, the team modified existing technologies to provide an improved method of genotyping (identifying mutations) from mRNA analysis that also identifies and quantifies mRNAs within the same cell. Anna Nam, Kyu-Tae Kim, and Ronan Chaligne are co-first authors on the paper describing this new method that was published online today in Nature.

 

GoT enables the study of the impact of somatic mutations (indicated by blue glow) in primary human cells, in the context of their native cell identity ("fingerprints"). GoT links single-cell somatic genotyping with droplet-based single-cell RNA sequencing in thousands of cells by unique barcodes (indicated by different colors on the tips of the fingerprint DNA). Notably, the fingerprints represented in the illustration are the actual fingerprints of the three first co-authors of the manuscript. – Dan Landau. Artwork by Hratch Arbach.
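
To make the barcode-linking idea concrete, here is a minimal Python sketch. It is not the published GoT pipeline: the majority-vote rule, the `min_reads` threshold, the function names, and the toy barcodes and gene name are all assumptions made for illustration. It only shows how a genotype call and an expression profile that share a cell barcode can be joined into one record per cell.

```python
from collections import Counter, defaultdict


def call_genotypes(amplicon_reads, min_reads=2):
    """Assign one genotype label per cell barcode by simple majority vote.

    amplicon_reads: iterable of (cell_barcode, allele) pairs, where allele
    is "mutant" or "wildtype". Hypothetical input format.
    """
    votes = defaultdict(Counter)
    for barcode, allele in amplicon_reads:
        votes[barcode][allele] += 1

    calls = {}
    for barcode, counts in votes.items():
        if sum(counts.values()) < min_reads:
            continue  # too little evidence: leave this cell uncalled
        ranked = counts.most_common(2)
        if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
            continue  # tie between alleles: leave this cell uncalled
        calls[barcode] = ranked[0][0]
    return calls


def link_by_barcode(calls, expression):
    """Attach a genotype (or None) to each cell's expression profile."""
    return {
        barcode: {"genotype": calls.get(barcode), "expression": profile}
        for barcode, profile in expression.items()
    }


# Toy data, invented for this example.
reads = [
    ("AAAC", "mutant"), ("AAAC", "mutant"), ("AAAC", "wildtype"),
    ("CCGT", "wildtype"), ("CCGT", "wildtype"),
    ("GGTA", "mutant"),  # a single read, below min_reads
]
expression = {
    "AAAC": {"GENE_A": 5},
    "CCGT": {"GENE_A": 1},
    "TTTT": {"GENE_A": 0},  # profiled but never genotyped
}

linked = link_by_barcode(call_genotypes(reads), expression)
for barcode, record in linked.items():
    print(barcode, record["genotype"], record["expression"])
```

Cells without a confident call keep a genotype of `None` rather than being dropped, so downstream steps can decide how to treat ungenotyped cells.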

 

Testing the Accuracy of GoT

To test GoT, Dr. Landau’s group studied myeloproliferative neoplasms (MPNs) – blood disorders defined by alterations in the production of blood cells. Two types of MPN are essential thrombocythemia (ET) and myelofibrosis (MF). ET is characterized by the production of too many platelets, while MF is characterized by a broader range of blood cell abnormalities and fibrotic scarring of the bone marrow. Megakaryocytes, blood cells that produce the platelets needed for blood clotting, play a role in both diseases. The excess platelets found in ET are produced from megakaryocytes, and the fibrotic scarring in MF is due in part to abnormal signaling from megakaryocytes. Patients with MPNs have a propensity to develop secondary acute myeloid leukemia (AML), which is an aggressive and fatal disease. There are likely hundreds of thousands of MPN patients living in the US at this time, and though this disease is often indolent, there is no cure. MPNs are characterized by several recurrent mutations, including those in CALR and JAK2, but have no other clear molecular markers distinguishing them from the normal blood cells with which they are intermingled. Therefore, the clinical need to better understand MPNs and to distinguish MPN cells from healthy cells makes them strong candidates for testing GoT.

The team first tested the quality of GoT and showed that it is able to distinguish human cells containing a wildtype (normal) CALR gene from mouse cells containing a mutant CALR gene, just by assaying for CALR. They verified the accuracy of GoT by replicating previous findings that mutant CALR is present in stem cells and progenitors of all blood cell lineages of MPN patients but that mRNA sequencing alone cannot distinguish wildtype from mutant cells. Dr. Landau’s team also showed that GoT is far superior to the unmodified platform.

Using GoT to Evaluate Patient Stem Cells

ET is characterized by the megakaryocytes’ overproduction of platelets. Consistent with this, GoT confirmed that while CALR mutations are present in both megakaryocyte progenitors and hematopoietic stem cells (HSCs), mutant cells are more frequent among the progenitor cells. Adding nuance to this existing notion, GoT also demonstrated that there are more CALR-mutant cells in more mature progenitor cell populations than in less mature progenitor cell populations, potentially reflecting the fact that ET is a disease of more mature blood cells. This also suggests that the CALR mutation provides a fitness benefit to more mature cells, allowing cells with this mutation to achieve a greater proportion of the overall progenitor cell population.
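
The comparison described here comes down to a mutant-cell fraction per cell population. A minimal Python sketch of that calculation follows; the function name, population labels, and counts are made up for illustration and are not data from the study.

```python
from collections import defaultdict


def mutant_fraction_by_group(cells):
    """Return the fraction of genotyped cells that are mutant, per population.

    cells: iterable of (population_label, genotype) pairs. Cells whose
    genotype is neither "mutant" nor "wildtype" are ignored.
    """
    totals = defaultdict(int)
    mutants = defaultdict(int)
    for population, genotype in cells:
        if genotype not in ("mutant", "wildtype"):
            continue  # ungenotyped cells carry no information here
        totals[population] += 1
        if genotype == "mutant":
            mutants[population] += 1
    return {pop: mutants[pop] / totals[pop] for pop in totals}


# Toy input: labels and numbers are invented for this example.
cells = (
    [("HSC", "mutant")] + [("HSC", "wildtype")] * 3
    + [("MkP", "mutant")] * 3 + [("MkP", "wildtype")]
    + [("MkP", None)]  # an ungenotyped cell, skipped
)
print(mutant_fraction_by_group(cells))  # {'HSC': 0.25, 'MkP': 0.75}
```

A higher fraction in a more mature population than in stem cells is the kind of pattern the paragraph above interprets as a fitness benefit in that population.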

In addition, GoT analysis shows that, as with ET, patients with MF have a distribution of mutant CALR in HSCs and committed progenitors. However, in contrast to ET, MF showed elevated levels of CALR mutations in both HSCs and committed progenitors, suggesting that this mutation offers a strong fitness advantage to all of these types of cells in MF patients. This shows that the context of the CALR mutation is critical and demonstrates the importance of studying individual cells in all forms of CALR mutant MPNs.

Elucidating the Fitness Impact of the CALR Mutation

To directly investigate how the CALR mutation correlates with cellular fitness, Dr. Landau’s group assayed for cell cycle genes that are involved in cell proliferation, reasoning that higher expression of these genes correlates with greater proliferation (and thus greater fitness). Indeed, they found that the precursor cells that give rise to CALR-mutant megakaryocytes have higher levels of cell cycle gene expression than wildtype cells within the same patient. Increase in platelets is one clinical measure used to diagnose ET patients and monitor their clinical course. These patients have increased platelets resulting from more proliferation of megakaryocytes. Dr. Landau’s team showed that there is a correlation between the degree of cell cycle gene expression in CALR mutant megakaryocytes and the number of platelets found in the patient, directly correlating GoT data with a standard measure of disease severity in ET patients. These results show that the CALR mutation provides a fitness benefit to megakaryocyte precursor cells, consistent with the characteristics of ET.
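
The two calculations in this paragraph can be sketched in a few lines of Python. The article does not say how the cell cycle score or the correlation was computed, so the choices here are assumptions: the score is a plain mean over a gene set, the correlation is Pearson's r, and the gene names and values are placeholders.

```python
import math


def module_score(profile, genes):
    """Mean expression of a gene set in one cell (missing genes count as 0)."""
    return sum(profile.get(gene, 0) for gene in genes) / len(genes)


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Placeholder gene set and one toy cell; not real cell cycle genes.
cell_cycle_genes = ["G1", "G2"]
print(module_score({"G1": 2, "G2": 4, "OTHER": 100}, cell_cycle_genes))  # 3.0

# Per-patient summaries would be paired like this (values invented):
mean_scores = [1.0, 2.0, 3.0]
platelet_counts = [2.0, 4.0, 6.0]
print(round(pearson(mean_scores, platelet_counts), 6))  # 1.0
```

Each patient contributes one pair: the mean score across that patient's mutant cells and that patient's platelet count.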

Using GoT, Dr. Landau’s lab was also able to identify genes associated with CALR-mutant cells that may shape these cells’ particular characteristics. GoT is able to identify differentially expressed genes – a key attribute of modern sequencing approaches, since it helps investigators characterize the differences between cancer cells and healthy cells. GoT does this especially well because the genotyped mutant and wildtype cells whose transcriptional profiles it examines come from the same sample of a particular patient. Therefore, these cells share the same environment and other patient-specific characteristics, eliminating some of the variability inherent in other techniques. As a result of this technical advantage, Dr. Landau’s group was able to identify a number of genes that are differentially expressed between CALR-mutant and wildtype megakaryocytes, including several genes involved with the upregulation of the unfolded protein response (UPR). Previous studies by other groups had shown that the UPR is upregulated in mutant CALR cells, but Dr. Landau’s lab was the first to identify a more complete picture of the specific genes within these mutated cells that are involved with this process.
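
As a rough illustration of a within-sample comparison, the Python sketch below computes a log2 fold change for one gene between mutant and wildtype cells from the same sample. It is only an effect-size calculation under assumed inputs: the article does not describe the statistical test used, and a real analysis would add one. The pseudocount and function name are illustrative choices.

```python
import math


def log2_fold_change(mutant_values, wildtype_values, pseudocount=1.0):
    """log2 ratio of mean expression, mutant over wildtype, for one gene.

    Both lists hold per-cell expression values from the same sample.
    The pseudocount avoids division by zero for unexpressed genes.
    """
    if not mutant_values or not wildtype_values:
        raise ValueError("both groups need at least one cell")
    mean_mutant = sum(mutant_values) / len(mutant_values)
    mean_wildtype = sum(wildtype_values) / len(wildtype_values)
    return math.log2((mean_mutant + pseudocount) / (mean_wildtype + pseudocount))


# Toy values: mean 3 in mutant cells, mean 1 in wildtype cells.
print(log2_fold_change([3, 3], [1, 1]))  # 1.0, i.e. log2((3+1)/(1+1))
```

Because both groups come from one sample, patient and environment are held constant by design, and the genotype is the main variable that differs.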

UPR is one of several mechanisms that the body uses to fix or eliminate stressed cells to prevent them from becoming mutated and causing problems. In otherwise healthy HSCs, the UPR signals stressed cells to undergo apoptosis (programmed cell death) to prevent long-term propagation of cellular abnormalities, while more mature progenitor cells are not so constrained and use the UPR to promote survival through the proteins IRE1/XBP1. Dr. Landau’s studies demonstrate that CALR mutant cells, whether they are HSCs or committed progenitors, have the IRE1/XBP1 branch of the UPR enhanced, suggesting that mutant CALR enhances survival of both cell types. This means that the CALR mutation acts, in part through the UPR, to extend the survival of abnormal HSCs to the detriment of the patient.

Opening the Door to Multiple-Gene Analysis within the Same Genotyped Cell

In addition to CALR, Dr. Landau’s group also used GoT to analyze NFE2 and SF3B1, which are known to be mutated in MF. Prior subclonal analysis approaches did not allow for both the genotyping and transcriptional analysis of multiple genes in the same cell. Dr. Landau’s team measured the frequencies of NFE2 and SF3B1 mutations and the subclonal architecture of an MF patient and found results consistent with those seen in bulk sequencing and single-cell analysis, demonstrating that genotyping by GoT is at least as accurate as those other techniques. However, GoT can elaborate on these findings with a further analysis of transcription within the same cell. For instance, GoT transcriptional analysis shows that SF3B1/CALR double mutants have a proliferative advantage over SF3B1 mutants, while the addition of an NFE2 mutation did not further increase the proliferative advantage. This exemplifies GoT’s enhancement of data available from prior techniques, interrogating the subclonal architecture of MF while simultaneously examining the transcriptional contributions of various genes in the genotyped cells.
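
Once several loci are genotyped in the same cell, subclones can be tallied by their combination of mutations. The Python sketch below shows one simple way to do that. The cell records and resulting frequencies are invented, and treating an ungenotyped locus as "not mutant" is a simplifying assumption that a real analysis would need to handle more carefully.

```python
from collections import Counter


def clone_label(genotypes, genes):
    """Name a cell's clone by the genes called mutant, in a fixed order."""
    mutated = [gene for gene in genes if genotypes.get(gene) == "mutant"]
    return "+".join(mutated) if mutated else "no mutation detected"


def clone_frequencies(cells, genes):
    """Fraction of cells carrying each combination of mutations."""
    labels = Counter(clone_label(cell, genes) for cell in cells)
    total = sum(labels.values())
    return {label: count / total for label, count in labels.items()}


genes = ["SF3B1", "CALR", "NFE2"]
# Toy per-cell genotype calls, invented for this example.
cells = [
    {"SF3B1": "mutant", "CALR": "wildtype"},
    {"SF3B1": "mutant", "CALR": "mutant"},
    {"SF3B1": "mutant", "CALR": "mutant", "NFE2": "wildtype"},
    {"SF3B1": "wildtype"},
]
print(clone_frequencies(cells, genes))
# {'SF3B1': 0.25, 'SF3B1+CALR': 0.5, 'no mutation detected': 0.25}
```

The same labels can then be used to group cells for the kind of per-clone transcriptional comparison described above.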

Adaptation of GoT to Address Its Challenges

One limitation of GoT is that its accuracy depends on the distance from the mutation to the end of the transcript. GoT is less efficient the further from the ends a mutation lies. Dr. Landau’s group modified GoT to include circularization and inverse polymerase chain reaction (PCR) steps to remove extra sequences between the mutation and the barcode used to identify each mRNA. This effectively reduces the distance between the mutation and the end of the transcript, making the GoT assay more efficient. They demonstrated the effectiveness of this modified GoT by analyzing the MPN oncogenic driver JAK2 V617F, in which the mutation is distant from the transcript end. They confirmed that JAK2 V617F is more frequent in megakaryocytic precursors than in more immature stem cells or erythroid progenitors in ET patients. This is consistent with the known attributes of ET, demonstrating that GoT can assay genes whose mutations are far from the transcript ends, in addition to those whose mutations are near.
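
The geometry behind circularization can be shown with a toy string model in Python. This is a conceptual sketch only, not the laboratory protocol: the molecule layout (a short upstream flank, the mutation site, a long intervening stretch, then the barcode at the far end), the sequences, and the function names are all assumptions. Joining the two ends puts the barcode next to the upstream flank, and keeping only the barcode-to-site arc drops the intervening stretch.

```python
def gap(molecule, a, b):
    """Number of bases separating motifs a and b in a linear molecule."""
    start_a, start_b = molecule.index(a), molecule.index(b)
    if start_a < start_b:
        return start_b - (start_a + len(a))
    return start_a - (start_b + len(b))


def circularize_and_trim(molecule, site, barcode):
    """Toy model of circularization followed by inverse PCR.

    Ligating the ends makes the barcode run directly into the start of
    the molecule. The returned product keeps barcode -> junction ->
    upstream flank -> site and omits the stretch that used to separate
    the site from the barcode.
    """
    site_end = molecule.index(site) + len(site)
    barcode_start = molecule.index(barcode)
    return molecule[barcode_start:] + molecule[:site_end]


# Invented sequences for illustration.
site = "ACGTAC"      # stands in for the mutation locus
barcode = "CATCAT"   # stands in for the cell barcode
linear = "TT" + site + "G" * 40 + barcode

product = circularize_and_trim(linear, site, barcode)
print(gap(linear, site, barcode))   # 40
print(gap(product, site, barcode))  # 2
```

In this toy, the mutation site ends up 2 bases from the barcode instead of 40, which is the shortening the paragraph above describes.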

GoT’s Impact on Understanding Cellular Identity

Dr. Landau’s elegant research demonstrates the utility of linking single-cell genotyping and single-cell gene expression analysis in the same cell using GoT. Using MPN as a model to test GoT, the group showed that mutations differentially affect cells depending on their differentiation state as well as their lineage. They further fine-tuned existing knowledge of MPNs by showing fitness advantages of these mutations in the blood cell lineages that correspond to the clinical attributes of the disease. This study also refines our knowledge of the UPR in MPNs and identifies components of this pathway as possible therapeutic targets. GoT therefore is an important technological breakthrough that will assist researchers who wish to better understand mutations, gene expression, clonal evolution, and how these integrate with cellular identities during the pathogenesis of their disease of interest.

Dr. Landau is supported by The Leukemia & Lymphoma Society through a Translational Research Program (TRP) grant.


References

Nam AS, Kim K-T, Chaligne R, et al. Genotyping of Transcriptomes links somatic mutations and cell identity. Nature. Published online July 3, 2019.

Stuart T, Satija R. Integrative single-cell analysis. Nat Rev Genet. 2019; 20(5):257-272.

Nangalia J, Green AR. Myeloproliferative neoplasms: from origins to outcomes. Blood. 2017; 130(23):2475-2483.

Merlinsky TR, Levine RL, Pronier E. Unfolding the role of calreticulin in myeloproliferative neoplasm pathogenesis. Clin Cancer Res. 2019; 25(10):2956-2962.

van Galen P, Kreso A, Mbong N, et al. The unfolded protein response governs integrity of the haematopoietic stem-cell pool during stress. Nature. 2014; 510(7504):268-272.