Analyzing the Presence of Glioblastoma Stem Cell Markers in the Tumor Microenvironment Using R
Cancer is a pernicious disease and the second leading cause of death in the United States (Mortality in the United States). Cancer can arise for a variety of reasons, genetic or environmental in nature, that vary from patient to patient. Generally, genetic mutations that arise in abnormal cells result in uncontrolled growth of these cells, which infiltrate and destroy normal tissue. Cancer can originate in various organs throughout the body and, from the organ or tissue of origin, spread throughout the body (metastasis). Given the multiplicity of tissues and organs susceptible to cancer, researchers and clinicians consider cancer a group of diseases.
One example of a particular type of cancer is glioblastoma (GBM). This type of cancer occurs in the brain and is the most lethal primary (i.e., original or initial) brain tumor in adults. Unfortunately, patients diagnosed with GBM have a median survival time of 12-15 months and a median 5-year survival rate of only 10% (Epidemiology and Outcome of Glioblastoma). To treat such a lethal disease, the current standard-of-care (SOC) involves maximal surgical resection (physically removing the tumor mass from the brain), chemotherapy, and radiation therapy (radiotherapy). However, patients who receive SOC still face an approximate 90% chance of tumor recurrence. A major reason for the dismal prognosis is the differences that exist across individual tumor cells, ranging from the genetic to the cellular level. These differences also manifest in how different subpopulations of tumor cells respond to drug treatment. The varying characteristics, or phenotypic differences, that make up a heterogeneous tumor cell population are known as intratumoral heterogeneity, and they make it quite difficult to find a single target that would kill the tumor cells effectively.
To address the challenge of intratumoral heterogeneity, significant efforts have been devoted to characterizing tumor cell heterogeneity both within and across tumors: identifying similarities and differences across tumor cells based on genetic profiles, gene expression, protein abundances, and metabolic behavior. One way to study differences among GBM tumor cells is to apply some perturbation, i.e., a disturbance, external force, or signal to the cell population, and assess how similar or different the responses are across the cell population. In this exercise, we will examine some of the differences that arise across GBM stem cell (GSC) subpopulations that have been treated with a drug (or vehicle, which in this case is the solvent in which the drug is dissolved) for 4 days. scRNA-seq profiles have been measured from a GSC population over the 4-day treatment (Figure 1). The goal of this exercise is for you to get a taste of the type of analysis performed to understand the underlying differences in gene expression that drive drug response, which can point to potential targets that distinguish tumor cell subpopulations.
Figure 1. Experimental design from which scRNA-seq data was generated, a subset of which you will analyze in this exercise. Grey arrows indicate untreated conditions. Colored arrows indicate the timing and duration of pitavastatin (pink) and vehicle (DMSO, blue) treatment.
Open a digital document to take detailed notes on the information presented below. Use your resources (Google, research papers, etc.) to the best of your ability.
R PREWORK:
Download RStudio; this is what we will be coding in.
Load the (SN520_data4DEG) .csv file into RStudio using the read.csv() function
Put this data into a singular variable
Since this data was made for Excel, we need to split the data into two DataFrames to go along with activity 2
DataFrame1 = matrixdf
This holds the gene names (rows) and the cell IDs (columns)
DataFrame2 = metadf
These are the first 3 rows. We will use them as a key to refer back to the original cell IDs
Delete the first row of the metadf so it doesn’t repeat itself
Do the same thing for the rows and columns of matrixdf
Save rownames(matrixdf) in a variable of your choice (for example, genenames)
Use the apply() function to convert the values in matrixdf to numeric
Assign genenames (or whatever variable name you chose) back to rownames(matrixdf)
This restores the row names of matrixdf, which are lost after the apply() call
Convert matrixdf into a data frame using as.data.frame()
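The prework steps above can be sketched roughly as follows. This is a minimal sketch, not the activity's official solution. The small inline table stands in for the SN520_data4DEG file, and its layout (cell IDs in row 1, two further metadata rows, gene names in column 1) is an assumption, as are the names rawdf, cell_ids, and genenames and all of the values.

```r
# Toy stand-in for the real file; layout and values are assumptions for illustration.
csv_text <- "cell_id,c1,c2,c3
treatment,DMSO,pitavastatin,DMSO
day,D2,D3,D4
SOX2,1.5,0,2.1
CD44,0,3.2,0.4
VIM,2.2,1.1,0"

# With the real file you would instead pass its path to read.csv(), e.g.
# rawdf <- read.csv("SN520_data4DEG.csv", header = FALSE, stringsAsFactors = FALSE)
rawdf <- read.csv(text = csv_text, header = FALSE, stringsAsFactors = FALSE)

# Cell IDs live in the first row (minus the label column)
cell_ids <- as.character(unlist(rawdf[1, -1]))

# metadf: the metadata rows, without repeating the cell ID row
metadf <- rawdf[2:3, -1]
rownames(metadf) <- rawdf[2:3, 1]
colnames(metadf) <- cell_ids

# matrixdf: everything below the metadata rows, genes as row names
matrixdf <- rawdf[-(1:3), -1]
rownames(matrixdf) <- rawdf[-(1:3), 1]
colnames(matrixdf) <- cell_ids

# apply() returns a matrix and drops the row names, so save them first
genenames <- rownames(matrixdf)
matrixdf <- apply(matrixdf, 2, as.numeric)
rownames(matrixdf) <- genenames
matrixdf <- as.data.frame(matrixdf)

str(matrixdf)
```

Saving the row names before apply() matters because as.numeric() returns plain vectors without names, so the gene labels would otherwise be lost.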
Activity 1: Understanding GBM stem-like cells
Multiple schools of thought exist about the cause(s) of tumor heterogeneity and progression. One such theory, the cancer stem cell theory, proposes that among a heterogeneous tumor cell population, a subpopulation of cells exists that exhibits stem cell properties, including self-renewal, differentiation, and de-differentiation. Experiments and clinical data have shown that cancer stem cells resist chemotherapy and can form a tumor (tumorigenesis), and thus they are believed to be a major driver of tumor recurrence (Figure 2). At the same time, separate data strongly indicate that cancer stem cells are highly plastic, meaning they can transition back and forth from one state to another (Figure 3). From the two figures below, what are some questions or thoughts that you may have regarding GSCs?
Figure 2. Cancer stem cell developmental hierarchy. In the context of GBM, GSCs sit atop a developmental hierarchy in which GSCs can self-renew and differentiate into tumor cells.
Figure 3. Cancer stem cell plasticity. Evidence strongly indicates that cancer stem cells (CSCs) are plastic, i.e., they can transition from one cell state into another. Such plasticity occurs in response to internal and external signaling cues and pressures. Not only can CSCs differentiate into differentiated tumor cells, but tumor cells can also de-differentiate back into a stem-like cell state. Moreover, CSCs can transition from an epithelial to a mesenchymal state, which is associated with greater drug resistance and migration. A similar phenomenon occurs in GSCs, which have been shown to transition from a proneural molecular subtype to a mesenchymal subtype.
Given the nature of GBM, it is difficult to perform experiments directly on the tumor, or in other words, the patient. Consequently, a variety of model systems have been developed that allow us to characterize various aspects of GBM tumor cells or GSCs in different contexts, ranging from in vitro cell cultures to in vivo mouse models in which a small sample of a tumor is implanted in a particular type of mouse (immunodeficient mice; why are immunodeficient mice needed to create tumor xenograft models?). Each type of model has advantages and disadvantages, so it is important to keep them in mind when devising a question or hypothesis, designing experiments, and drawing conclusions from those experiments. Here in the Baliga Lab at ISB, we use GSCs as a model of phenotypic heterogeneity.
Activity 2: Understanding scRNA-seq data
Because cell-to-cell heterogeneity pervades GBM tumors, scRNA-seq data is used to characterize tumors. Here you will be working with a small subset of scRNA-seq data to get a sense of the type of data used, in part, to characterize actual tumor cells, whether they are taken directly from a tumor biopsy or from a tumor model.
Questions:
How many rows are in matrixdf?
How many columns?
What do you notice about this DataFrame?
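One way to explore these questions is with a few base R inspection functions. The sketch below uses a tiny made-up data frame in place of the real matrixdf (the values and cell names are assumptions), so the numbers it prints will not match your data.

```r
# Toy stand-in for matrixdf: genes in rows, cells in columns (values are made up)
matrixdf <- data.frame(
  c1 = c(1.5, 0.0, 2.2),
  c2 = c(0.0, 3.2, 1.1),
  c3 = c(2.1, 0.4, 0.0),
  row.names = c("SOX2", "CD44", "VIM")
)

dim(matrixdf)    # number of rows, then number of columns
nrow(matrixdf)   # rows only
ncol(matrixdf)   # columns only
str(matrixdf)    # column types and a preview of the values
head(matrixdf)   # first few rows, with gene names as row labels
```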
Activity 3: Visualizing Stem Cell markers
To visualize how gene expression varies from cell to cell, we will create a heatmap. A heatmap lets us see how strongly each gene is expressed within each cell. Here are some examples of heatmaps to help you visualize what you will be creating.
Heatmap 1: Comparing different car models to each other
Heatmap 2: Comparing different search results to each other
Using $, add a new column to matrixdf that holds its row names (the gene names)
Now we need to normalize our data using z-scores
Make sure to delete the last column of matrixdf as that is the gene names again
Finally, run your ggplot function to create the heatmap!
Extra: I changed my matrixdf to include a smaller subset of data so it would be easier to look at and analyze!
Make sure to either open a new R script before doing this or go back and rewrite the previous script.
Most of these genes are “white” in my case, or in other words, close to/equal to zero. Why do you think that is?
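The heatmap steps in this activity can be sketched as shown below. This is one possible approach under stated assumptions rather than the activity's own code. It uses a made-up toy matrix, computes z-scores per gene across cells (the text does not say whether to scale by gene or by cell), and reshapes with base R instead of adding a gene column with $. It assumes the ggplot2 package is installed.

```r
library(ggplot2)

# Toy expression data: genes in rows, cells in columns (values are made up)
matrixdf <- data.frame(
  c1 = c(1.5, 0.0, 2.2),
  c2 = c(0.0, 3.2, 1.1),
  c3 = c(2.1, 0.4, 0.0),
  row.names = c("SOX2", "CD44", "VIM")
)

# z-score each gene across cells: scale() standardizes columns,
# so transpose, scale, and transpose back
zmat <- t(scale(t(as.matrix(matrixdf))))

# Long format: one row per (gene, cell) pair, which is what ggplot expects
longdf <- as.data.frame(as.table(zmat))
colnames(longdf) <- c("gene", "cell", "zscore")

# One tile per gene/cell pair; white marks z-scores near zero
heat <- ggplot(longdf, aes(x = cell, y = gene, fill = zscore)) +
  geom_tile() +
  scale_fill_gradient2(low = "blue", mid = "white", high = "red", midpoint = 0) +
  labs(x = "Cell", y = "Gene", fill = "z-score")
print(heat)
```

Note that a gene with identical values in every cell has zero standard deviation, so its z-scores come out as NaN and its tiles are drawn in the missing-value color.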
Activity 4: Analyzing Stem Cell Markers
Multiple genes have been associated with cancer stem cells that act as stem-cell markers, including STAT3, CEBPB, CD44, VIM, NES, OLIG2, SOX2, FOSL2, and S100A4. It is important to note that no particular gene or even one set of genes can definitively determine whether a cell is a stem cell. We use multiple genes to increase our confidence in the stemness of the cells (the higher the expression of multiple stem cell markers, the greater the likelihood that the cell or sample is a stem-like cell).
Create boxplots that show the expression of each stem cell marker across all cells. (Pro-tip: Instead of using a row/column index number when selecting data, you can use the name of the row/column in quotes)
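A minimal sketch of the boxplot step, using base R graphics, is shown below. The expression values are randomly generated stand-ins (an assumption, not real scRNA-seq data), and drawing one box per marker across all cells is one reading of the instruction.

```r
set.seed(42)  # make the toy data reproducible

markers <- c("STAT3", "CEBPB", "CD44", "VIM", "NES",
             "OLIG2", "SOX2", "FOSL2", "S100A4")
cells <- paste0("cell", 1:20)

# Toy stand-in for matrixdf: random counts, genes in rows, cells in columns
matrixdf <- as.data.frame(
  matrix(rpois(length(markers) * length(cells), lambda = 3),
         nrow = length(markers),
         dimnames = list(markers, cells))
)

# Select rows by name (in quotes) rather than by index number;
# intersect() skips any marker that is missing from the data
present <- intersect(markers, rownames(matrixdf))
markerdf <- matrixdf[present, ]

# boxplot() draws one box per column, so transpose to put genes in columns
bp <- boxplot(t(markerdf), las = 2, ylab = "Expression",
              main = "Stem cell marker expression across cells")
```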
These stem cell markers also take part in other pathways and functions that affect how the GSC grows. Using biomaRt, we will look at the differences between the functions of each gene.
The R console will print out additional items to download that will allow biomaRt to run
Install 2 packages (“magrittr” and “tidyverse”)
Call upon the 3 libraries that you have downloaded (biomaRt, tidyverse, and magrittr)
Follow instructions 1-4 on this website (use the hsapiens_gene_ensembl dataset in useMart())
In the getBM function, assign the attributes of external_gene_name and definition_1006 (use c())
Attributes tell getBM() which fields to return for each gene (here, the gene name and the description of its function)
If we are trying to input the specific gene name, what filter do you think we would use?
Value = the name of the gene
To what object did we assign the mart?
Input the 9 different stem cell markers that you are given above.
What do you notice? List some attributes of each stem cell marker
At the end, your code should look like this:
This is what my code looks like
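For reference, a query along these lines could be sketched as follows. This is a hedged sketch, not the original code from the activity. The helper name get_marker_annotations is hypothetical, the choice of external_gene_name as the filter is an assumption based on the steps above, and the live query needs the biomaRt package plus an internet connection, so it only runs in an interactive session.

```r
# Install once, if needed (left commented out so the script has no side effects):
# install.packages(c("magrittr", "tidyverse", "BiocManager"))
# BiocManager::install("biomaRt")

markers <- c("STAT3", "CEBPB", "CD44", "VIM", "NES",
             "OLIG2", "SOX2", "FOSL2", "S100A4")

# Hypothetical helper wrapping getBM(): returns gene names with GO term definitions
get_marker_annotations <- function(genes, mart,
                                   attrs = c("external_gene_name", "definition_1006")) {
  biomaRt::getBM(
    attributes = attrs,               # fields to return
    filters    = "external_gene_name", # field the values are matched against
    values     = genes,
    mart       = mart
  )
}

# Only query Ensembl when biomaRt is available and the session is interactive
if (interactive() && requireNamespace("biomaRt", quietly = TRUE)) {
  ensembl <- biomaRt::useMart("ensembl", dataset = "hsapiens_gene_ensembl")
  annotations <- get_marker_annotations(markers, ensembl)
  head(annotations)
}
```

A gene usually has several GO terms, so expect more than one row per marker in the result.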
This analysis represents some of the preliminary types of analyses that are performed on single-cell data from various tumor models to help us characterize and better understand tumor cell models. Hopefully, this exercise gives you a sense of one type of data and the sorts of questions that should be asked to improve our understanding of GBM and cancers in general.
Now that we have briefly looked at differences that exist between tumor cells, what do you think are some other comparisons that can be made? What sort of additional information may be required to perform more detailed analyses?
Students, please take this 1-minute survey, now that you've completed this activity. We are interested in learning about your experience so we can improve these resources. All responses to this survey are anonymous, all questions are optional, and your feedback is much appreciated.
Contributors
James Park, Ph.D.
Research Scientist, ISB
Layla Ismail
High School Intern, ISB & Cleveland STEM High School
Claudia Ludwig, M.Ed.
Director of Systems Education Experiences, ISB
Special thanks to Kristian Swearingen and Rachel Calder for the Malaria Excel activity on which this activity was modeled.
Funding to support the development of this activity and experience was provided by the National Institutes of Health, awards 1U54CA274509 & F32CA247445. The content of these pages was created by students for students with the help of scientists and teachers. The views expressed herein are those of the authors and do not necessarily reflect the views of NIH or ISB.