Load existing Seurat object
load("~/seurat_files/nbt_tutorial_part1.Robj")
Pulling data from a Seurat object
# First, we introduce the fetch.data function, a very useful way to pull information from the dataset.
# Essentially it is a wrapper to pull from nbt@data, nbt@ident, nbt@pca.rot, nbt@data.info, etc...
# Pulls the identity class (cluster ID), PC1 scores, # of genes, original identity class (parsed from the cell name), gene expression levels for SOX1 and ACTB
# Info is pulled for all cells, but displayed for the last 5
# Note that the GW16 neurons fall into three clusters (1-3), with variable expression of PAX6 and DLX2 (see below for more)
my.data=fetch.data(nbt,c("ident","PC1","nGene","orig.ident","PAX6","DLX2","ACTB"))
tail(my.data,5)
## ident PC1 nGene orig.ident PAX6 DLX2 ACTB
## Hi_GW16_22 3 0.03747778 3519 GW16 4.592996 0.000000 9.007845
## Hi_GW16_23 1 0.09227503 2670 GW16 0.000000 3.355502 9.503926
## Hi_GW16_24 2 0.06869470 2938 GW16 4.699935 0.000000 9.351865
## Hi_GW16_25 3 0.06927854 3863 GW16 4.089500 0.000000 9.407497
## Hi_GW16_26 1 0.10876099 3084 GW16 0.000000 6.492603 9.023716
# Can use the which.cells function to pull the names of cells that were experimentally marked as iPS cells, and find that they have been placed in cluster 7
ips.cells=which.cells(nbt,"iPS",id = "orig.ident")
my.data=fetch.data(nbt,"ident",cells.use = ips.cells)
head(my.data,5)
## ident
## Hi_iPS_1 7
## Hi_iPS_2 7
## Hi_iPS_3 7
## Hi_iPS_4 7
## Hi_iPS_5 7
# You can also switch easily switch the cell's identity (for example, going back to the original annotation)
nbt = set.all.ident(nbt, "orig.ident")
# And switch back - to the cluster ID defined by the tree building
nbt = set.all.ident(nbt, "tree.ident")
# Also see set.ident for changing identity classes for only a subset of cells
Finding differentially expressed genes (cluster biomarkers)
#find all markers of cluster 8
#thresh.use speeds things up (increase value to increase speed) by only testing genes whose average expression is > thresh.use between cluster
#Note that Seurat finds both positive and negative markers (avg_diff either >0 or <0)
ips.markers=find.markers(nbt,7,thresh.use = 2)
print(head(ips.markers,5))
## p_val avg_diff
## TDGF1 0 7.225206
## ESRG 0 5.248835
## CRABP1 0 5.037518
## D21S2088E 0 4.737998
## FOXG1 0 -4.549359
# note that Seurat has four tests for differential expression:
# ROC test ("roc"), t-test ("t"), LRT test based on zero-inflated data ("bimod", default), LRT test based on tobit-censoring models ("tobit")
# The ROC test returns the 'classification power' for any individual marker (ranging from 0 - random, to 1 - perfect). Though not a statistical test, it is often very useful for finding clean markers.
ips.markers=find.markers(nbt,7,thresh.use = 2,test.use = "roc")
#Visualize these new markers with a violin plot
vlnPlot(nbt,c("CRABP1","LINC-ROR"))
#plots a correlation analysis of gene/gene (ie. 'FACS' plot - cells colored by cluster number)
genePlot(nbt,"CRABP1","LINC-ROR")
# Neuronal cells in the dataset (GW represents gestational week) cluster into three groups (1-3) on the phylogenetic tree, let's explore these grouos
plotClusterTree(nbt)