Load existing Seurat object

load("~/seurat_files/nbt_tutorial_part1.Robj")

Pulling data from a Seurat object

# First, we introduce the fetch.data function, a very useful way to pull information from the dataset.
# Essentially it is a wrapper to pull from nbt@data, nbt@ident, nbt@pca.rot, nbt@data.info, etc...

# Pulls the identity class (cluster ID), PC1 scores, # of genes, original identity class (parsed from the cell name), gene expression levels for SOX1 and ACTB
# Info is pulled for all cells, but displayed for the last 5
# Note that the GW16 neurons fall into three clusters (1-3), with variable expression of PAX6 and DLX2 (see below for more)
my.data=fetch.data(nbt,c("ident","PC1","nGene","orig.ident","PAX6","DLX2","ACTB"))
tail(my.data,5)
##            ident        PC1 nGene orig.ident     PAX6     DLX2     ACTB
## Hi_GW16_22     3 0.03747778  3519       GW16 4.592996 0.000000 9.007845
## Hi_GW16_23     1 0.09227503  2670       GW16 0.000000 3.355502 9.503926
## Hi_GW16_24     2 0.06869470  2938       GW16 4.699935 0.000000 9.351865
## Hi_GW16_25     3 0.06927854  3863       GW16 4.089500 0.000000 9.407497
## Hi_GW16_26     1 0.10876099  3084       GW16 0.000000 6.492603 9.023716
# Can use the which.cells function to pull the names of cells that were experimentally marked as iPS cells, and find that they have been placed in cluster 7
ips.cells=which.cells(nbt,"iPS",id = "orig.ident")
my.data=fetch.data(nbt,"ident",cells.use = ips.cells)
head(my.data,5)
##          ident
## Hi_iPS_1     7
## Hi_iPS_2     7
## Hi_iPS_3     7
## Hi_iPS_4     7
## Hi_iPS_5     7
# You can also switch easily switch the cell's identity (for example, going back to the original annotation)
nbt = set.all.ident(nbt, "orig.ident")
# And switch back - to the cluster ID defined by the tree building
nbt = set.all.ident(nbt, "tree.ident")
# Also see set.ident for changing identity classes for only a subset of cells

Finding differentially expressed genes (cluster biomarkers)

#find all markers of cluster 8
#thresh.use speeds things up (increase value to increase speed) by only testing genes whose average expression is > thresh.use between cluster
#Note that Seurat finds both positive and negative markers (avg_diff either >0 or <0)
ips.markers=find.markers(nbt,7,thresh.use = 2)
print(head(ips.markers,5))
##           p_val  avg_diff
## TDGF1         0  7.225206
## ESRG          0  5.248835
## CRABP1        0  5.037518
## D21S2088E     0  4.737998
## FOXG1         0 -4.549359
# note that Seurat has four tests for differential expression:
# ROC test ("roc"), t-test ("t"), LRT test based on zero-inflated data ("bimod", default), LRT test based on tobit-censoring models ("tobit")
# The ROC test returns the 'classification power' for any individual marker (ranging from 0 - random, to 1 - perfect). Though not a statistical test, it is often very useful for finding clean markers.
ips.markers=find.markers(nbt,7,thresh.use = 2,test.use = "roc")

#Visualize these new markers with a violin plot
vlnPlot(nbt,c("CRABP1","LINC-ROR"))

#plots a correlation analysis of gene/gene (ie. 'FACS' plot - cells colored by cluster number)
genePlot(nbt,"CRABP1","LINC-ROR")

# Neuronal cells in the dataset (GW represents gestational week) cluster into three groups (1-3) on the phylogenetic tree, let's explore these grouos
plotClusterTree(nbt)