This vignette demonstrates how to store and interact with dimensional reduction information (such as the output from RunPCA) in Seurat v2.0. For demonstration purposes, we will be using the 2,700-cell PBMC object that is created in the first guided tutorial. You can download the pre-computed object here.
library(Seurat)
pbmc <- readRDS(file = "~/Projects/datasets/pbmc3k_final.rds")
In Seurat v2.0, storing and interacting with dimensional reduction information has been generalized and clarified. Each dimensional reduction procedure is stored in the object@dr slot, as an element of a named list. For example, after running a principal component analysis with RunPCA, object@dr$pca will contain the results of the PCA. By adding new elements to the list, users can add additional, custom dimensional reductions. Each stored dimensional reduction contains the following slots:
To access these slots, we provide the following functions: GetCellEmbeddings, GetGeneLoadings, and GetDimReduction.
head(x = GetCellEmbeddings(object = pbmc, reduction.type = "pca", dims.use = 1:5))
## PC1 PC2 PC3 PC4 PC5
## AAACATACAACCAC 5.569384 -0.2601651 0.07208744 2.8327852 -0.05418562
## AAACATTGAGCTAC 7.216456 -7.4833577 -0.27232060 -7.9844621 2.99848095
## AAACATTGATCAGC 2.706629 1.5814099 0.54774967 2.2393826 -1.79540785
## AAACCGTGCTTCCG -10.134042 1.3678993 -1.32082569 -0.6694909 -2.91299684
## AAACCGTGTATGCG -1.099311 8.1505284 -1.42908065 -4.2245213 1.96205909
## AAACGCACTGGTAC 1.455335 -1.9453261 -0.83467716 2.3035974 1.73324749
head(x = GetGeneLoadings(object = pbmc, reduction.type = "pca", dims.use = 1:5))
## PC1 PC2 PC3 PC4
## TNFRSF4 0.026010991 0.003256709 0.0018341968 0.036271503
## CPSF3L 0.008282783 0.009079823 -0.0007640280 -0.008883533
## ATAD3C 0.003307989 0.003211707 0.0004175542 0.001693306
## C1orf86 -0.010653986 -0.000264935 -0.0070123176 -0.002372739
## RER1 -0.013715242 0.027391985 -0.0107780157 -0.006205270
## TNFRSF25 0.026637414 0.010852550 0.0024332223 0.033597921
## PC5
## TNFRSF4 -0.0168098228
## CPSF3L 0.0063828472
## ATAD3C 0.0003844082
## C1orf86 0.0038661887
## RER1 -0.0182815126
## TNFRSF25 -0.0205302367
# We also provide shortcut functions for common dimensional reduction
# techniques like PCA: PCAEmbed() and PCALoad() will pull the PCA cell
# embeddings and gene loadings, respectively
head(x = GetDimReduction(object = pbmc, reduction.type = "pca", slot = "sdev"))
## [1] 5.666584 4.326466 3.952192 3.638124 2.191529 1.996551
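The accessors above amount to looking up one element of a named list and then one field inside it. The following base-R sketch mimics that layout without Seurat, to make the idea concrete. It is an illustration only: toy_dr and get_embeddings are hypothetical names, the data are random, and a plain list stands in for Seurat's actual object classes.

```r
# Hypothetical stand-in for object@dr: a named list, one element per reduction
set.seed(1)
cells <- paste0("cell", 1:6)
toy_dr <- list(
  pca = list(
    cell.embeddings = matrix(rnorm(6 * 3), nrow = 6,
                             dimnames = list(cells, paste0("PC", 1:3))),
    key = "PC"
  )
)

# Hypothetical helper shaped like a GetCellEmbeddings call:
# pick a reduction by name, then subset its embedding columns
get_embeddings <- function(dr, reduction.type, dims.use) {
  dr[[reduction.type]]$cell.embeddings[, dims.use, drop = FALSE]
}

# Adding a custom reduction is just adding another named element
toy_dr$mds <- list(
  cell.embeddings = matrix(rnorm(6 * 2), nrow = 6,
                           dimnames = list(cells, paste0("MDS", 1:2))),
  key = "MDS"
)

pc_subset <- get_embeddings(dr = toy_dr, reduction.type = "pca", dims.use = 1:2)
names(toy_dr)
head(x = pc_subset)
```

Because every reduction sits under its own name, downstream code only needs the name (here "pca" or "mds") to find the matching embeddings.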
Seurat provides RunPCA (pca), RunICA (ica), RunTSNE (tsne), and RunDiffusionMap (dmap), representing dimensional reduction techniques commonly applied to scRNA-seq data. When using these functions, all slots are filled automatically.
We also allow users to add the results of a custom dimensional reduction technique (for example, multi-dimensional scaling (MDS), or zero-inflated factor analysis), that is computed separately. All you need is a matrix with each cell’s coordinates in low-dimensional space, as shown below.
Though not incorporated as part of the Seurat package, it's easy to run multidimensional scaling (MDS) in R. To run MDS and store the output in your Seurat object:
# Before running MDS, we first calculate a distance matrix between all pairs
# of cells. Here we use a simple euclidean distance metric on all genes,
# using object@scale.data as input
d <- dist(x = t(x = pbmc@scale.data))
# Run the MDS procedure, k determines the number of dimensions
mds <- cmdscale(d = d, k = 2)
# cmdscale returns the cell embeddings; we first label the columns to ensure
# downstream consistency
colnames(x = mds) <- paste0("MDS", 1:2)
# We will now store this as a new dimensional reduction called 'mds'
pbmc <- SetDimReduction(object = pbmc, reduction.type = "mds", slot = "cell.embeddings",
new.data = mds)
pbmc <- SetDimReduction(object = pbmc, reduction.type = "mds", slot = "key",
new.data = "MDS")
# We can now use this like any other dimensional reduction in all
# downstream functions (DimPlot is similar to PCAPlot, but generalized for
# any reduction)
DimPlot(object = pbmc, reduction.use = "mds", pt.size = 0.5)
# If you would like to observe genes that are strongly correlated with the
# first MDS coordinate (similar to ProjectPCA, but generalized for any
# reduction):
pbmc <- ProjectDim(object = pbmc, reduction.type = "mds")
## [1] "MDS1"
## [1] "RPS27A" "MALAT1" "RPS27" "RPL23A" "RPSA" "RPS3A" "RPL3"
## [8] "RPL9" "RPS6" "RPL21" "RPS3" "LTB" "RPL13A" "RPS15A"
## [15] "RPS12" "RPS25" "RPS18" "CD3D" "RPL31" "RPL30" "PTPRCAP"
## [22] "LDHB" "IL32" "RPS23" "RPLP2" "RPL27A" "RPS29" "RPL13"
## [29] "EEF1A1" "CD3E"
## [1] ""
## [1] "CST3" "TYROBP" "FTL" "FCER1G" "LST1" "S100A9"
## [7] "FTH1" "AIF1" "FCN1" "LYZ" "TYMP" "LGALS1"
## [13] "CFD" "S100A8" "CD68" "LGALS2" "CTSS" "SERPINA1"
## [19] "SAT1" "IFITM3" "SPI1" "PSAP" "IFI30" "CFP"
## [25] "S100A11" "NPC2" "COTL1" "GRN" "GSTP1" "GPX1"
## [1] ""
## [1] ""
## [1] "MDS2"
## [1] "NKG7" "CST7" "GZMB" "PRF1" "GZMA" "FGFBP2" "GNLY"
## [8] "CTSW" "B2M" "CCL5" "SPON2" "GZMH" "CCL4" "HLA-C"
## [15] "KLRD1" "FCGR3A" "CLIC3" "GZMM" "CD247" "XCL2" "AKR1C3"
## [22] "TTC38" "S1PR5" "HOPX" "HLA-A" "PRSS23" "APMAP" "GPR56"
## [29] "TPST2" "MATK"
## [1] ""
## [1] "RPL32" "RPL18A" "RPL11" "RPL13" "RPS2" "RPL12"
## [7] "RPL28" "RPL10" "RPS9" "RPL8" "RPL13A" "RPS18"
## [13] "RPLP1" "HLA-DRA" "RPLP2" "RPL29" "RPS15" "RPS12"
## [19] "CD79A" "RPS16" "RPS14" "RPL19" "RPS11" "RPS23"
## [25] "RPS5" "RPS28" "RPS27" "FAU" "HLA-DQA1" "HLA-DQB1"
## [1] ""
## [1] ""
# Display the results as a heatmap (similar to PCHeatmap, but generalized
# for any dimensional reduction)
DimHeatmap(object = pbmc, reduction.type = "mds", dim.use = 1, cells.use = 500,
use.full = TRUE, do.balanced = TRUE, label.columns = FALSE, remove.key = TRUE)
# Explore how the first MDS dimension is distributed across clusters
VlnPlot(object = pbmc, features.plot = "MDS1", x.lab.rot = TRUE)
# See how the first MDS dimension is correlated with the first PC dimension
GenePlot(object = pbmc, gene1 = "MDS1", gene2 = "PC1")
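The distance-then-cmdscale recipe used above can be tried on a small toy matrix without a Seurat object. This base-R sketch assumes a random genes-by-cells matrix (toy, with made-up gene and cell names) and checks that the result has the shape the walkthrough relies on: one row per cell, with labeled columns. It also compares the first MDS coordinate with the first principal component computed on the same input, where classical MDS on Euclidean distances is expected to agree with PCA up to sign. In the PBMC example the two reductions are computed on different gene sets, so the agreement there need not be exact.

```r
set.seed(7)
# Toy genes-by-cells matrix (hypothetical data): 20 genes, 8 cells
toy <- matrix(rnorm(20 * 8), nrow = 20,
              dimnames = list(paste0("gene", 1:20), paste0("cell", 1:8)))

# Euclidean distance between all pairs of cells (cells must be rows for dist)
d_toy <- dist(x = t(x = toy))

# Classical MDS into two dimensions, then label the columns
mds_toy <- cmdscale(d = d_toy, k = 2)
colnames(x = mds_toy) <- paste0("MDS", 1:2)

# One row per cell, in the same order as the input columns
dim(mds_toy)
identical(rownames(mds_toy), colnames(toy))

# On identical input, MDS1 and PC1 should be (anti-)correlated almost perfectly
pc1 <- prcomp(x = t(x = toy))$x[, 1]
mds_pc_cor <- abs(cor(mds_toy[, "MDS1"], pc1))
mds_pc_cor
```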
In Seurat v2.0, we have switched all PCA calculations to be performed via the irlba package to enable calculation of partial PCAs (i.e. only calculate the first X PCs). While this is an approximate algorithm, it performs remarkably similarly to a full PCA and offers significant savings in computation time and resources. These savings become necessary when running Seurat on increasingly large datasets. We also allow the user to decide whether to weight the PCs by the percent of the variance they explain (the weight.by.var parameter). For large datasets containing rare cell types, we often see improved results by setting this to FALSE, as this prevents the initial PCs (which often explain a disproportionate amount of variance) from masking rare cell types or subtle sources of heterogeneity that appear in later PCs.
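To illustrate the two points above, the sketch below computes a partial decomposition with irlba on a toy matrix and compares its leading singular values with a full svd. It then contrasts unweighted and weighted embeddings. The assumptions here are: the irlba package is installed, the data are random, and "weighting" is taken to mean scaling each component by its singular value. That last point is one plausible reading of weight.by.var, not a confirmed description of what RunPCA does internally.

```r
library(irlba)  # assumed to be installed

set.seed(42)
# Toy genes-by-cells matrix (hypothetical data), centered per gene
x <- matrix(rnorm(200 * 50), nrow = 200, ncol = 50)
x <- x - rowMeans(x)

# Partial decomposition: only the first k components are computed
k <- 5
partial <- irlba(A = x, nv = k)

# Full decomposition for comparison
full <- svd(x = x)

# Largest relative difference among the leading k singular values
max_rel_diff <- max(abs(partial$d - full$d[1:k]) / full$d[1:k])
max_rel_diff

# Unweighted embeddings: every component has unit length, so later
# components contribute as much to distances as the first ones
emb_unweighted <- partial$v

# Weighted embeddings (assumed form): each component scaled by its
# singular value, so leading components dominate distances
emb_weighted <- partial$v %*% diag(partial$d)

sqrt(colSums(emb_unweighted^2))
sqrt(colSums(emb_weighted^2))
```

In the unweighted case all columns have the same length, which is why a signal confined to a later component is not swamped by the first few when distances between cells are computed.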