Seurat - Dimensional Reduction Vignette
Compiled: October 12, 2017
Load in the data
This vignette demonstrates how to store and interact with dimensional reduction information (such as the output from
RunPCA) in Seurat v2.0. For demonstration purposes, we will be using the 2,700 PBMC object that is created in the first guided tutorial. You can download the pre-computed object here.
library(Seurat)
load(file = "~/Projects/datasets/pbmc3k_final.Rda")
Explore the new dimensional reduction structure
In Seurat v2.0, storing and interacting with dimensional reduction information has been generalized and clarified. Each dimensional reduction procedure is stored in the
object@dr slot, as an element of a named list. For example, after running a principal component analysis with RunPCA,
object@dr$pca will contain the results of the PCA. By adding new elements to the list, users can add additional, and custom, dimensional reductions. Each stored dimensional reduction contains the following slots:
- cell.embeddings: stores the coordinates for each cell in low-dimensional space.
- gene.loadings: stores the weight for each gene along each dimension of the embedding.
- gene.loadings.full: Seurat typically calculates the dimensional reduction on a subset of genes (for example, high-variance genes), and then projects that structure onto the entire dataset (all genes). The results of that projection (calculated with ProjectDim) are stored in this slot. Note that the cell embeddings remain unchanged after projection, but there are now gene loadings for all genes.
- sdev: The standard deviations of each dimension. Most often used with PCA (storing the square roots of the eigenvalues of the covariance matrix) and can be useful when looking at the drop off in the amount of variance that is explained by each successive dimension.
- key: Sets the column names for the cell.embeddings and gene.loadings matrices. For example, for PCA, the column names are PC1, PC2, etc., so the key is “PC”.
- jackstraw: Stores the results of the jackstraw procedure run using this dimensional reduction technique. Currently supported only for PCA.
- misc: Bonus slot to store any other information you might want.
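To make the first few slots concrete, here is a minimal base-R sketch on simulated data. It uses prcomp() rather than Seurat's RunPCA, so it is only an analogy: the toy matrix and the variable names are assumptions for illustration, and Seurat's own scaling of embeddings and loadings may differ.

```r
# Toy expression matrix: 20 genes (rows) x 50 cells (columns),
# following the genes-by-cells orientation used for expression data.
set.seed(1)
expr <- matrix(rnorm(20 * 50), nrow = 20,
               dimnames = list(paste0("gene", 1:20), paste0("cell", 1:50)))

# prcomp() expects observations in rows, so cells are moved to rows.
pca <- prcomp(t(expr), center = TRUE, scale. = FALSE)

cell.embeddings <- pca$x         # cells x PCs: coordinates of each cell
gene.loadings   <- pca$rotation  # genes x PCs: weight of each gene per PC
sdev            <- pca$sdev      # standard deviation of each PC

# sdev squared should match the eigenvalues of the gene-gene covariance matrix.
eig <- eigen(cov(t(expr)), symmetric = TRUE)$values
all.equal(sdev^2, eig)

# The shared column-name prefix ("PC") plays the role of the key.
head(colnames(cell.embeddings), 3)
```

The point of the sketch is the shape of each piece: one row per cell in the embeddings, one row per gene in the loadings, and one standard deviation per dimension.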
To access these slots, we provide the following functions:
head(x = GetCellEmbeddings(object = pbmc, reduction.type = "pca", dims.use = 1:5))
##                       PC1        PC2         PC3        PC4         PC5
## AAACATACAACCAC   5.569384 -0.2601651  0.07208744 -2.8327852  0.05418562
## AAACATTGAGCTAC   7.216456 -7.4833577 -0.27232060  7.9844621 -2.99848095
## AAACATTGATCAGC   2.706629  1.5814099  0.54774967 -2.2393826  1.79540785
## AAACCGTGCTTCCG -10.134042  1.3678993 -1.32082569  0.6694909  2.91299684
## AAACCGTGTATGCG  -1.099311  8.1505284 -1.42908065  4.2245213 -1.96205909
## AAACGCACTGGTAC   1.455335 -1.9453261 -0.83467716 -2.3035974 -1.73324749
head(x = GetGeneLoadings(object = pbmc, reduction.type = "pca", dims.use = 1:5))
##                   PC1          PC2           PC3          PC4
## TNFRSF4   0.026010991  0.003256709  0.0018341968 -0.036271503
## CPSF3L    0.008282783  0.009079823 -0.0007640280  0.008883533
## ATAD3C    0.003307989  0.003211707  0.0004175542 -0.001693306
## C1orf86  -0.010653986 -0.000264935 -0.0070123176  0.002372739
## RER1     -0.013715242  0.027391985 -0.0107780157  0.006205270
## TNFRSF25  0.026637414  0.010852550  0.0024332223 -0.033597921
##                    PC5
## TNFRSF4   0.0168098228
## CPSF3L   -0.0063828472
## ATAD3C   -0.0003844082
## C1orf86  -0.0038661887
## RER1      0.0182815126
## TNFRSF25  0.0205302367
# We also provide shortcut functions for common dimensional reduction
# techniques like PCA: PCAEmbed() and PCALoad() will pull the PCA cell
# embeddings and gene loadings, respectively
head(x = GetDimReduction(object = pbmc, reduction.type = "pca", slot = "sdev"))
## [1] 5.666584 4.326466 3.952192 3.638124 2.191529 1.996551
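As a rough illustration of how the sdev values can be used to look at the drop-off between successive dimensions, the sketch below works only with the six standard deviations printed above. The variable names are illustrative, and the shares are relative to these six PCs alone, not to the total variance of the dataset.

```r
# Standard deviations of the first six PCs, copied from the output above.
sdev <- c(5.666584, 4.326466, 3.952192, 3.638124, 2.191529, 1.996551)
names(sdev) <- paste0("PC", seq_along(sdev))

# Variance of each PC is the square of its standard deviation.
variance <- sdev^2

# Share of variance relative to these six PCs only (not the full dataset).
rel.share <- variance / sum(variance)

# Decrease in standard deviation from each PC to the next one.
drop.off <- -diff(sdev)

round(rel.share, 3)
round(drop.off, 3)
```

In these six values, the largest single decrease falls between PC4 and PC5.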
Seurat provides several functions that compute a dimensional reduction, including RunPCA (pca), RunTSNE (tsne), and
RunDiffusionMap (dmap), representing dimensional reduction techniques commonly applied to scRNA-seq data. When using these functions, all slots are filled automatically.
We also allow users to add the results of a custom dimensional reduction technique (for example, multi-dimensional scaling (MDS), or zero-inflated factor analysis), that is computed separately. All you need is a matrix with each cell’s coordinates in low-dimensional space, as shown below.
Storing a new dimensional reduction calculation
Though not incorporated as part of the Seurat package, it's easy to run multidimensional scaling (MDS) in R. To run MDS and store the output in your Seurat object:
# Before running MDS, we first calculate a distance matrix between all pairs
# of cells. Here we use a simple euclidean distance metric on all genes,
# using pbmc@scale.data as input
d <- dist(x = t(x = pbmc@scale.data))
# Run the MDS procedure, k determines the number of dimensions
mds <- cmdscale(d = d, k = 2)
# cmdscale returns the cell embeddings, we first label the columns to ensure
# downstream consistency
colnames(x = mds) <- paste0("MDS", 1:2)
# We will now store this as a new dimensional reduction called 'mds'
pbmc <- SetDimReduction(object = pbmc, reduction.type = "mds", slot = "cell.embeddings",
    new.data = mds)
pbmc <- SetDimReduction(object = pbmc, reduction.type = "mds", slot = "key",
    new.data = "MDS")
# We can now use this as you would any other dimensional reduction in all
# downstream functions (similar to PCAPlot, but generalized for any
# reduction)
DimPlot(object = pbmc, reduction.use = "mds", pt.size = 0.5)
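The MDS steps can also be tried without a Seurat object. The following is a minimal sketch on a simulated genes-by-cells matrix (the matrix and its gene and cell names are made up for illustration); it shows the shape of the matrix that would be stored as cell embeddings: one row per cell, one labelled column per dimension.

```r
# Simulated scaled expression: 30 genes (rows) x 12 cells (columns).
set.seed(42)
expr <- matrix(rnorm(30 * 12), nrow = 30,
               dimnames = list(paste0("gene", 1:30), paste0("cell", 1:12)))

# Euclidean distance between all pairs of cells (cells must be in rows).
d <- dist(x = t(x = expr))

# Classical MDS; k sets the number of output dimensions.
mds <- cmdscale(d = d, k = 2)

# Label the columns with a common prefix, mirroring the "MDS" key.
colnames(x = mds) <- paste0("MDS", 1:2)

# One row per cell, with row names carried over from the cell names.
dim(mds)
head(rownames(mds), 3)
```

Keeping the row names equal to the cell names is what lets the embedding line up with the rest of the object when it is stored.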