This tutorial demonstrates how to use Seurat (>=3.2) to analyze spatially-resolved RNA-seq data. While the analytical pipelines are similar to the Seurat workflow for single-cell RNA-seq analysis, we introduce updated interaction and visualization tools, with a particular emphasis on the integration of spatial and molecular information. This tutorial will cover the following tasks, which we believe will be common for many spatial analyses:
For our first vignette, we analyze a dataset generated with the Visium technology from 10x Genomics. We will be extending Seurat to work with additional data types in the near-future, including SLIDE-Seq, STARmap, and MERFISH.
First, we load Seurat and the other packages necessary for this vignette.
Here, we will be using a recently released dataset of sagital mouse brain slices generated using the Visium v1 chemistry. There are two serial anterior sections, and two (matched) serial posterior sections.
You can download the data here, and load it into Seurat using the
Load10X_Spatial() function. This reads in the output of the spaceranger pipeline, and returns a Seurat object that contains both the spot-level expression data along with the associated image of the tissue slice. You can also use our SeuratData package for easy data access, as demonstrated below. After installing the dataset, you can type
?stxBrain to learn more.
brain <- LoadData("stxBrain", type = "anterior1")
How is the spatial data stored within Seurat? The visium data from 10x consists of the following data types:
Assaybut contains spot level, not single-cell level data. The image itself is stored in a new
imagesslot in the Seurat object. The
imagesslot also stores the information necessary to associate spots with their physical position on the tissue image.
The initial preprocessing steps that we perform on the spot by gene expression data are similar to a typical scRNA-seq experiment. We first need to normalize the data in order to account for variance in sequencing depth across data points. We note that the variance in molecular counts / spot can be substantial for spatial datasets, particularly if there are differences in cell density across the tissue. We see substantial heterogeneity here, which requires effective normalization.
plot1 <- VlnPlot(brain, features = "nCount_Spatial", pt.size = 0.1) + NoLegend() plot2 <- SpatialFeaturePlot(brain, features = "nCount_Spatial") + theme(legend.position = "right") wrap_plots(plot1, plot2)
These plots demonstrate that the variance in molecular counts across spots is not just technical in nature, but also is dependent on the tissue anatomy. For example, regions of the tissue that are depleted for neurons (such as the cortical white matter), reproducibly exhibit lower molecular counts. As a result, standard approaches (such as the
LogNormalize() function), which force each data point to have the same underlying ‘size’ after normalization, can be problematic.
As an alternative, we recommend using sctransform (Hafemeister and Satija, Genome Biology 2019), which which builds regularized negative binomial models of gene expression in order to account for technical artifacts while preserving biological variance. For more details on sctransform, please see the paper here and the Seurat vignette here. sctransform normalizes the data, detects high-variance features, and stores the data in the
brain <- SCTransform(brain, assay = "Spatial", verbose = FALSE)
How do results compare to log-normalization? To explore the differences in normalization methods, we examine how both the sctransform and log normalization results correlate with the number of UMIs. For this comparison, we first rerun sctransform to store values for all genes and run a log-normalization procedure via
# rerun normalization to store sctransform residuals for all genes brain <- SCTransform(brain, assay = "Spatial", return.only.var.genes = FALSE, verbose = FALSE) # also run standard log normalization for comparison brain <- NormalizeData(brain, verbose = FALSE, assay = "Spatial")
# Computes the correlation of the log normalized data and sctransform residuals with the # number of UMIs brain <- GroupCorrelation(brain, group.assay = "Spatial", assay = "Spatial", slot = "data", do.plot = FALSE) brain <- GroupCorrelation(brain, group.assay = "Spatial", assay = "SCT", slot = "scale.data", do.plot = FALSE)
p1 <- GroupCorrelationPlot(brain, assay = "Spatial", cor = "nCount_Spatial_cor") + ggtitle("Log Normalization") + theme(plot.title = element_text(hjust = 0.5)) p2 <- GroupCorrelationPlot(brain, assay = "SCT", cor = "nCount_Spatial_cor") + ggtitle("SCTransform Normalization") + theme(plot.title = element_text(hjust = 0.5)) p1 + p2
For the boxplots above, we calculate the correlation of each feature (gene) with the number of UMIs (the
nCount_Spatialvariable here). We then place genes into groups based on their mean expression, and generate boxplots of these correlations. You can see that log-normalization fails to adequately normalize genes in the first three groups, suggesting that technical factors continue to influence normalized expression estimates for highly expressed genes. In contrast, sctransform normalization substantially mitigates this effect.
In Seurat, we have functionality to explore and interact with the inherently visual nature of spatial data. The
SpatialFeaturePlot() function in Seurat extends
FeaturePlot(), and can overlay molecular data on top of tissue histology. For example, in this data set of the mouse brain, the gene Hpca is a strong hippocampus marker and Ttr is a marker of the choroid plexus.
The default parameters in Seurat emphasize the visualization of molecular data. However, you can also adjust the size of the spots (and their transparency) to improve the visualization of the histology image, by changing the following parameters:
pt.size.factor- This will scale the size of the spots. Default is 1.6
alpha- minimum and maximum transparency. Default is c(1, 1).
alphac(0.1, 1), to downweight the transparency of points with lower expression
p1 <- SpatialFeaturePlot(brain, features = "Ttr", pt.size.factor = 1) p2 <- SpatialFeaturePlot(brain, features = "Ttr", alpha = c(0.1, 1)) p1 + p2
We can then proceed to run dimensionality reduction and clustering on the RNA expression data, using the same workflow as we use for scRNA-seq analysis.
brain <- RunPCA(brain, assay = "SCT", verbose = FALSE) brain <- FindNeighbors(brain, reduction = "pca", dims = 1:30) brain <- FindClusters(brain, verbose = FALSE) brain <- RunUMAP(brain, reduction = "pca", dims = 1:30)
p1 <- DimPlot(brain, reduction = "umap", label = TRUE) p2 <- SpatialDimPlot(brain, label = TRUE, label.size = 3) p1 + p2
As there are many colors, it can be challenging to visualize which voxel belongs to which cluster. We have a few strategies to help with this. Setting the
label parameter places a colored box at the median of each cluster (see the plot above).
You can also use the
cells.highlight parameter to demarcate particular cells of interest on a
SpatialDimPlot(). This can be very useful for distinguishing the spatial localization of individual clusters, as we show below:
SpatialDimPlot(brain, cells.highlight = CellsByIdentities(object = brain, idents = c(2, 1, 4, 3, 5, 8)), facet.highlight = TRUE, ncol = 3)
We have also built in a number of interactive plotting capabilities. Both
SpatialFeaturePlot() now have an
interactive parameter, that when set to
TRUE, will open up the Rstudio viewer pane with an interactive Shiny plot. The example below demonstrates an interactive
SpatialDimPlot() in which you can hover over spots and view the cell name and current identity class (analogous to the previous
SpatialDimPlot(brain, interactive = TRUE)
SpatialFeaturePlot(), setting interactive to
TRUE brings up an interactive pane in which you can adjust the transparency of the spots, the point size, as well as the
Assay and feature being plotted. After exploring the data, selecting the done button will return the last active plot as a ggplot object.
SpatialFeaturePlot(brain, features = "Ttr", interactive = TRUE)
LinkedDimPlot() function links the UMAP representation to the tissue image representation and allows for interactive selection. For example, you can select a region in the UMAP plot and the corresponding spots in the image representation will be highlighted.
Seurat offers two workflows to identify molecular features that correlate with spatial location within a tissue. The first is to perform differential expression based on pre-annotated anatomical regions within the tissue, which may be determined either from unsupervised clustering or prior knowledge. This strategy works will in this case, as the clusters above exhibit clear spatial restriction.
de_markers <- FindMarkers(brain, ident.1 = 5, ident.2 = 6) SpatialFeaturePlot(object = brain, features = rownames(de_markers)[1:3], alpha = c(0.1, 1), ncol = 3)
An alternative approach, implemented in
FindSpatiallyVariables(), is to search for features exhibiting spatial patterning in the absence of pre-annotation. The default method (
method = 'markvariogram), is inspired by the Trendsceek, which models spatial transcriptomics data as a mark point process and computes a ‘variogram’, which identifies genes whose expression level is dependent on their spatial location. More specifically, this process calculates gamma(r) values measuring the dependence between two spots a certain “r” distance apart. By default, we use an r-value of ‘5’ in these analyses, and only compute these values for variable genes (where variation is calculated independently of spatial location) to save time.
We note that there are multiple methods in the literature to accomplish this task, including SpatialDE, and Splotch. We encourage interested users to explore these methods, and hope to add support for them in the near future.
brain <- FindSpatiallyVariableFeatures(brain, assay = "SCT", features = VariableFeatures(brain)[1:1000], selection.method = "markvariogram")
Now we visualize the expression of the top 6 features identified by this measure.
top.features <- head(SpatiallyVariableFeatures(brain, selection.method = "markvariogram"), 6) SpatialFeaturePlot(brain, features = top.features, ncol = 3, alpha = c(0.1, 1))
As with single-cell objects, you can subset the object to focus on a subset of data. Here, we approximately subset the frontal cortex. This process also facilitates the integration of these data with a cortical scRNA-seq dataset in the next section. First, we take a subset of clusters, and then further segment based on exact positions. After subsetting, we can visualize the cortical cells either on the full image, or a cropped image.
cortex <- subset(brain, idents = c(1, 2, 3, 4, 6, 7)) # now remove additional cells, use SpatialDimPlots to visualize what to remove # SpatialDimPlot(cortex,cells.highlight = WhichCells(cortex, expression = image_imagerow > 400 # | image_imagecol < 150)) cortex <- subset(cortex, anterior1_imagerow > 400 | anterior1_imagecol < 150, invert = TRUE) cortex <- subset(cortex, anterior1_imagerow > 275 & anterior1_imagecol > 370, invert = TRUE) cortex <- subset(cortex, anterior1_imagerow > 250 & anterior1_imagecol > 440, invert = TRUE)
p1 <- SpatialDimPlot(cortex, crop = TRUE, label = TRUE) p2 <- SpatialDimPlot(cortex, crop = FALSE, label = TRUE, pt.size.factor = 1, label.size = 3) p1 + p2
At ~50um, spots from the visium assay will encompass the expression profiles of multiple cells. For the growing list of systems where scRNA-seq data is available, users may be interested to ‘deconvolute’ each of the spatial voxels to predict the underlying composition of cell types. In preparing this vignette, we tested a wide variety of decovonlution and integration methods, using a reference scRNA-seq dataset of ~14,000 adult mouse cortical cell taxonomy from the Allen Institute, generated with the SMART-Seq2 protocol. We consistently found superior performance using integration methods (as opposed to deconvolution methods), likely because of substantially different noise models that characterize spatial and single-cell datasets, and integration methods are specifiically designed to be robust to these differences. We therefore apply the ‘anchor’-based integration workflow introduced in Seurat v3, that enables the probabilistic transfer of annotations from a reference to a query set. We therefore follow the label transfer workflow introduced here, taking advantage of sctransform normalization, but anticipate new methods to be developed to accomplish this task.
We first load the data (download available here), pre-process the scRNA-seq reference, and then perform label transfer. The procedure outputs, for each spot, a probabilistic classification for each of the scRNA-seq derived classes. We add these predictions as a new assay in the Seurat object.
allen_reference <- readRDS("../data/allen_cortex.rds")
# note that setting ncells=3000 normalizes the full dataset but learns noise models on 3k # cells this speeds up SCTransform dramatically with no loss in performance library(dplyr) allen_reference <- SCTransform(allen_reference, ncells = 3000, verbose = FALSE) %>% RunPCA(verbose = FALSE) %>% RunUMAP(dims = 1:30)
# After subsetting, we renormalize cortex cortex <- SCTransform(cortex, assay = "Spatial", verbose = FALSE) %>% RunPCA(verbose = FALSE) # the annotation is stored in the 'subclass' column of object metadata DimPlot(allen_reference, group.by = "subclass", label = TRUE)
anchors <- FindTransferAnchors(reference = allen_reference, query = cortex, normalization.method = "SCT") predictions.assay <- TransferData(anchorset = anchors, refdata = allen_reference$subclass, prediction.assay = TRUE, weight.reduction = cortex[["pca"]], dims = 1:30) cortex[["predictions"]] <- predictions.assay
Now we get prediction scores for each spot for each class. Of particular interest in the frontal cortex region are the laminar excitatory neurons. Here we can distinguish between distinct sequential layers of these neuronal subtypes, for example:
DefaultAssay(cortex) <- "predictions" SpatialFeaturePlot(cortex, features = c("L2/3 IT", "L4"), pt.size.factor = 1.6, ncol = 2, crop = TRUE)
Based on these prediction scores, we can also predict cell types whose location is spatially restricted. We use the same methods based on marked point processes to define spatially variable features, but use the cell type prediction scores as the “marks” rather than gene expression.
cortex <- FindSpatiallyVariableFeatures(cortex, assay = "predictions", selection.method = "markvariogram", features = rownames(cortex), r.metric = 5, slot = "data") top.clusters <- head(SpatiallyVariableFeatures(cortex), 4) SpatialPlot(object = cortex, features = top.clusters, ncol = 2)
Finally, we show that our integrative procedure is capable of recovering the known spatial localization patterns of both neuronal and non-neuronal subsets, including laminar excitatory, layer-1 astrocytes, and the cortical grey matter.
SpatialFeaturePlot(cortex, features = c("Astro", "L2/3 IT", "L4", "L5 PT", "L5 IT", "L6 CT", "L6 IT", "L6b", "Oligo"), pt.size.factor = 1, ncol = 2, crop = FALSE, alpha = c(0.1, 1))
This dataset of the mouse brain contains another slice corresponding to the other half of the brain. Here we read it in and perform the same initial normalization.
brain2 <- LoadData("stxBrain", type = "posterior1") brain2 <- SCTransform(brain2, assay = "Spatial", verbose = FALSE)
In order to work with multiple slices in the same Seurat object, we provide the
brain.merge <- merge(brain, brain2)
This then enables joint dimensional reduction and clustering on the underlying RNA expression data.
DefaultAssay(brain.merge) <- "SCT" VariableFeatures(brain.merge) <- c(VariableFeatures(brain), VariableFeatures(brain2)) brain.merge <- RunPCA(brain.merge, verbose = FALSE) brain.merge <- FindNeighbors(brain.merge, dims = 1:30) brain.merge <- FindClusters(brain.merge, verbose = FALSE) brain.merge <- RunUMAP(brain.merge, dims = 1:30)
Here, we will be analyzing a dataset generated using Slide-seq v2 of the mouse hippocampus. This tutorial will follow much of the same structure as the spatial vignette for 10x Visium data but is tailored to give a demonstration specific to Slide-seq data.
You can use our SeuratData package for easy data access, as demonstrated below. After installing the dataset, you can type
?ssHippo to see the commands used to create the Seurat object.
slide.seq <- LoadData("ssHippo")
The initial preprocessing steps for the bead by gene expression data are similar to other spatial Seurat analyses and to typical scRNA-seq experiments. Here, we note that many beads contain particularly low UMI counts but choose to keep all detected beads for downstream analysis.
plot1 <- VlnPlot(slide.seq, features = "nCount_Spatial", pt.size = 0, log = TRUE) + NoLegend() slide.seq$log_nCount_Spatial <- log(slide.seq$nCount_Spatial) plot2 <- SpatialFeaturePlot(slide.seq, features = "log_nCount_Spatial") + theme(legend.position = "right") wrap_plots(plot1, plot2)
We then normalize the data using sctransform and perform a standard scRNA-seq dimensionality reduction and clustering workflow.
slide.seq <- SCTransform(slide.seq, assay = "Spatial", ncells = 3000, verbose = FALSE) slide.seq <- RunPCA(slide.seq) slide.seq <- RunUMAP(slide.seq, dims = 1:30) slide.seq <- FindNeighbors(slide.seq, dims = 1:30) slide.seq <- FindClusters(slide.seq, resolution = 0.3, verbose = FALSE)
plot1 <- DimPlot(slide.seq, reduction = "umap", label = TRUE) plot2 <- SpatialDimPlot(slide.seq, stroke = 0) plot1 + plot2
SpatialDimPlot(slide.seq, cells.highlight = CellsByIdentities(object = slide.seq, idents = c(1, 6, 13)), facet.highlight = TRUE)
To facilitate cell-type annotation of the Slide-seq dataset, we are leveraging an existing mouse single-cell RNA-seq hippocampus dataset, produced in Saunders*, Macosko*, et al. 2018. The data is available for download as a processed Seurat object here, with the raw count matrices available on the DropViz website.
ref <- readRDS("../data/mouse_hippocampus_reference.rds")
The original annotations from the paper are provided in the cell metadata of the Seurat object. These annotations are provided at several “resolutions”, from broad classes (
ref$class) to subclusters within celltypes (
ref$subcluster). For the purposes of this vignette, we’ll work off of a modification of the celltype annotations (
ref$celltype) which we felt struck a good balance.
We’ll start by running the Seurat label transfer method to predict the major celltype for each bead.
anchors <- FindTransferAnchors(reference = ref, query = slide.seq, normalization.method = "SCT", npcs = 50) predictions.assay <- TransferData(anchorset = anchors, refdata = ref$celltype, prediction.assay = TRUE, weight.reduction = slide.seq[["pca"]], dims = 1:50) slide.seq[["predictions"]] <- predictions.assay
We can then visualize the prediction scores for some of the major expected classes.
DefaultAssay(slide.seq) <- "predictions" SpatialFeaturePlot(slide.seq, features = c("Dentate Principal cells", "CA3 Principal cells", "Entorhinal cortex", "Endothelial tip", "Ependymal", "Oligodendrocyte"), alpha = c(0.1, 1))
slide.seq$predicted.id <- GetTransferPredictions(slide.seq) Idents(slide.seq) <- "predicted.id" SpatialDimPlot(slide.seq, cells.highlight = CellsByIdentities(object = slide.seq, idents = c("CA3 Principal cells", "Dentate Principal cells", "Endothelial tip")), facet.highlight = TRUE)
As mentioned in the Visium vignette, we can identify spatially variable features in two general ways: differential expression testing between pre-annotated anatomical regions or statistics that measure the dependence of a feature on spatial location.
Here, we demonstrate the latter with an implementation of Moran’s I available via
FindSpatiallyVariableFeatures() by setting
method = 'moransi'. Moran’s I computes an overall spatial autocorrelation and gives a statistic (similar to a correlation coefficient) that measures the dependence of a feature on spatial location. This allows us to rank features based on how spatially variable their expression is. In order to facilitate quick estimation of this statistic, we implemented a basic binning strategy that will draw a rectangular grid over Slide-seq puck and average the feature and location within each bin. The number of bins in the x and y direction are controlled by the
y.cuts parameters respectively. Additionally, while not required, installing the optional
install.packages('Rfast2')), will significantly decrease the runtime via a more efficient implementation.
DefaultAssay(slide.seq) <- "SCT" slide.seq <- FindSpatiallyVariableFeatures(slide.seq, assay = "SCT", slot = "scale.data", features = VariableFeatures(slide.seq)[1:1000], selection.method = "moransi", x.cuts = 100, y.cuts = 100)
Now we visualize the expression of the top 6 features identified by Moran’s I.
SpatialFeaturePlot(slide.seq, features = head(SpatiallyVariableFeatures(slide.seq, selection.method = "moransi"), 6), ncol = 3, alpha = c(0.1, 1), max.cutoff = "q95")
## R version 4.1.0 (2021-05-18) ## Platform: x86_64-pc-linux-gnu (64-bit) ## Running under: Ubuntu 20.04.2 LTS ## ## Matrix products: default ## BLAS/LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.8.so ## ## locale: ##  LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C ##  LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 ##  LC_MONETARY=en_US.UTF-8 LC_MESSAGES=C ##  LC_PAPER=en_US.UTF-8 LC_NAME=C ##  LC_ADDRESS=C LC_TELEPHONE=C ##  LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C ## ## attached base packages: ##  stats graphics grDevices utils datasets methods base ## ## other attached packages: ##  vembedr_0.1.4 htmltools_0.5.2 ##  dplyr_1.0.7 patchwork_1.1.1 ##  ggplot2_3.3.5 thp1.eccite.SeuratData_3.1.5 ##  stxBrain.SeuratData_0.1.1 ssHippo.SeuratData_3.1.4 ##  pbmcsca.SeuratData_3.0.0 pbmcMultiome.SeuratData_0.1.1 ##  pbmc3k.SeuratData_3.1.4 panc8.SeuratData_3.0.2 ##  ifnb.SeuratData_3.1.0 hcabm40k.SeuratData_3.0.0 ##  bmcite.SeuratData_0.3.0 SeuratData_0.2.1 ##  SeuratObject_4.0.4 Seurat_4.0.6 ## ## loaded via a namespace (and not attached): ##  systemfonts_1.0.2 plyr_1.8.6 igraph_1.2.11 ##  lazyeval_0.2.2 splines_4.1.0 listenv_0.8.0 ##  scattermore_0.7 digest_0.6.29 fansi_1.0.0 ##  magrittr_2.0.1 memoise_2.0.0 tensor_1.5 ##  cluster_2.1.2 ROCR_1.0-11 limma_3.48.0 ##  globals_0.14.0 matrixStats_0.61.0 pkgdown_1.6.1 ##  spatstat.sparse_2.1-0 colorspace_2.0-2 rappdirs_0.3.3 ##  ggrepel_0.9.1 textshaping_0.3.5 xfun_0.25 ##  crayon_1.4.2 jsonlite_1.7.2 spatstat.data_2.1-2 ##  survival_3.2-11 zoo_1.8-9 glue_1.6.0 ##  polyclip_1.10-0 gtable_0.3.0 leiden_0.3.9 ##  RcppZiggurat_0.1.6 future.apply_1.8.1 abind_1.4-5 ##  scales_1.1.1 DBI_1.1.1 miniUI_0.1.1.1 ##  Rcpp_1.0.7 viridisLite_0.4.0 xtable_1.8-4 ##  reticulate_1.22 spatstat.core_2.3-2 Rfast2_0.0.9 ##  htmlwidgets_1.5.4 httr_1.4.2 RColorBrewer_1.1-2 ##  ellipsis_0.3.2 ica_1.0-2 farver_2.1.0 ##  pkgconfig_2.0.3 sass_0.4.0 uwot_0.1.11 ##  deldir_1.0-6 utf8_1.2.2 labeling_0.4.2 ##  tidyselect_1.1.1 rlang_0.4.12 reshape2_1.4.4 ##  later_1.3.0 munsell_0.5.0 tools_4.1.0 ##  cachem_1.0.6 cli_3.1.0 generics_0.1.1 ##  ggridges_0.5.3 evaluate_0.14 stringr_1.4.0 ##  fastmap_1.1.0 yaml_2.2.1 ragg_1.1.3 ##  goftest_1.2-3 knitr_1.33 fs_1.5.2 ##  fitdistrplus_1.1-6 purrr_0.3.4 RANN_2.6.1 ##  pbapply_1.5-0 future_1.23.0 nlme_3.1-152 ##  mime_0.12 formatR_1.11 compiler_4.1.0 ##  plotly_4.10.0 png_0.1-7 spatstat.utils_2.3-0 ##  tibble_3.1.6 bslib_0.3.1 stringi_1.7.6 ##  highr_0.9 RSpectra_0.16-0 desc_1.4.0 ##  lattice_0.20-44 Matrix_1.3-3 vctrs_0.3.8 ##  pillar_1.6.4 lifecycle_1.0.1 spatstat.geom_2.3-1 ##  lmtest_0.9-39 jquerylib_0.1.4 RcppAnnoy_0.0.19 ##  data.table_1.14.2 cowplot_1.1.1 irlba_2.3.5 ##  httpuv_1.6.5 R6_2.5.1 promises_18.104.22.168 ##  KernSmooth_2.23-20 gridExtra_2.3 parallelly_1.30.0 ##  codetools_0.2-18 MASS_7.3-54 assertthat_0.2.1 ##  rprojroot_2.0.2 withr_2.4.3 sctransform_0.3.2 ##  mgcv_1.8-35 parallel_4.1.0 grid_4.1.0 ##  rpart_4.1-15 tidyr_1.1.4 Rfast_2.0.3 ##  rmarkdown_2.10 Rtsne_0.15 shiny_1.7.1