This tutorial demonstrates how to use Seurat (>=3.2) to analyze spatially-resolved RNA-seq data. While the analytical pipelines are similar to the Seurat workflow for single-cell RNA-seq analysis, we introduce updated interaction and visualization tools, with a particular emphasis on the integration of spatial and molecular information. This tutorial will cover the following tasks, which we believe will be common for many spatial analyses:
For our first vignette, we analyze a dataset generated with the Visium technology from 10x Genomics. We will be extending Seurat to work with additional data types in the near-future, including SLIDE-Seq, STARmap, and MERFISH.
First, we load Seurat and the other packages necessary for this vignette.
library(Seurat)
library(SeuratData)
library(ggplot2)
library(patchwork)
library(dplyr)
Here, we will be using a recently released dataset of sagital mouse brain slices generated using the Visium v1 chemistry. There are two serial anterior sections, and two (matched) serial posterior sections.
You can download the data here, and load it into Seurat using the Load10X_Spatial
function. This reads in the output of the spaceranger pipeline, and returns a Seurat object that contains both the spot-level expression data along with the associated image of the tissue slice. You can also use our SeuratData package for easy data access, as demonstrated below. After installing the dataset, you can type ?stxBrain
to learn more.
InstallData("stxBrain")
brain <- LoadData("stxBrain", type = "anterior1")
How is the spatial data stored within Seurat?
The visium data from 10x consists of the following data types:
In the Seurat object, the spot by gene expression matrix is similar to a typical "RNA" Assay
but contains spot level, not single-cell level data. The image itself is stored in a new images
slot in the Seurat object. The images
slot also stores the information necessary to associate spots with their physical position on the tissue image.
The initial preprocessing steps that we perform on the spot by gene expression data are similar to a typical scRNA-seq experiment. We first need to normalize the data in order to account for variance in sequencing depth across data points. We note that the variance in molecular counts / spot can be substantial for spatial datasets, particularly if there are differences in cell density across the tissue. We see substantial heterogeneity here, which requires effective normalization.
plot1 <- VlnPlot(brain, features = "nCount_Spatial", pt.size = 0.1) + NoLegend()
plot2 <- SpatialFeaturePlot(brain, features = "nCount_Spatial") + theme(legend.position = "right")
wrap_plots(plot1, plot2)
These plots demonstrate that the variance in molecular counts across spots is not just technical in nature, but also is dependent on the tissue anatomy. For example, regions of the tissue that are depleted for neurons (such as the cortical white matter), reproducibly exhibit lower molecular counts. As a result, standard approaches (such as the LogNormalize
function), which force each data point to have the same underlying 'size' after normalization, can be problematic.
As an alternative, we recommend using sctransform (Hafemeister and Satija, Genome Biology 2019), which which builds regularized negative binomial models of gene expression in order to account for technical artifacts while preserving biological variance. For more details on sctransform, please see the paper here and the Seurat vignette here. sctransform normalizes the data, detects high-variance features, and stores the data in the SCT
assay.
brain <- SCTransform(brain, assay = "Spatial", verbose = FALSE)
How do results compare to log-normalization?
To explore the differences in normalization methods, we examine how both the sctransform and log normalization results correlate with the number of UMIs. For this comparison, we first rerun sctransform to store values for all genes and run a log-normalization procedure via NormalizeData
.
# rerun normalization to store sctransform residuals for all genes
brain <- SCTransform(brain, assay = "Spatial", return.only.var.genes = FALSE, verbose = FALSE)
# also run standard log normalization for comparison
brain <- NormalizeData(brain, verbose = FALSE, assay = "Spatial")
# Computes the correlation of the log normalized data and sctransform residuals with the number
# of UMIs
brain <- GroupCorrelation(brain, group.assay = "Spatial", assay = "Spatial", slot = "data", do.plot = FALSE)
brain <- GroupCorrelation(brain, group.assay = "Spatial", assay = "SCT", slot = "scale.data", do.plot = FALSE)
p1 <- GroupCorrelationPlot(brain, assay = "Spatial", cor = "nCount_Spatial_cor") + ggtitle("Log Normalization") +
theme(plot.title = element_text(hjust = 0.5))
p2 <- GroupCorrelationPlot(brain, assay = "SCT", cor = "nCount_Spatial_cor") + ggtitle("SCTransform Normalization") +
theme(plot.title = element_text(hjust = 0.5))
p1 + p2
For the boxplots above, we calculate the correlation of each feature (gene) with the number of UMIs (the nCount_Spatial
variable here). We then place genes into groups based on their mean expression, and generate boxplots of these correlations. You can see that log-normalization fails to adequately normalize genes in the first three groups, suggesting that technical factors continue to influence normalized expression estimates for highly expressed genes. In contrast, sctransform normalization substantially mitigates this effect.
In Seurat v3.2, we have included new functionality to explore and interact with the inherently visual nature of spatial data. The SpatialFeaturePlot
function in Seurat extends FeaturePlot
, and can overlay molecular data on top of tissue histology. For example, in this data set of the mouse brain, the gene Hpca is a strong hippocampal marker and Ttr is a marker of the choroid plexus.
SpatialFeaturePlot(brain, features = c("Hpca", "Ttr"))
The default parameters in Seurat emphasize the visualization of molecular data. However, you can also adjust the size of the spots (and their transparency) to improve the visualization of the histology image, by changing the following parameters:
pt.size.factor
- This will scale the size of the spots. Default is 1.6alpha
- minimum and maximum transparency. Default is c(1, 1).alpha
c(0.1, 1), to downweight the transparency of points with lower expressionp1 <- SpatialFeaturePlot(brain, features = "Ttr", pt.size.factor = 1)
p2 <- SpatialFeaturePlot(brain, features = "Ttr", alpha = c(0.1, 1))
p1 + p2