Overview

In this vignette, we introduce a Seurat extension to analyze new types of spatially-resolved data. We have previously introduced a spatial framework which is compatible with sequencing-based technologies, like the 10x Genomics Visium system, or SLIDE-seq. Here, we extend this framework to analyze new data types that are captured via highly multiplexed imaging. In contrast to sequencing-based technologies, these datasets are often targeted (i.e. they profile a pre-selected set of genes). However they can resolve individual molecules - retaining single-cell (and subcellular) resolution. These approaches also often capture cellular boundaries (segmentations).

We update the Seurat infrastructure to enable the analysis, visualization, and exploration of these exciting datasets. In this vignette, we focus on three datasets produced by different multiplexed imaging technologies, each of which is publicly available. We will be adding support for additional imaging-based technologies in the coming months.

  • Vizgen MERSCOPE (Mouse Brain)
  • Nanostring CosMx Spatial Molecular Imager (FFPE Human Lung)
  • Akoya CODEX (Human Lymph Node)

First, we install the updated versions of Seurat and SeuratObject that support this infrastructure, as well as other packages necessary for this vignette.

library(remotes)
remotes::install_github("satijalab/seurat", "feat/imaging")
library(Seurat)
library(future)
plan("multisession", workers = 10)

Mouse Brain: Vizgen MERSCOPE

This dataset was produced using the Vizgen MERSCOPE system, which utilizes the MERFISH technology. The total dataset is available for public download, and contains nine samples (three full coronal slices of the mouse brain, with three biological replicates per slice). The gene panel consists of 483 gene targets, representing known anonical cell type markers, nonsensory G-Protein coupled receptors (GPCRs), and Receptor Tyrosine Kinases (RTKs). In this vignette, we analyze one of the samples - slice 2, replicate 1. The median number of transcripts detected in each cell is 206.

First, we read in the dataset and create a Seurat object.

We use the LoadVizgen() function, which we have written to read in the output of the Vizgen analysis pipeline. The resulting Seurat object contains the following information:

  • A count matrix, indicating the number of observed molecules for each of the 483 transcripts in each cell. This matrix is analogous to a count matrix in scRNA-seq, and is stored by default in the RNA assay of the Seurat object
# Loading segmentations is a slow process and multi processing with the future pacakge is
# recommended
vizgen.obj <- LoadVizgen(data.dir = "s2r1/", fov = "s2r1")

The next pieces of information are specific to imaging assays, and is stored in the images slot of the resulting Seurat object:

Cell Centroids: The spatial coordinates marking the centroid for each cell being profiled
# Get the center position of each centroid. There is one row per cell in this dataframe.
head(GetTissueCoordinates(vizgen.obj[["s2r1"]][["centroids"]]))
##          x        y                                    cell
## 1 638.5640 4594.216 149164679103246548309819743981609972453
## 2 593.8034 4516.240 215843146921706462965382248182021894607
## 3 597.3134 4566.676 230248905804673613678286091156141465134
## 4 613.2434 4609.498 237155298815097057940587033798543926454
## 5 609.1934 4603.180 256099454901769634241742157204636917386
## 6 623.8814 4642.708  52442222147121971758529793775250916001
Cell Segmentation Boundaries: The spatial coordinates that describe the polygon segmentation of each single cell
# Get the coordinates for each segmentation vertice. Each cell will have a variable number of
# vertices describing its shape.
head(GetTissueCoordinates(vizgen.obj[["s2r1"]][["segmentation"]]))
##          x        y                                    cell
## 1 644.0774 4589.022 149164679103246548309819743981609972453
## 2 643.9694 4589.022 149164679103246548309819743981609972453
## 3 643.8614 4589.022 149164679103246548309819743981609972453
## 4 643.7642 4588.924 149164679103246548309819743981609972453
## 5 643.7534 4588.914 149164679103246548309819743981609972453
## 6 643.6454 4588.914 149164679103246548309819743981609972453
Molecule positions: The spatial coordinates for each individual molecule that was detected during the multiplexed smFISH experiment.
# Fetch molecules positions for Chrm1
head(FetchData(vizgen.obj[["s2r1"]][["molecules"]], vars = "Chrm1"))
##          x        y molecule
## 1 577.3373 4205.977    Chrm1
## 2 600.0218 3917.781    Chrm1
## 3 508.2736 3934.063    Chrm1
## 4 630.7590 3948.586    Chrm1
## 5 635.1143 3969.567    Chrm1
## 6 582.7043 4021.577    Chrm1


Preprocessing and unsupervised analysis

We start by performing a standard unsupervised clustering analysis, essentially first treating the dataset as an scRNA-seq experiment. We use SCTransform-based normalization, though we slightly modify the default clipping parameters to mitigate the effect of outliers that we occasionally observe in smFISH experiments. After normalization, we can run dimensional reduction and clustering.

vizgen.obj <- SCTransform(vizgen.obj, assay = "Vizgen", clip.range = c(-10, 10), )
vizgen.obj <- RunPCA(vizgen.obj, npcs = 30, features = rownames(vizgen.obj))
vizgen.obj <- RunUMAP(vizgen.obj, dims = 1:30)
vizgen.obj <- FindNeighbors(vizgen.obj, reduction = "pca", dims = 1:30)
vizgen.obj <- FindClusters(vizgen.obj, resolution = 0.3)

We can then visualize the results of the clustering either in UMAP space (with DimPlot()) or overlaid on the image with ImageDimPlot().

DimPlot(vizgen.obj, reduction = "umap")

ImageDimPlot(vizgen.obj, fov = "s2r1", cols = "polychrome", axes = TRUE)

You can also customize multiple aspect of the plot, including the color scheme, cell border widths, and size (see below).

Customizing spatial plots in Seurat

The ImageDimPlot() and ImageFeaturePlot() functions have a few parameters which you can customize individual visualizations. These include:

  • alpha: Ranges from 0 to 1. Sets the transparency of within-cell coloring.
  • size: determines the size of points representing cells, if centroids are being plotted
  • cols: Sets the color scheme for the internal shading of each cell. Examples settings are polychrome, glasbey, Paired, Set3, and parade. Default is the ggplot2 color palette
  • shuffle.cols: In some cases the selection of cols is more effective when the same colors are assigned to different clusters. Set shuffle.cols = TRUE to randomly shuffle the colors in the palette.
  • border.size: Sets the width of the cell segmentation borders. By default, segmentations are plotted with a border size of 0.3 and centroids are plotted without border.
  • border.color: Sets the color of the cell segmentation borders
  • dark.background: Sets a black background color (TRUE by default)
  • axes: Display

Since it can be difficult to visualize the spatial localization patterns of an individual cluster when viewing them all together, we can highlight all cells that belong to a particular cluster:

p1 <- ImageDimPlot(vizgen.obj, fov = "s2r1", cols = "red", cells = WhichCells(vizgen.obj, idents = 14))
p2 <- ImageDimPlot(vizgen.obj, fov = "s2r1", cols = "red", cells = WhichCells(vizgen.obj, idents = 15))
p1 + p2

We can find markers of individual clusters and visualize their spatial expression pattern. We can color cells based on their quantified expression of an individual gene, using ImageFeaturePlot(), which is analagous to the FeaturePlot() function for visualizing expression on a 2D embedding. Since MERFISH images individual molecules, we can also visualize the location of individual molecules.

p1 <- ImageFeaturePlot(vizgen.obj, features = "Slc17a7")
p2 <- ImageDimPlot(vizgen.obj, molecules = "Slc17a7", nmols = 10000, alpha = 0.3, mols.cols = "red")
p1 + p2

Note that the nmols parameter can be used to reduce the total number of molecules shown to reduce overplotting. You can also use the mols.size, mols.cols, and mols.alpha parameter to further optimize.

Plotting molecules is especially useful for visualizing co-expression of multiple genes on the same plot.

p1 <- ImageDimPlot(vizgen.obj, fov = "s2r1", alpha = 0.3, molecules = c("Slc17a7", "Olig1"), nmols = 10000)
markers.14 <- FindMarkers(vizgen.obj, ident.1 = "14")
p2 <- ImageDimPlot(vizgen.obj, fov = "s2r1", alpha = 0.3, molecules = rownames(markers.14)[1:4],
    nmols = 10000)
p1 + p2

The updated Seurat spatial framework has the option to treat cells as individual points, or also to visualize cell boundaries (segmentations). By default, Seurat ignores cell segmentations and treats each cell as a point (‘centroids’). This speeds up plotting, especially when looking at large areas, where cell boundaries are too small to visualize.

We can zoom into a region of tissue, creating a new field of view. For example, we can zoom into a region that contains the hippocampus. Once zoomed-in, we can set DefaultBoundary() to show cell segmentations. You can also ‘simplify’ the cell segmentations, reducing the number of edges in each polygon to speed up plotting.

# create a Crop
cropped.coords <- Crop(vizgen.obj[["s2r1"]], x = c(1750, 3000), y = c(3750, 5250), coords = "plot")
# set a new field of view (fov)
vizgen.obj[["hippo"]] <- cropped.coords

# visualize FOV using default settings (no cell boundaries)
p1 <- ImageDimPlot(vizgen.obj, fov = "hippo", axes = TRUE, size = 0.7, border.color = "white", cols = "polychrome",
    coord.fixed = FALSE)

# visualize FOV with full cell segmentations
DefaultBoundary(vizgen.obj[["hippo"]]) <- "segmentation"
p2 <- ImageDimPlot(vizgen.obj, fov = "hippo", axes = TRUE, border.color = "white", border.size = 0.1,
    cols = "polychrome", coord.fixed = FALSE)

# simplify cell segmentations
vizgen.obj[["hippo"]][["simplified.segmentations"]] <- Simplify(coords = vizgen.obj[["hippo"]][["segmentation"]],
    tol = 3)
DefaultBoundary(vizgen.obj[["hippo"]]) <- "simplified.segmentations"

# visualize FOV with simplified cell segmentations
DefaultBoundary(vizgen.obj[["hippo"]]) <- "simplified.segmentations"
p3 <- ImageDimPlot(vizgen.obj, fov = "hippo", axes = TRUE, border.color = "white", border.size = 0.1,
    cols = "polychrome", coord.fixed = FALSE)

p1 + p2 + p3

What is the tol parameter?

The tol parameter determines how simplified the resulting segmentations are. A higher value of tol will reduce the number of vertices more drastically which will speed up plotting, but some segmentation detail will be lost. See https://rgeos.r-forge.r-project.org/reference/topo-unary-gSimplify.html for examples using different values for tol.

We can visualize individual molecules plotted at higher resolution after zooming-in

# Since there is nothing behind the segmentations, alpha will slightly mute colors
ImageDimPlot(vizgen.obj, fov = "hippo", molecules = rownames(markers.14)[1:4], cols = "polychrome",
    mols.size = 1, alpha = 0.5, mols.cols = c("red", "blue", "yellow", "green"))

Human Lung: Nanostring CosMx Spatial Molecular Imager

This dataset was produced using Nanostring CosMx Spatial Molecular Imager (SMI). The CosMX SMI performs multiplexed single molecule profiling, can profile both RNA and protein targets, and can be applied directly to FFPE tissues. The dataset represents 8 FFPE samples taken from 5 non-small-cell lung cancer (NSCLC) tissues, and is available for public download. The gene panel consists of 960 transcripts.

In this vignette, we load one of 8 samples (lung 5, replicate 1). We use the LoadNanostring() function, which parses the outputs available on the public download site. Note that the coordinates for the cell boundaries were provided by Nanostring by request, and are available for download here.

For this dataset, instead of performing unsupervised analysis, we map the Nanostring profiles to our Azimuth Healthy Human Lung reference, which was defined by scRNA-seq. We used Azimuth version 0.4.3 with the human lung reference version 1.0.0. You can download the precomputed results here, which include annotations, prediction scores, and a UMAP visualization. The median number of detected transcripts/cell is 249, which does create uncertainty for the annotation process.

nano.obj <- LoadNanostring(data.dir = "lung5_rep1", fov = "lung5.rep1")
# add in precomputed Azimuth annotations
azimuth.data <- readRDS("nanostring_data.Rds")
nano.obj <- AddMetaData(nano.obj, metadata = azimuth.data$annotations)
nano.obj[["proj.umap"]] <- azimuth.data$umap
Idents(nano.obj) <- nano.obj$predicted.annotation.l1

# set to avoid error exceeding max allowed size of globals
options(future.globals.maxSize = 8000 * 1024^2)
nano.obj <- SCTransform(nano.obj, assay = "Nanostring", clip.range = c(-10, 10), verbose = FALSE)

# text display of annotations and prediction scores
head(slot(object = nano.obj, name = "meta.data")[2:5])
##     nCount_Nanostring nFeature_Nanostring predicted.annotation.l1
## 1_1                23                  19               Dendritic
## 2_1                26                  23              Macrophage
## 3_1                74                  51          Neuroendocrine
## 4_1                60                  48              Macrophage
## 5_1                52                  39              Macrophage
## 6_1                 5                   5                   CD4 T
##     predicted.annotation.l1.score
## 1_1                     0.5884506
## 2_1                     0.5707920
## 3_1                     0.5449661
## 4_1                     0.6951970
## 5_1                     0.8155319
## 6_1                     0.5677324

We can visualize the Nanostring cells and annotations, projected onto the reference-defined UMAP. Note that for this NSCLC sample, tumor samples are annotated as ‘basal’, which is the closest cell type match in the healthy reference.

DimPlot(nano.obj)

Visualization of cell type and expression localization patterns

As in the previous example, ImageDimPlot() plots c ells based on their spatial locations, and colors them based on their assigned cell type. Notice that the basal cell population (tumor cells) is tightly spatially organized, as expected.

ImageDimPlot(nano.obj, fov = "lung5.rep1", axes = TRUE, cols = "glasbey")

Since there are many cell types present, we can highlight the localization of a few select groups.

ImageDimPlot(nano.obj, fov = "lung5.rep1", cells = WhichCells(nano.obj, idents = c("Basal", "Macrophage",
    "Smooth Muscle", "CD4 T")), cols = c("red", "green", "blue", "orange"), size = 0.6)

We can also visualize gene expression markers a few different ways:

VlnPlot(nano.obj, features = "KRT17", slot = "counts", pt.size = 0.1, y.max = 30) + NoLegend()

FeaturePlot(nano.obj, features = "KRT17")

p1 <- ImageFeaturePlot(nano.obj, fov = "lung5.rep1", features = "KRT17", max.cutoff = "q95")
p2 <- ImageDimPlot(nano.obj, fov = "lung5.rep1", alpha = 0.3, molecules = "KRT17", nmols = 10000) +
    NoLegend()
p1 + p2

We can plot molecules in order to co-visualize the expression of multiple markers, including KRT17 (basal cells), C1QA (macrophages), IL7R (T cells), and TAGLN (Smooth muscle cells).

# Plot some of the molecules which seem to display spatial correlation with each other
ImageDimPlot(nano.obj, fov = "lung5.rep1", group.by = NA, alpha = 0.3, molecules = c("KRT17", "C1QA",
    "IL7R", "TAGLN"), nmols = 20000)

We zoom in on one basal-rich region using the Crop() function. Once zoomed-in, we can visualize individual cell boundaries as well in all visualizations.

basal.crop <- Crop(nano.obj[["lung5.rep1"]], x = c(159500, 164000), y = c(8700, 10500))
nano.obj[["zoom1"]] <- basal.crop
DefaultBoundary(nano.obj[["zoom1"]]) <- "segmentation"
ImageDimPlot(nano.obj, fov = "zoom1", cols = "polychrome", coord.fixed = FALSE)

# note the clouds of TPSAB1 molecules denoting mast cells
ImageDimPlot(nano.obj, fov = "zoom1", cols = "polychrome", alpha = 0.3, molecules = c("KRT17", "IL7R",
    "TPSAB1"), mols.size = 0.3, nmols = 20000, border.color = "black", coord.fixed = FALSE)

Human Lymph Node: Akoya CODEX system

This dataset was produced using Akoya CODEX system. The CODEX system performs multiplexed spatially-resolved protein profiling, iteratively visualizing antibody-binding events. The dataset here represents a tissue section from a human lymph node, and was generated by the University of Florida as part of the Human Biomolecular Atlas Program (HuBMAP). More information about the sample and experiment is available here. The protein panel in this dataset consists of 28 markers, and protein intensities were quantified as part of the Akoya processor pipeline, which outputs a CSV file providing the intensity of each marker in each cell, as well as the cell coordinates. The file is available for public download via Globus here.

First, we load in the data of a HuBMAP dataset using the LoadAkoya() function in Seurat:

codex.obj <- LoadAkoya(filename = "LN7910_20_008_11022020_reg001_compensated.csv",
    type = "processor", fov = "HBM754.WKLP.262")

We can now run unsupervised analysis to identify cell clusters. To normalize the protein data, we use centered log-ratio based normalization, as we typically apply to the protein modality of CITE-seq data. We then run dimensional reduction and graph-based clustering.

codex.obj <- NormalizeData(object = codex.obj, normalization.method = "CLR", margin = 2)
codex.obj <- ScaleData(codex.obj)
VariableFeatures(codex.obj) <- rownames(codex.obj)  # since the panel is small, treat all features as variable.
codex.obj <- RunPCA(object = codex.obj, npcs = 20, verbose = FALSE)
codex.obj <- RunUMAP(object = codex.obj, dims = 1:20, verbose = FALSE)
codex.obj <- FindNeighbors(object = codex.obj, dims = 1:20, verbose = FALSE)
codex.obj <- FindClusters(object = codex.obj, verbose = FALSE, resolution = 0.4, n.start = 1)

We can visualize the cell clusters based on a protein intensity-based UMAP embedding, or also based on their spatial location.

DimPlot(codex.obj, label = TRUE, label.box = TRUE) + NoLegend()

ImageDimPlot(codex.obj, cols = "parade")

The expression patters of individual markers clearly denote different cell types and spatial structures, including Lyve1 (lymphatic endothelial cells), CD34 (blood endothelial cells), and CD21 (B cells). As expected, endothelial cells group together into vessels, and B cells are key components of specialized microstructures known as germinal zones. You can read more about protein markers in this dataset, as well as cellular networks in human lynmphatic tissues, in this preprint.

p1 <- ImageFeaturePlot(codex.obj, fov = "HBM754.WKLP.262", features = c("CD34", "CD21", "Lyve1"),
    min.cutoff = "q10", max.cutoff = "q90")
p2 <- ImageDimPlot(codex.obj, fov = "HBM754.WKLP.262", cols = "parade")
p1 + p2

Each of these datasets represents an opportunity to learn organizing principles that govern the spatial localization of different cell types. Stay tuned for future updates to Seurat enabling further exploration and characterization of the relationship between spatial position and molecular state.

Session Info
## R version 4.1.3 (2022-03-10)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 20.04.4 LTS
## 
## Matrix products: default
## BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] future_1.25.0      sp_1.4-7           SeuratObject_4.1.0 Seurat_4.1.0.9007 
## [5] remotes_2.4.2     
## 
## loaded via a namespace (and not attached):
##   [1] systemfonts_1.0.4     plyr_1.8.7            igraph_1.3.1         
##   [4] lazyeval_0.2.2        splines_4.1.3         listenv_0.8.0        
##   [7] scattermore_0.8       ggplot2_3.3.5         digest_0.6.29        
##  [10] htmltools_0.5.2       fansi_1.0.3           magrittr_2.0.3       
##  [13] memoise_2.0.1         tensor_1.5            cluster_2.1.3        
##  [16] ROCR_1.0-11           limma_3.50.3          globals_0.14.0       
##  [19] matrixStats_0.62.0    pkgdown_2.0.3         spatstat.sparse_2.1-1
##  [22] colorspace_2.0-3      ggrepel_0.9.1         textshaping_0.3.6    
##  [25] xfun_0.30             dplyr_1.0.9           crayon_1.5.1         
##  [28] jsonlite_1.8.0        progressr_0.10.0      spatstat.data_2.2-0  
##  [31] survival_3.3-1        zoo_1.8-10            glue_1.6.2           
##  [34] polyclip_1.10-0       gtable_0.3.0          leiden_0.3.10        
##  [37] future.apply_1.9.0    abind_1.4-5           scales_1.2.0         
##  [40] DBI_1.1.2             spatstat.random_2.2-0 miniUI_0.1.1.1       
##  [43] Rcpp_1.0.8.3          viridisLite_0.4.0     xtable_1.8-4         
##  [46] reticulate_1.24       spatstat.core_2.4-2   bit_4.0.4            
##  [49] htmlwidgets_1.5.4     httr_1.4.2            RColorBrewer_1.1-3   
##  [52] ellipsis_0.3.2        ica_1.0-2             pkgconfig_2.0.3      
##  [55] farver_2.1.0          sass_0.4.1            uwot_0.1.11          
##  [58] deldir_1.0-6          utf8_1.2.2            tidyselect_1.1.2     
##  [61] labeling_0.4.2        rlang_1.0.2           reshape2_1.4.4       
##  [64] later_1.3.0           munsell_0.5.0         tools_4.1.3          
##  [67] cachem_1.0.6          cli_3.3.0             generics_0.1.2       
##  [70] ggridges_0.5.3        evaluate_0.15         stringr_1.4.0        
##  [73] fastmap_1.1.0         yaml_2.3.5            ragg_1.2.2           
##  [76] goftest_1.2-3         knitr_1.38            bit64_4.0.5          
##  [79] fs_1.5.2              fitdistrplus_1.1-8    purrr_0.3.4          
##  [82] RANN_2.6.1            pbapply_1.5-0         nlme_3.1-157         
##  [85] mime_0.12             formatR_1.12          ggrastr_1.0.1        
##  [88] hdf5r_1.3.5           compiler_4.1.3        rstudioapi_0.13      
##  [91] beeswarm_0.4.0        plotly_4.10.0         curl_4.3.2           
##  [94] png_0.1-7             spatstat.utils_2.3-0  tibble_3.1.6         
##  [97] bslib_0.3.1           stringi_1.7.6         highr_0.9            
## [100] desc_1.4.1            RSpectra_0.16-1       rgeos_0.5-9          
## [103] lattice_0.20-45       Matrix_1.4-1          vctrs_0.4.1          
## [106] pillar_1.7.0          lifecycle_1.0.1       spatstat.geom_2.4-0  
## [109] lmtest_0.9-40         jquerylib_0.1.4       RcppAnnoy_0.0.19     
## [112] data.table_1.14.2     cowplot_1.1.1         irlba_2.3.5          
## [115] httpuv_1.6.5          patchwork_1.1.1       R6_2.5.1             
## [118] promises_1.2.0.1      KernSmooth_2.23-20    gridExtra_2.3        
## [121] vipor_0.4.5           parallelly_1.31.1     codetools_0.2-18     
## [124] MASS_7.3-56           assertthat_0.2.1      rprojroot_2.0.3      
## [127] withr_2.5.0           sctransform_0.3.3     mgcv_1.8-40          
## [130] parallel_4.1.3        grid_4.1.3            rpart_4.1.16         
## [133] tidyr_1.2.0           rmarkdown_2.14        Cairo_1.5-15         
## [136] Rtsne_0.16            shiny_1.7.1           ggbeeswarm_0.6.0