In this vignette, we introduce a Seurat extension to analyze new types of spatially-resolved data. We have previously introduced a spatial framework which is compatible with sequencing-based technologies, like the 10x Genomics Visium system, or SLIDE-seq. Here, we extend this framework to analyze new data types that are captured via highly multiplexed imaging. In contrast to sequencing-based technologies, these datasets are often targeted (i.e. they profile a pre-selected set of genes). However they can resolve individual molecules - retaining single-cell (and subcellular) resolution. These approaches also often capture cellular boundaries (segmentations).
We update the Seurat infrastructure to enable the analysis, visualization, and exploration of these exciting datasets. In this vignette, we focus on three datasets produced by different multiplexed imaging technologies, each of which is publicly available. We will be adding support for additional imaging-based technologies in the coming months.
First, we install the updated versions of Seurat and SeuratObject that support this infrastructure, as well as other packages necessary for this vignette.
This dataset was produced using the Vizgen MERSCOPE system, which utilizes the MERFISH technology. The total dataset is available for public download, and contains nine samples (three full coronal slices of the mouse brain, with three biological replicates per slice). The gene panel consists of 483 gene targets, representing known anonical cell type markers, nonsensory G-Protein coupled receptors (GPCRs), and Receptor Tyrosine Kinases (RTKs). In this vignette, we analyze one of the samples - slice 2, replicate 1. The median number of transcripts detected in each cell is 206.
First, we read in the dataset and create a Seurat object.
We use the
LoadVizgen() function, which we have written to read in the output of the Vizgen analysis pipeline. The resulting Seurat object contains the following information:
# Loading segmentations is a slow process and multi processing with the future pacakge is # recommended vizgen.obj <- LoadVizgen(data.dir = "s2r1/", fov = "s2r1")
The next pieces of information are specific to imaging assays, and is stored in the images slot of the resulting Seurat object:
# Get the center position of each centroid. There is one row per cell in this dataframe. head(GetTissueCoordinates(vizgen.obj[["s2r1"]][["centroids"]]))
## x y cell ## 1 638.5640 4594.216 149164679103246548309819743981609972453 ## 2 593.8034 4516.240 215843146921706462965382248182021894607 ## 3 597.3134 4566.676 230248905804673613678286091156141465134 ## 4 613.2434 4609.498 237155298815097057940587033798543926454 ## 5 609.1934 4603.180 256099454901769634241742157204636917386 ## 6 623.8814 4642.708 52442222147121971758529793775250916001
# Get the coordinates for each segmentation vertice. Each cell will have a variable number of # vertices describing its shape. head(GetTissueCoordinates(vizgen.obj[["s2r1"]][["segmentation"]]))
## x y cell ## 1 644.0774 4589.022 149164679103246548309819743981609972453 ## 2 643.9694 4589.022 149164679103246548309819743981609972453 ## 3 643.8614 4589.022 149164679103246548309819743981609972453 ## 4 643.7642 4588.924 149164679103246548309819743981609972453 ## 5 643.7534 4588.914 149164679103246548309819743981609972453 ## 6 643.6454 4588.914 149164679103246548309819743981609972453
## x y molecule ## 1 577.3373 4205.977 Chrm1 ## 2 600.0218 3917.781 Chrm1 ## 3 508.2736 3934.063 Chrm1 ## 4 630.7590 3948.586 Chrm1 ## 5 635.1143 3969.567 Chrm1 ## 6 582.7043 4021.577 Chrm1
We start by performing a standard unsupervised clustering analysis, essentially first treating the dataset as an scRNA-seq experiment. We use SCTransform-based normalization, though we slightly modify the default clipping parameters to mitigate the effect of outliers that we occasionally observe in smFISH experiments. After normalization, we can run dimensional reduction and clustering.
vizgen.obj <- SCTransform(vizgen.obj, assay = "Vizgen", clip.range = c(-10, 10), ) vizgen.obj <- RunPCA(vizgen.obj, npcs = 30, features = rownames(vizgen.obj)) vizgen.obj <- RunUMAP(vizgen.obj, dims = 1:30) vizgen.obj <- FindNeighbors(vizgen.obj, reduction = "pca", dims = 1:30) vizgen.obj <- FindClusters(vizgen.obj, resolution = 0.3)
DimPlot(vizgen.obj, reduction = "umap")
ImageDimPlot(vizgen.obj, fov = "s2r1", cols = "polychrome", axes = TRUE)
You can also customize multiple aspect of the plot, including the color scheme, cell border widths, and size (see below).
parade. Default is the ggplot2 color palette
colsis more effective when the same colors are assigned to different clusters. Set
shuffle.cols = TRUEto randomly shuffle the colors in the palette.
Since it can be difficult to visualize the spatial localization patterns of an individual cluster when viewing them all together, we can highlight all cells that belong to a particular cluster:
p1 <- ImageDimPlot(vizgen.obj, fov = "s2r1", cols = "red", cells = WhichCells(vizgen.obj, idents = 14)) p2 <- ImageDimPlot(vizgen.obj, fov = "s2r1", cols = "red", cells = WhichCells(vizgen.obj, idents = 15)) p1 + p2
We can find markers of individual clusters and visualize their spatial expression pattern. We can color cells based on their quantified expression of an individual gene, using
ImageFeaturePlot(), which is analagous to the
FeaturePlot() function for visualizing expression on a 2D embedding. Since MERFISH images individual molecules, we can also visualize the location of individual molecules.
p1 <- ImageFeaturePlot(vizgen.obj, features = "Slc17a7") p2 <- ImageDimPlot(vizgen.obj, molecules = "Slc17a7", nmols = 10000, alpha = 0.3, mols.cols = "red") p1 + p2
Note that the
nmols parameter can be used to reduce the total number of molecules shown to reduce overplotting. You can also use the
mols.alpha parameter to further optimize.
Plotting molecules is especially useful for visualizing co-expression of multiple genes on the same plot.
p1 <- ImageDimPlot(vizgen.obj, fov = "s2r1", alpha = 0.3, molecules = c("Slc17a7", "Olig1"), nmols = 10000) markers.14 <- FindMarkers(vizgen.obj, ident.1 = "14") p2 <- ImageDimPlot(vizgen.obj, fov = "s2r1", alpha = 0.3, molecules = rownames(markers.14)[1:4], nmols = 10000) p1 + p2
The updated Seurat spatial framework has the option to treat cells as individual points, or also to visualize cell boundaries (segmentations). By default, Seurat ignores cell segmentations and treats each cell as a point (‘centroids’). This speeds up plotting, especially when looking at large areas, where cell boundaries are too small to visualize.
We can zoom into a region of tissue, creating a new field of view. For example, we can zoom into a region that contains the hippocampus. Once zoomed-in, we can set
DefaultBoundary() to show cell segmentations. You can also ‘simplify’ the cell segmentations, reducing the number of edges in each polygon to speed up plotting.
# create a Crop cropped.coords <- Crop(vizgen.obj[["s2r1"]], x = c(1750, 3000), y = c(3750, 5250), coords = "plot") # set a new field of view (fov) vizgen.obj[["hippo"]] <- cropped.coords # visualize FOV using default settings (no cell boundaries) p1 <- ImageDimPlot(vizgen.obj, fov = "hippo", axes = TRUE, size = 0.7, border.color = "white", cols = "polychrome", coord.fixed = FALSE) # visualize FOV with full cell segmentations DefaultBoundary(vizgen.obj[["hippo"]]) <- "segmentation" p2 <- ImageDimPlot(vizgen.obj, fov = "hippo", axes = TRUE, border.color = "white", border.size = 0.1, cols = "polychrome", coord.fixed = FALSE) # simplify cell segmentations vizgen.obj[["hippo"]][["simplified.segmentations"]] <- Simplify(coords = vizgen.obj[["hippo"]][["segmentation"]], tol = 3) DefaultBoundary(vizgen.obj[["hippo"]]) <- "simplified.segmentations" # visualize FOV with simplified cell segmentations DefaultBoundary(vizgen.obj[["hippo"]]) <- "simplified.segmentations" p3 <- ImageDimPlot(vizgen.obj, fov = "hippo", axes = TRUE, border.color = "white", border.size = 0.1, cols = "polychrome", coord.fixed = FALSE) p1 + p2 + p3
The tol parameter determines how simplified the resulting segmentations are. A higher value of tol will reduce the number of vertices more drastically which will speed up plotting, but some segmentation detail will be lost. See https://rgeos.r-forge.r-project.org/reference/topo-unary-gSimplify.html for examples using different values for tol.
We can visualize individual molecules plotted at higher resolution after zooming-in
# Since there is nothing behind the segmentations, alpha will slightly mute colors ImageDimPlot(vizgen.obj, fov = "hippo", molecules = rownames(markers.14)[1:4], cols = "polychrome", mols.size = 1, alpha = 0.5, mols.cols = c("red", "blue", "yellow", "green"))
This dataset was produced using Nanostring CosMx Spatial Molecular Imager (SMI). The CosMX SMI performs multiplexed single molecule profiling, can profile both RNA and protein targets, and can be applied directly to FFPE tissues. The dataset represents 8 FFPE samples taken from 5 non-small-cell lung cancer (NSCLC) tissues, and is available for public download. The gene panel consists of 960 transcripts.
In this vignette, we load one of 8 samples (lung 5, replicate 1). We use the
LoadNanostring() function, which parses the outputs available on the public download site. Note that the coordinates for the cell boundaries were provided by Nanostring by request, and are available for download here.
For this dataset, instead of performing unsupervised analysis, we map the Nanostring profiles to our Azimuth Healthy Human Lung reference, which was defined by scRNA-seq. We used Azimuth version 0.4.3 with the human lung reference version 1.0.0. You can download the precomputed results here, which include annotations, prediction scores, and a UMAP visualization. The median number of detected transcripts/cell is 249, which does create uncertainty for the annotation process.
nano.obj <- LoadNanostring(data.dir = "lung5_rep1", fov = "lung5.rep1")
# add in precomputed Azimuth annotations azimuth.data <- readRDS("nanostring_data.Rds") nano.obj <- AddMetaData(nano.obj, metadata = azimuth.data$annotations) nano.obj[["proj.umap"]] <- azimuth.data$umap Idents(nano.obj) <- nano.obj$predicted.annotation.l1 # set to avoid error exceeding max allowed size of globals options(future.globals.maxSize = 8000 * 1024^2) nano.obj <- SCTransform(nano.obj, assay = "Nanostring", clip.range = c(-10, 10), verbose = FALSE) # text display of annotations and prediction scores head(slot(object = nano.obj, name = "meta.data")[2:5])
## nCount_Nanostring nFeature_Nanostring predicted.annotation.l1 ## 1_1 23 19 Dendritic ## 2_1 26 23 Macrophage ## 3_1 74 51 Neuroendocrine ## 4_1 60 48 Macrophage ## 5_1 52 39 Macrophage ## 6_1 5 5 CD4 T ## predicted.annotation.l1.score ## 1_1 0.5884506 ## 2_1 0.5707920 ## 3_1 0.5449661 ## 4_1 0.6951970 ## 5_1 0.8155319 ## 6_1 0.5677324
We can visualize the Nanostring cells and annotations, projected onto the reference-defined UMAP. Note that for this NSCLC sample, tumor samples are annotated as ‘basal’, which is the closest cell type match in the healthy reference.
As in the previous example,
ImageDimPlot() plots c ells based on their spatial locations, and colors them based on their assigned cell type. Notice that the basal cell population (tumor cells) is tightly spatially organized, as expected.
ImageDimPlot(nano.obj, fov = "lung5.rep1", axes = TRUE, cols = "glasbey")
Since there are many cell types present, we can highlight the localization of a few select groups.
ImageDimPlot(nano.obj, fov = "lung5.rep1", cells = WhichCells(nano.obj, idents = c("Basal", "Macrophage", "Smooth Muscle", "CD4 T")), cols = c("red", "green", "blue", "orange"), size = 0.6)
We can also visualize gene expression markers a few different ways:
FeaturePlot(nano.obj, features = "KRT17")
p1 <- ImageFeaturePlot(nano.obj, fov = "lung5.rep1", features = "KRT17", max.cutoff = "q95") p2 <- ImageDimPlot(nano.obj, fov = "lung5.rep1", alpha = 0.3, molecules = "KRT17", nmols = 10000) + NoLegend() p1 + p2
We can plot molecules in order to co-visualize the expression of multiple markers, including KRT17 (basal cells), C1QA (macrophages), IL7R (T cells), and TAGLN (Smooth muscle cells).
# Plot some of the molecules which seem to display spatial correlation with each other ImageDimPlot(nano.obj, fov = "lung5.rep1", group.by = NA, alpha = 0.3, molecules = c("KRT17", "C1QA", "IL7R", "TAGLN"), nmols = 20000)
We zoom in on one basal-rich region using the
Crop() function. Once zoomed-in, we can visualize individual cell boundaries as well in all visualizations.
ImageDimPlot(nano.obj, fov = "zoom1", cols = "polychrome", coord.fixed = FALSE)
# note the clouds of TPSAB1 molecules denoting mast cells ImageDimPlot(nano.obj, fov = "zoom1", cols = "polychrome", alpha = 0.3, molecules = c("KRT17", "IL7R", "TPSAB1"), mols.size = 0.3, nmols = 20000, border.color = "black", coord.fixed = FALSE)
This dataset was produced using Akoya CODEX system. The CODEX system performs multiplexed spatially-resolved protein profiling, iteratively visualizing antibody-binding events. The dataset here represents a tissue section from a human lymph node, and was generated by the University of Florida as part of the Human Biomolecular Atlas Program (HuBMAP). More information about the sample and experiment is available here. The protein panel in this dataset consists of 28 markers, and protein intensities were quantified as part of the Akoya processor pipeline, which outputs a CSV file providing the intensity of each marker in each cell, as well as the cell coordinates. The file is available for public download via Globus here.
First, we load in the data of a HuBMAP dataset using the
LoadAkoya() function in Seurat:
codex.obj <- LoadAkoya(filename = "LN7910_20_008_11022020_reg001_compensated.csv", type = "processor", fov = "HBM754.WKLP.262")
We can now run unsupervised analysis to identify cell clusters. To normalize the protein data, we use centered log-ratio based normalization, as we typically apply to the protein modality of CITE-seq data. We then run dimensional reduction and graph-based clustering.
codex.obj <- NormalizeData(object = codex.obj, normalization.method = "CLR", margin = 2) codex.obj <- ScaleData(codex.obj) VariableFeatures(codex.obj) <- rownames(codex.obj) # since the panel is small, treat all features as variable. codex.obj <- RunPCA(object = codex.obj, npcs = 20, verbose = FALSE) codex.obj <- RunUMAP(object = codex.obj, dims = 1:20, verbose = FALSE) codex.obj <- FindNeighbors(object = codex.obj, dims = 1:20, verbose = FALSE) codex.obj <- FindClusters(object = codex.obj, verbose = FALSE, resolution = 0.4, n.start = 1)
We can visualize the cell clusters based on a protein intensity-based UMAP embedding, or also based on their spatial location.
ImageDimPlot(codex.obj, cols = "parade")
The expression patters of individual markers clearly denote different cell types and spatial structures, including Lyve1 (lymphatic endothelial cells), CD34 (blood endothelial cells), and CD21 (B cells). As expected, endothelial cells group together into vessels, and B cells are key components of specialized microstructures known as germinal zones. You can read more about protein markers in this dataset, as well as cellular networks in human lynmphatic tissues, in this preprint.
p1 <- ImageFeaturePlot(codex.obj, fov = "HBM754.WKLP.262", features = c("CD34", "CD21", "Lyve1"), min.cutoff = "q10", max.cutoff = "q90") p2 <- ImageDimPlot(codex.obj, fov = "HBM754.WKLP.262", cols = "parade") p1 + p2
Each of these datasets represents an opportunity to learn organizing principles that govern the spatial localization of different cell types. Stay tuned for future updates to Seurat enabling further exploration and characterization of the relationship between spatial position and molecular state.
## R version 4.1.3 (2022-03-10) ## Platform: x86_64-pc-linux-gnu (64-bit) ## Running under: Ubuntu 20.04.4 LTS ## ## Matrix products: default ## BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0 ## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0 ## ## locale: ##  LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C ##  LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 ##  LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 ##  LC_PAPER=en_US.UTF-8 LC_NAME=C ##  LC_ADDRESS=C LC_TELEPHONE=C ##  LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C ## ## attached base packages: ##  stats graphics grDevices utils datasets methods base ## ## other attached packages: ##  future_1.25.0 sp_1.4-7 SeuratObject_4.1.0 Seurat_126.96.36.19907 ##  remotes_2.4.2 ## ## loaded via a namespace (and not attached): ##  systemfonts_1.0.4 plyr_1.8.7 igraph_1.3.1 ##  lazyeval_0.2.2 splines_4.1.3 listenv_0.8.0 ##  scattermore_0.8 ggplot2_3.3.5 digest_0.6.29 ##  htmltools_0.5.2 fansi_1.0.3 magrittr_2.0.3 ##  memoise_2.0.1 tensor_1.5 cluster_2.1.3 ##  ROCR_1.0-11 limma_3.50.3 globals_0.14.0 ##  matrixStats_0.62.0 pkgdown_2.0.3 spatstat.sparse_2.1-1 ##  colorspace_2.0-3 ggrepel_0.9.1 textshaping_0.3.6 ##  xfun_0.30 dplyr_1.0.9 crayon_1.5.1 ##  jsonlite_1.8.0 progressr_0.10.0 spatstat.data_2.2-0 ##  survival_3.3-1 zoo_1.8-10 glue_1.6.2 ##  polyclip_1.10-0 gtable_0.3.0 leiden_0.3.10 ##  future.apply_1.9.0 abind_1.4-5 scales_1.2.0 ##  DBI_1.1.2 spatstat.random_2.2-0 miniUI_0.1.1.1 ##  Rcpp_188.8.131.52 viridisLite_0.4.0 xtable_1.8-4 ##  reticulate_1.24 spatstat.core_2.4-2 bit_4.0.4 ##  htmlwidgets_1.5.4 httr_1.4.2 RColorBrewer_1.1-3 ##  ellipsis_0.3.2 ica_1.0-2 pkgconfig_2.0.3 ##  farver_2.1.0 sass_0.4.1 uwot_0.1.11 ##  deldir_1.0-6 utf8_1.2.2 tidyselect_1.1.2 ##  labeling_0.4.2 rlang_1.0.2 reshape2_1.4.4 ##  later_1.3.0 munsell_0.5.0 tools_4.1.3 ##  cachem_1.0.6 cli_3.3.0 generics_0.1.2 ##  ggridges_0.5.3 evaluate_0.15 stringr_1.4.0 ##  fastmap_1.1.0 yaml_2.3.5 ragg_1.2.2 ##  goftest_1.2-3 knitr_1.38 bit64_4.0.5 ##  fs_1.5.2 fitdistrplus_1.1-8 purrr_0.3.4 ##  RANN_2.6.1 pbapply_1.5-0 nlme_3.1-157 ##  mime_0.12 formatR_1.12 ggrastr_1.0.1 ##  hdf5r_1.3.5 compiler_4.1.3 rstudioapi_0.13 ##  beeswarm_0.4.0 plotly_4.10.0 curl_4.3.2 ##  png_0.1-7 spatstat.utils_2.3-0 tibble_3.1.6 ##  bslib_0.3.1 stringi_1.7.6 highr_0.9 ##  desc_1.4.1 RSpectra_0.16-1 rgeos_0.5-9 ##  lattice_0.20-45 Matrix_1.4-1 vctrs_0.4.1 ##  pillar_1.7.0 lifecycle_1.0.1 spatstat.geom_2.4-0 ##  lmtest_0.9-40 jquerylib_0.1.4 RcppAnnoy_0.0.19 ##  data.table_1.14.2 cowplot_1.1.1 irlba_2.3.5 ##  httpuv_1.6.5 patchwork_1.1.1 R6_2.5.1 ##  promises_184.108.40.206 KernSmooth_2.23-20 gridExtra_2.3 ##  vipor_0.4.5 parallelly_1.31.1 codetools_0.2-18 ##  MASS_7.3-56 assertthat_0.2.1 rprojroot_2.0.3 ##  withr_2.5.0 sctransform_0.3.3 mgcv_1.8-40 ##  parallel_4.1.3 grid_4.1.3 rpart_4.1.16 ##  tidyr_1.2.0 rmarkdown_2.14 Cairo_1.5-15 ##  Rtsne_0.16 shiny_1.7.1 ggbeeswarm_0.6.0