Weighted Nearest Neighbor Analysis
Compiled: October 20, 2020
BMNC - RNA & ADT
This vignette introduces the weighted nearest neighbor (WNN) workflow for the analysis of multimodal single-cell datasets. The workflow consists of three steps:
- Independent preprocessing and dimensional reduction of each modality
- Learning cell-specific modality 'weights', and constructing a WNN graph that integrates the modalities
- Downstream analysis (e.g. visualization, clustering) of the WNN graph
We use the CITE-seq dataset from (Stuart*, Butler* et al, Cell 2019), which consists of 30,672 scRNA-seq profiles measured alongside a panel of 25 antibodies. The object contains two assays, RNA and antibody-derived tags (ADT).
To run this vignette, please install Seurat v4, available as a beta release on our GitHub page.
remotes::install_github("satijalab/seurat", ref = "release/4.0.0")
library(Seurat)
library(SeuratData)
library(cowplot)
library(dplyr)
InstallData("bmcite")
bm <- LoadData(ds = "bmcite")
We first perform pre-processing and dimensional reduction on both assays independently. We use standard normalization, but you can also use SCTransform or any alternative method.
DefaultAssay(bm) <- 'RNA'
bm <- NormalizeData(bm) %>% FindVariableFeatures() %>% ScaleData() %>% RunPCA()

DefaultAssay(bm) <- 'ADT'
# we will use all ADT features for dimensional reduction
# we set reduction.name so that the ADT PCA does not overwrite the RNA PCA stored as 'pca'
VariableFeatures(bm) <- rownames(bm[["ADT"]])
bm <- NormalizeData(bm, normalization.method = 'CLR', margin = 2) %>%
  ScaleData() %>% RunPCA(reduction.name = 'apca')
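To make the `normalization.method = 'CLR', margin = 2` setting above more concrete, here is a minimal base-R sketch of a centered log-ratio style transform applied per cell. The toy matrix, the antibody and cell names, and the helper `clr_vector` are made up for illustration. The exact formula is an assumption about how such a transform can be written, and Seurat's internal implementation may differ in detail.

```r
# Toy ADT count matrix: rows are features (antibodies), columns are cells.
# All names and values are hypothetical.
counts <- matrix(
  c( 10,  0,  5,
    200,  3, 40,
      0,  0,  7,
     55, 12,  1),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("ab", 1:4), paste0("cell", 1:3))
)

# CLR-style transform of one vector (assumed form, for illustration):
# divide by a geometric-mean-like term computed from log1p of the positive
# counts, averaged over the full vector length, then apply log1p.
clr_vector <- function(x) {
  geo_mean <- exp(sum(log1p(x[x > 0])) / length(x))
  log1p(x / geo_mean)
}

# margin = 2 corresponds to normalizing within each cell, i.e. per column.
clr_by_cell <- apply(counts, MARGIN = 2, FUN = clr_vector)

print(round(clr_by_cell, 3))
```

Normalizing within each cell, rather than within each antibody, puts every cell's protein profile on a comparable scale before scaling and PCA.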
For each cell, we calculate its closest neighbors in the dataset based on a weighted combination of RNA and protein similarities. The cell-specific modality weights and multimodal neighbors are calculated in a single function, which takes ~2 minutes to run on this dataset. We specify the dimensionality of each modality (similar to specifying the number of PCs to include in scRNA-seq clustering), but you can vary these settings to see that small changes have minimal effect on the overall results.
# Identify multimodal neighbors. These will be stored in the neighbors slot,
# and can be accessed using bm[['weighted.nn']]
# The WNN graph can be accessed at bm[["wknn"]],
# and the SNN graph used for clustering at bm[["wsnn"]]
# Cell-specific modality weights can be accessed at bm$RNA.weight
bm <- FindMultiModalNeighbors(
  bm, reduction.list = list("pca", "apca"),
  dims.list = list(1:30, 1:18), modality.weight.name = "RNA.weight"
)
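The idea of ranking neighbors by a per-cell weighted combination of RNA and protein similarities can be illustrated with a toy base-R sketch. This is not what `FindMultiModalNeighbors` does internally: the embeddings are random, the weights are hand-picked rather than learned, and the exponential distance kernel is an assumption chosen only to keep the example short.

```r
set.seed(1)
n_cells <- 6
k <- 2  # number of neighbors to keep per cell

# Hypothetical low-dimensional embeddings for the two modalities.
rna_emb <- matrix(rnorm(n_cells * 3), nrow = n_cells)
adt_emb <- matrix(rnorm(n_cells * 2), nrow = n_cells)

# Turn Euclidean distances into similarities in (0, 1].
# (Assumed kernel for illustration only.)
to_similarity <- function(emb) exp(-as.matrix(dist(emb)))
sim_rna <- to_similarity(rna_emb)
sim_adt <- to_similarity(adt_emb)

# Hypothetical per-cell RNA weights; the ADT weight is the complement.
rna_weight <- c(0.9, 0.5, 0.2, 0.7, 0.4, 0.6)
adt_weight <- 1 - rna_weight

# A length-n vector times an n x n matrix recycles down the columns,
# so entry [i, j] is scaled by the weight of cell i.
sim_weighted <- rna_weight * sim_rna + adt_weight * sim_adt
diag(sim_weighted) <- -Inf  # a cell is never its own neighbor

# For each cell (row), keep the indices of the k most similar cells.
weighted_nn <- t(apply(sim_weighted, 1, function(s) {
  order(s, decreasing = TRUE)[1:k]
}))

print(weighted_nn)
```

In this sketch a cell with a high RNA weight (such as cell 1) has neighbors driven mostly by its RNA embedding, while a cell with a low RNA weight (such as cell 3) relies mostly on protein similarity.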
We can now use these results for downstream analysis, such as visualization and clustering. For example, we can create a UMAP visualization of the data based on a weighted combination of RNA and protein data. We can also perform graph-based clustering and visualize these results on the UMAP, alongside a set of cell annotations.
bm <- RunUMAP(bm, nn.name = "weighted.nn", reduction.name = "wnn.umap", reduction.key = "wnnUMAP_")
bm <- FindClusters(bm, graph.name = "wsnn", algorithm = 3, resolution = 2, verbose = FALSE)
p1 <- DimPlot(bm, reduction = 'wnn.umap', label = TRUE, repel = TRUE, label.size = 2.5) + NoLegend()
p2 <- DimPlot(bm, reduction = 'wnn.umap', group.by = 'celltype.l2', label = TRUE, repel = TRUE, label.size = 2.5) + NoLegend()
p1 + p2
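Besides comparing the two plots by eye, a cross-tabulation can summarize how clusters line up with the annotations. The sketch below uses made-up cluster assignments and cell type labels for eight cells; none of these values come from the dataset.

```r
# Hypothetical cluster assignments and annotations for eight cells.
clusters <- factor(c(0, 0, 1, 1, 1, 2, 2, 2))
annotation <- c("CD14 Mono", "CD14 Mono", "NK", "NK", "CD8 Naive", "B", "B", "B")

# Rows are clusters, columns are annotated cell types.
confusion <- table(cluster = clusters, annotation = annotation)
print(confusion)

# Fraction of each cluster accounted for by its most common annotation.
purity <- apply(confusion, 1, max) / rowSums(confusion)
print(round(purity, 2))
```

On the vignette's object, the analogous call would be `table(bm$seurat_clusters, bm$celltype.l2)`, assuming `FindClusters` has stored its result in the default `seurat_clusters` metadata column.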