This vignette demonstrates how to install and run the Pan-human Azimuth R Cloud API. The API takes a Seurat object and returns hierarchical cell type predictions, confidence scores, and a low-dimensional visualization.
Install the Pan-human Azimuth R API from GitHub:
devtools::install_github("satijalab/AzimuthAPI")
For this vignette, we annotate a dataset of human bone marrow mononuclear cells (BMNC) that we published as part of Stuart, Butler et al., Cell 2019. This dataset is available through SeuratData.
library(Seurat)
library(SeuratData)
library(AzimuthAPI)
library(dplyr)
# Download the dataset (only needed once); the object itself is loaded below
InstallData("bmcite")
bmcite <- LoadData("bmcite")
The CloudAzimuth function runs Pan-human Azimuth cell type predictions on a Seurat object via a cloud-based API deployed on AWS, and returns results stored in the object’s cell-level metadata.
bmcite <- CloudAzimuth(bmcite)
Pan-human Azimuth returns predictions in multiple formats, as well as softmax probability scores to estimate model confidence:
full_hierarchical_labels: The full predicted label for each cell, where ‘|’ delimits each hierarchical level
final_level_labels: The most granular level of the full hierarchical label
final_level_softmax_prob: The model’s predicted probability for each cell’s assigned final_level_labels value, ranging from 0 to 1
We can view a histogram of the softmax probabilities, which reflect the model’s confidence in each cell’s annotation. Scores for this dataset are generally quite high. We often remove cells with low scores (e.g. scores below 0.5 or 0.6), and do so here:
hist(bmcite$final_level_softmax_prob, breaks = 30, main = "Histogram of Softmax Probabilities",
xlab = "Softmax Probability", col = "skyblue", border = "white")
# remove cells with low scores
bmcite_qc <- subset(bmcite, final_level_softmax_prob > 0.5)
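Before settling on a cutoff, it can help to check how many cells each candidate threshold would retain. The following is a minimal base R sketch on a made-up vector of scores; the values and the name `probs` are illustrative and not taken from bmcite. With a real object, `bmcite$final_level_softmax_prob` could be substituted for the toy vector.

```r
# Toy softmax probabilities standing in for bmcite$final_level_softmax_prob
# (illustrative values only, not real results)
probs <- c(0.98, 0.91, 0.45, 0.72, 0.55, 0.99, 0.30, 0.88)

# Candidate cutoffs mentioned in the text
cutoffs <- c(0.5, 0.6)

# Fraction of cells that would be kept at each cutoff
kept <- sapply(cutoffs, function(th) mean(probs > th))
names(kept) <- cutoffs
print(kept)
```

A large drop in the retained fraction between two nearby cutoffs suggests that many cells sit close to the threshold, in which case the histogram is worth a second look before filtering.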
You can compute a 2D embedding of the encoding layer generated by the Pan-human Azimuth neural network, which is stored in the 128-dimensional azimuth_embed reduction. You can use this embedding to generate a 2D visualization of the dataset and to visualize predictions.
# Use azimuth_embed as input; output a UMAP stored in azimuth_umap
bmcite_qc <- RunUMAP(bmcite_qc, dims = 1:128, reduction = "azimuth_embed", reduction.name = "azimuth_umap")
The full_hierarchical_labels value for each cell provides the model’s classification at each level of granularity, with different levels separated by the ‘|’ character.
p2 <- DimPlot(bmcite_qc, group.by = "full_hierarchical_labels", label.size = 1.5, label = TRUE, reduction = "azimuth_umap",
repel = TRUE) + NoLegend()
p2
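Because the levels are delimited by ‘|’, a full hierarchical label can be split into its individual levels with base R. This is a minimal sketch on hypothetical labels, loosely modeled on the Immune cell / T cell / Treg cell examples used in this vignette; the exact strings returned by the API may differ.

```r
# Hypothetical full hierarchical labels; real values would come from
# bmcite_qc$full_hierarchical_labels and may be formatted differently
labels <- c("Immune cell|T cell|Treg cell",
            "Immune cell|T cell",
            "Immune cell|Myeloid cell|Monocyte")

# fixed = TRUE treats "|" as a literal character rather than regex alternation
levels_list <- strsplit(labels, "|", fixed = TRUE)

# Depth of each prediction and its most granular level
depth <- lengths(levels_list)
final_level <- vapply(levels_list, function(x) x[length(x)], character(1))

print(depth)
print(final_level)
```

Note the `fixed = TRUE` argument: without it, `strsplit` interprets "|" as a regular expression and splits the string into single characters.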
Because the full hierarchical label can be long, we also output only the final level of granularity:
p3 <- DimPlot(bmcite_qc, group.by = "final_level_labels", label.size = 1.5, label = TRUE, reduction = "azimuth_umap") +
NoLegend()
p4 <- FeaturePlot(bmcite_qc, features = "final_level_softmax_prob", reduction = "azimuth_umap")
p3
p4
We also postprocess our predictions to provide labels at three consistent levels of granularity for easy handling, marking any cell with an invalid full hierarchical label (based on full_consistent_hierarchy) as False.
azimuth_broad: Corresponds to level_zero_labels (e.g. Immune cell)
azimuth_medium: Medium level of granularity (e.g. T cell)
azimuth_fine: High level of granularity (e.g. Treg cell)
These categories provide a consistent level of granularity for each cell, but may differ from final_level_labels, either by forcing the model to predict further along the cell type hierarchy than its initial prediction, or by rolling back its prediction to a coarser level of granularity.
p5 <- DimPlot(bmcite_qc, group.by = "azimuth_medium", label.size = 3, label = TRUE, reduction = "azimuth_umap") +
NoLegend()
p5
p6 <- DimPlot(bmcite_qc, group.by = "azimuth_fine", label.size = 3, label = TRUE, reduction = "azimuth_umap") +
NoLegend()
p6
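To see where the consistent-level labels depart from the model's own final level, the two metadata columns can be compared directly. The sketch below uses an invented data frame whose column names follow this vignette. It assumes that cells with an invalid hierarchy carry the character string "False"; how the API actually encodes this value should be checked on real output. With a real object, `bmcite_qc@meta.data` could replace the toy `meta`.

```r
# Toy metadata; column names follow the vignette, values are invented
meta <- data.frame(
  final_level_labels = c("Treg cell", "T cell", "Monocyte", "B cell"),
  azimuth_fine       = c("Treg cell", "Treg cell", "Monocyte", "False"),
  stringsAsFactors   = FALSE
)

# Assumption: cells with an invalid hierarchy carry the string "False"
invalid <- meta$azimuth_fine == "False"

# Among valid cells, flag those where post-processing changed the label
changed <- !invalid & meta$azimuth_fine != meta$final_level_labels

print(sum(invalid))
print(meta[changed, ])
```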
To reduce the number of labels displayed, you can filter out labels with fewer than a certain number of cells using PrepLabel. Here we filter out labels with fewer than 20 cells. This can be useful for removing outliers, especially as Pan-human Azimuth does not smooth single-cell labels by cluster: a single outlier annotation for one cell would otherwise still appear as a label on the visualization.
bmcite_qc <- PrepLabel(bmcite_qc, "azimuth_fine", "azimuth_fine_filtered", cutoff = 20)
p7 <- DimPlot(bmcite_qc, group.by = "azimuth_fine_filtered", label.size = 3, label = TRUE, reduction = "azimuth_umap") +
NoLegend()
p7
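The effect of a per-label cell count cutoff can be illustrated in base R. This sketch is not PrepLabel's implementation. It assumes that labels below the cutoff are replaced with NA, whereas PrepLabel may mark them differently. The labels and counts are invented.

```r
# Toy labels, one per cell (invented values)
fine <- c(rep("Treg cell", 25), rep("Monocyte", 30), rep("Rare type", 2))

# Number of cells per label
counts <- table(fine)

# Labels with at least `cutoff` cells are kept
cutoff <- 20
keep <- names(counts)[counts >= cutoff]

# Assumption for this sketch: labels below the cutoff are set to NA
fine_filtered <- ifelse(fine %in% keep, fine, NA)

print(table(fine_filtered, useNA = "ifany"))
```

Inspecting `counts` before choosing the cutoff shows which labels would disappear, which is useful when a rare but genuine population is expected in the data.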
The make_azimuth_QC_heatmaps function allows you to easily explore the quality of predicted labels by creating expression heatmaps by predicted cell type, with optional parameters for improved visualization:
final_name: Name of the metadata column to group cells by (default is azimuth_fine)
min.final.group: Minimum number of cells a cell type must have to be displayed
max.ids.per.plot: Number of cell type labels displayed per plot
reorder: Flag indicating whether to reorder cell types by transcriptional similarity
cells.order: Cell names specifying the order of cells (e.g. by softmax probability)
save_folder_path: Folder path under which plots are saved as PNG files
Plots are saved by azimuth_broad category by default, with the exception of immune cell types, which are grouped separately into lymphoid and myeloid/erythroid subpopulations.
plots <- make_azimuth_QC_heatmaps(bmcite_qc)
print(length(plots))
## [1] 3
p8 <- plots[["Immune_Lymphoid cell_1"]]
print(p8)
p9 <- plots[["Immune_Myeloid cell_1"]]
print(p9)
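The cells.order parameter is described above as taking cell names, for example ordered by softmax probability. One way such an ordering could be derived is sketched below on an invented named vector. Whether make_azimuth_QC_heatmaps expects exactly this form (a character vector of cell names) is an assumption to verify against the function's documentation.

```r
# Toy named vector: names are cell identifiers, values are softmax
# probabilities (invented; a real vector would come from the object's metadata)
probs <- c(cell_a = 0.62, cell_b = 0.97, cell_c = 0.81)

# Cell names sorted from most to least confident
cells_by_conf <- names(sort(probs, decreasing = TRUE))
print(cells_by_conf)
```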