This vignette demonstrates how to install and run the Pan-human Azimuth R Cloud API. The API takes a Seurat object and returns hierarchical cell type predictions, confidence scores, and a low-dimensional visualization.

Set up the Environment

Install the Pan-human Azimuth R API from GitHub:

devtools::install_github("satijalab/AzimuthAPI")

Run ANNotate

For this vignette, we annotate a dataset of human bone marrow mononuclear cells (BMNCs) that we published as part of Stuart, Butler et al., Cell 2019. This dataset is available through SeuratData.

library(Seurat)
library(SeuratData)
library(AzimuthAPI)
library(dplyr)

# Install the dataset package; the object itself is loaded by LoadData below
InstallData("bmcite")

bmcite <- LoadData("bmcite")

The CloudAzimuth function runs Pan-human Azimuth cell type predictions on a Seurat object via a cloud-based API deployed on AWS, and returns results stored in the object’s cell-level metadata.

bmcite <- CloudAzimuth(bmcite)

Pan-human Azimuth returns predictions in multiple formats, as well as softmax probability scores to estimate model confidence:

full_hierarchical_labels: The full predicted label for each cell, where ‘|’ delimits each hierarchical level
final_level_labels: The most granular level of the full hierarchical label
final_level_softmax_prob: The model’s predicted probability for each cell’s assigned final_level_label, ranging from 0 to 1
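
To make the relationship between the first two columns concrete, here is a minimal base R sketch that splits '|'-delimited labels into their levels and extracts the most granular one. The label strings are hypothetical examples written in the format described above, not actual output from the API.

```r
# Hypothetical labels in the '|'-delimited format described above
full_labels <- c(
  "Immune cell|T cell|Treg cell",
  "Immune cell|B cell",
  "Immune cell"
)

# Split each label into its hierarchical levels
# (fixed = TRUE treats '|' as a literal character, not a regex operator)
levels_per_cell <- strsplit(full_labels, "|", fixed = TRUE)

# Number of levels in each prediction, and its most granular level
depth <- lengths(levels_per_cell)
final_level <- vapply(levels_per_cell, function(x) x[length(x)], character(1))

print(data.frame(full_labels, depth, final_level))
```

On an annotated object, the same split could be applied to the full_hierarchical_labels metadata column.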

Examine mapping scores

We can view a histogram of the softmax probabilities, which reflect the model's confidence in each cell’s annotation. The histogram shows that scores for this dataset are generally quite high. We often remove cells with low scores (e.g., scores below 0.5 or 0.6), and do so here.

hist(bmcite$final_level_softmax_prob, breaks = 30, main = "Histogram of Softmax Probabilities",
    xlab = "Softmax Probability", col = "skyblue", border = "white")

# remove cells with low scores
bmcite_qc <- subset(bmcite, final_level_softmax_prob > 0.5)
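
Before committing to a cutoff, it can help to see how many cells each candidate threshold would keep. The following base R sketch uses a hypothetical vector of scores (not values from bmcite) and compares the 0.5 and 0.6 cutoffs mentioned above with a stricter one.

```r
# Hypothetical softmax probabilities for ten cells
scores <- c(0.98, 0.95, 0.91, 0.88, 0.72, 0.65, 0.58, 0.52, 0.41, 0.30)

# Candidate cutoffs, including the 0.5 and 0.6 values mentioned above
cutoffs <- c(0.5, 0.6, 0.7)

# Fraction of cells that would be retained at each cutoff
retained <- vapply(cutoffs, function(k) mean(scores > k), numeric(1))
names(retained) <- cutoffs

print(retained)
```

On the real object, the scores vector would be bmcite$final_level_softmax_prob.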

Visualize predictions

The encoding layer generated by the Pan-human Azimuth neural network is stored in the 128-dimensional azimuth_embed reduction. You can compute a 2D embedding of this layer to visualize the dataset and the predictions.

# Use azimuth_embed as input; output a UMAP stored in azimuth_umap
bmcite_qc <- RunUMAP(bmcite_qc, dims = 1:128, reduction = "azimuth_embed", reduction.name = "azimuth_umap")

Visualize predictions

The full_hierarchical_labels value for each cell provides the model’s classification at each level of granularity, with levels separated by the ‘|’ character.

p2 <- DimPlot(bmcite_qc, group.by = "full_hierarchical_labels", label.size = 1.5, label = TRUE, reduction = "azimuth_umap",
    repel = TRUE) + NoLegend()
p2

Because full hierarchical labels can be long, we also output only the final level of granularity:

p3 <- DimPlot(bmcite_qc, group.by = "final_level_labels", label.size = 1.5, label = TRUE, reduction = "azimuth_umap") +
    NoLegend()
p4 <- FeaturePlot(bmcite_qc, features = "final_level_softmax_prob", reduction = "azimuth_umap")
p3

p4

We also postprocess our predictions to provide labels at three consistent levels of granularity for easy handling, marking any cell with an invalid full hierarchical label (based on full_consistent_hierarchy) as False.

azimuth_broad: Corresponds to level_zero_labels (e.g., Immune cell)
azimuth_medium: Medium level of granularity (e.g., T cell)
azimuth_fine: High level of granularity (e.g., Treg cell)

These categories provide a consistent level of granularity for each cell, but may differ from final_level_labels, either because the model is forced to predict further along the cell type hierarchy than its initial prediction, or because its prediction is rolled back to a lower level of granularity.

p5 <- DimPlot(bmcite_qc, group.by = "azimuth_medium", label.size = 3, label = TRUE, reduction = "azimuth_umap") +
    NoLegend()
p5

p6 <- DimPlot(bmcite_qc, group.by = "azimuth_fine", label.size = 3, label = TRUE, reduction = "azimuth_umap") +
    NoLegend()
p6

To reduce the number of labels displayed, you can use PrepLabel to filter out labels assigned to fewer than a given number of cells. Here we filter labels with fewer than 20 cells. This is useful for removing outliers, because Pan-human Azimuth does not smooth single-cell labels by cluster: a single outlier annotation on one cell would otherwise still appear as a label on the visualization.

bmcite_qc <- PrepLabel(bmcite_qc, "azimuth_fine", "azimuth_fine_filtered", cutoff = 20)
p7 <- DimPlot(bmcite_qc, group.by = "azimuth_fine_filtered", label.size = 3, label = TRUE, reduction = "azimuth_umap") +
    NoLegend()
p7
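
The idea behind count-based label filtering can be sketched in base R. This is an illustration of the general technique only, not PrepLabel's implementation: the labels are hypothetical, and replacing rare labels with NA is an assumption, since PrepLabel may mark filtered cells differently.

```r
# Hypothetical per-cell labels: two common types and one rare outlier
labels <- c(rep("Treg cell", 25), rep("CD14 Mono", 30), rep("Outlier type", 2))

# Labels with fewer cells than this are treated as rare
cutoff <- 20

# Count cells per label and collect the labels that fall below the cutoff
counts <- table(labels)
rare <- names(counts)[as.vector(counts) < cutoff]

# Assumption: rare labels are blanked with NA; PrepLabel may handle them differently
filtered <- ifelse(labels %in% rare, NA_character_, labels)

print(table(filtered, useNA = "ifany"))
```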

Visualize differentially expressed features

The make_azimuth_QC_heatmaps function allows you to easily explore the quality of predicted labels by creating expression heatmaps by predicted cell type, with optional parameters for improved visualization:

final_name: Name of the metadata column to group cells by (default is azimuth_fine)
min.final.group: Minimum number of cells under a cell type to be displayed
max.ids.per.plot: Number of cell type labels displayed per plot
reorder: Flag to indicate whether to reorder cell types by transcriptional similarity
cells.order: Cell names specifying the order of cells (e.g., by softmax probability)
save_folder_path: Save plots as PNG files under specified folder path

Plots are grouped by azimuth_broad category by default, except that immune cell types are split into lymphoid and myeloid/erythroid subpopulations.

plots <- make_azimuth_QC_heatmaps(bmcite_qc)
print(length(plots))

## [1] 3

p8 <- plots[["Immune_Lymphoid cell_1"]]
print(p8)

p9 <- plots[["Immune_Myeloid cell_1"]]
print(p9)
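
For the cells.order parameter described above, one way to build an ordering by softmax probability is to sort a named vector of scores and keep the names. The cell names and scores below are hypothetical, and the exact form that make_azimuth_QC_heatmaps expects for cells.order is an assumption to check against the function's documentation.

```r
# Hypothetical per-cell softmax probabilities, named by cell barcode
scores <- c(cell_a = 0.62, cell_b = 0.97, cell_c = 0.81)

# Cell names ordered from most to least confident
cell_order <- names(sort(scores, decreasing = TRUE))

print(cell_order)
```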

Pan-human Azimuth R API Vignette

Compiled: April 23, 2025
