Developed in collaboration with the Technology Innovation Group at NYGC, Cell Hashing uses oligo-tagged antibodies against ubiquitously expressed surface proteins to place a “sample barcode” on each single cell, enabling different samples to be multiplexed together and run in a single experiment. For more information, please refer to our preprint.

This vignette gives a brief demonstration of how to work with data produced with Cell Hashing in Seurat. Applied to two datasets, we can successfully demultiplex cells to their original sample of origin and identify cross-sample doublets.

The demultiplexing function HTODemux() implements the following procedure:

8-HTO dataset from human PBMCs

Dataset description:
  • Data represent peripheral blood mononuclear cells (PBMCs) from eight different donors.
  • Cells from each donor are uniquely labeled, using CD45 as a hashing antibody.
  • Samples were subsequently pooled and run on a single lane of the 10X Chromium v2 system.
  • You can download the count matrices for RNA and HTO here, or the FASTQ files from GEO

Basic setup

Load packages

library(Seurat)

Read in data

# Load in the UMI matrix
pbmc_umi_sparse <- readRDS("pbmc_umi_mtx.rds")

# For generating a hashtag count matrix from fastq files, please refer to
# Load in the HTO count matrix
pbmc_hto <- readRDS("pbmc_hto_mtx.rds")

# Select cell barcodes detected by both RNA and HTO
# In the example datasets we have already filtered the cells for you, but we perform this step for clarity.
joint_bcs <- intersect(colnames(pbmc_umi_sparse), colnames(pbmc_hto))

# Subset RNA and HTO counts by joint cell barcodes
pbmc_umi_sparse <- pbmc_umi_sparse[, joint_bcs]
pbmc_hto <- as.matrix(pbmc_hto[, joint_bcs])

# Confirm that the HTO have the correct names
print(rownames(pbmc_hto))
## [1] "HTO_A" "HTO_B" "HTO_C" "HTO_D" "HTO_E" "HTO_F" "HTO_G" "HTO_H"

Setup Seurat object and add in the HTO data

# Setup Seurat object
pbmc_hashtag <- CreateSeuratObject(raw.data = pbmc_umi_sparse)

# Normalize RNA data with log normalization
pbmc_hashtag <- NormalizeData(pbmc_hashtag, display.progress = FALSE)
# Find and scale variable genes
pbmc_hashtag <- FindVariableGenes(pbmc_hashtag, do.plot = FALSE, display.progress = FALSE)
pbmc_hashtag <- ScaleData(pbmc_hashtag, genes.use = pbmc_hashtag@var.genes, display.progress = FALSE)

Adding HTO data as an independent assay

You can read more about working with multi-modal data here

# Add HTO data as a new assay independent from RNA
pbmc_hashtag <- SetAssayData(pbmc_hashtag, assay.type = "HTO", slot = "raw.data", new.data = pbmc_hto)
# Normalize HTO data; here we use the centered log-ratio (CLR) transformation
pbmc_hashtag <- NormalizeData(pbmc_hashtag, assay.type = "HTO", normalization.method = "genesCLR", display.progress = FALSE)
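
To make the CLR step more concrete, here is a minimal base-R sketch of a centered log-ratio transformation on a toy count matrix. It assumes a pseudocount of 1 and that the transformation is applied to each hashtag across cells; the toy matrix and the helper name clr are made up for illustration, and the exact formula Seurat uses internally may differ in detail.

```r
# Toy HTO count matrix: 3 hashtags (rows) x 4 cells (columns). Values are made up.
hto_counts <- matrix(
  c(120,  3, 2, 60,
      4, 95, 1, 55,
      2,  5, 3,  1),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("HTO_A", "HTO_B", "HTO_C"), paste0("cell", 1:4))
)

# Hypothetical helper: centered log-ratio with a pseudocount of 1.
# Each value becomes log(1 + x) minus the mean of log(1 + x) over the vector.
clr <- function(x) {
  log_x <- log1p(x)
  log_x - mean(log_x)
}

# Apply the transformation to each row (each hashtag across all cells).
# apply() returns the per-row results as columns, so transpose back.
hto_clr <- t(apply(hto_counts, 1, clr))

print(round(hto_clr, 2))
```

After the transformation each hashtag is centered at zero across cells, so cells carrying that hashtag stand out as positive values regardless of how deeply the hashtag was sequenced.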

Demultiplex cells based on HTO enrichment

Here we use the Seurat function HTODemux() to assign single cells back to their sample origins.

# If you have a very large dataset, we suggest using k_function = "clara", a k-medoids clustering function designed for large applications
# You can also adjust additional parameters (see the documentation for HTODemux()) to change the classification threshold
# Here we are using the default settings
pbmc_hashtag <- HTODemux(pbmc_hashtag, assay.type = "HTO", positive_quantile = 0.99, print.output = FALSE)
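
The idea behind the final classification can be illustrated with a small base-R sketch. This is not the HTODemux() implementation: HTODemux() derives its per-hashtag thresholds from the data, whereas the cutoffs below are made-up numbers, as are the toy counts and variable names. The sketch only shows how per-hashtag positive calls translate into negative, singlet, and doublet labels.

```r
# Toy HTO counts: 3 hashtags (rows) x 5 cells (columns). Values are made up.
hto <- matrix(
  c(200,   4,   2, 190, 3,
      5, 180,   7, 170, 4,
      3,   6, 150,   4, 5),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("HTO_A", "HTO_B", "HTO_C"), paste0("cell", 1:5))
)

# Hypothetical per-hashtag cutoffs (assumed here, not estimated from the data).
cutoff <- c(HTO_A = 20, HTO_B = 20, HTO_C = 20)

# A cell is positive for a hashtag when its count exceeds that hashtag's cutoff.
# The cutoff vector has one entry per row, so it recycles down each column.
positive <- hto > cutoff[rownames(hto)]

# Count positive hashtags per cell and map the count to a global label.
n_positive <- colSums(positive)
classification <- ifelse(n_positive == 0, "Negative",
                         ifelse(n_positive == 1, "Singlet", "Doublet"))

print(classification)
```

In this toy example, cell4 is positive for both HTO_A and HTO_B and is labeled a doublet, while cell5 exceeds no cutoff and is labeled negative.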

Visualize demultiplexing results

Output from running HTODemux() is saved in the object metadata. We can visualize how many cells are classified as singlets, doublets and negative/ambiguous cells.

# Global classification results
print(table(pbmc_hashtag@meta.data$hto_classification_global))
##  Doublet Negative  Singlet 
##     2598      346    13972

Visualize enrichment for selected HTOs with ridge plots

# Group cells based on the max HTO signal
pbmc_hashtag <- SetAllIdent(pbmc_hashtag, id = "hash_maxID")
RidgePlot(pbmc_hashtag, features.plot = rownames(GetAssayData(pbmc_hashtag, assay.type = "HTO"))[1:2], nCol = 2)

Visualize pairs of HTO signals to confirm mutual exclusivity in singlets

GenePlot(pbmc_hashtag, "HTO_A", "HTO_B", use.raw = FALSE, cex.use = 0.6, col.use = "black")