Demultiplexing with hashtag oligos (HTOs)
Developed in collaboration with the Technology Innovation Group at NYGC, Cell Hashing uses oligo-tagged antibodies against ubiquitously expressed surface proteins to place a “sample barcode” on each single cell, enabling different samples to be multiplexed together and run in a single experiment. For more information, please refer to our preprint: https://www.biorxiv.org/content/early/2017/12/21/237693
This vignette briefly demonstrates how to work with data produced with Cell Hashing in Seurat. Applied to two datasets, we can successfully demultiplex cells to their sample of origin and identify cross-sample doublets.
The demultiplexing function HTODemux() implements the following procedure:
- We perform a k-medoid clustering on the normalized HTO values, which initially separates cells into K + 1 clusters, where K is the number of samples.
- We calculate a ‘negative’ distribution for each HTO. For each HTO, we use the cluster with the lowest average value as the negative group.
- For each HTO, we fit a negative binomial distribution to the negative cluster. We use the 0.99 quantile of this distribution as a threshold.
- Based on these thresholds, each cell is classified as positive or negative for each HTO.
- Cells that are positive for more than one HTO are annotated as doublets.
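The procedure in the list above can be sketched in a few lines of R. This is a minimal illustration on simulated counts, not the internal implementation of HTODemux(): the toy data, the `clr()` and `nb_threshold()` helpers, and all variable names are hypothetical, and the negative binomial is fitted by the method of moments (with a Poisson fallback) rather than by maximum likelihood, to keep the example free of extra dependencies.

```r
library(cluster)  # pam(): k-medoid clustering (recommended package shipped with R)

set.seed(1)

# Hypothetical toy data: 3 HTOs x 300 cells, 100 cells stained with each HTO.
n_hto <- 3
n_per <- 100
hto_names <- paste0("HTO_", LETTERS[seq_len(n_hto)])
counts <- matrix(
  rnbinom(n_hto * n_hto * n_per, size = 5, mu = 10),  # background signal
  nrow = n_hto,
  dimnames = list(hto_names, NULL)
)
for (i in seq_len(n_hto)) {
  cols <- ((i - 1) * n_per + 1):(i * n_per)
  counts[i, cols] <- rnbinom(n_per, size = 20, mu = 500)  # true signal
}
# Turn the first 10 cells into HTO_A/HTO_B doublets.
counts["HTO_B", 1:10] <- rnbinom(10, size = 20, mu = 500)

# Step 1: normalize each HTO across cells (a CLR-style transform, assumed
# form) and cluster the cells into K + 1 groups with k-medoids.
clr <- function(x) log1p(x / exp(mean(log1p(x))))
norm <- apply(counts, 1, clr)                 # cells x HTOs
clusters <- pam(norm, k = n_hto + 1)$clustering

# Steps 2-3: per HTO, take the cluster with the lowest mean as the negative
# group, fit a negative binomial to its raw counts, and use the 0.99 quantile.
nb_threshold <- function(x, q = 0.99) {
  m <- mean(x)
  v <- var(x)
  if (is.na(v) || v <= m) return(qpois(q, lambda = m))  # no overdispersion
  qnbinom(q, size = m^2 / (v - m), mu = m)               # moment estimates
}
thresholds <- sapply(hto_names, function(h) {
  cluster_means <- tapply(norm[, h], clusters, mean)
  negative <- as.integer(names(which.min(cluster_means)))
  nb_threshold(counts[h, clusters == negative])
})

# Steps 4-5: call each cell positive or negative per HTO, then classify.
positive <- counts > thresholds               # recycles one threshold per row
n_positive <- colSums(positive)
classification <- ifelse(n_positive == 0, "Negative",
                         ifelse(n_positive == 1, "Singlet", "Doublet"))
table(classification)
```

Because the threshold is the 0.99 quantile of the background fit, roughly 1% of truly negative cells per HTO are expected to exceed it, so a few simulated singlets may be called doublets.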
8-HTO dataset from human PBMCs
- Data represent peripheral blood mononuclear cells (PBMCs) from eight different donors.
- Cells from each donor are uniquely labeled, using CD45 as a hashing antibody.
- Samples were subsequently pooled and run on a single lane of the 10X Chromium v2 system.
- You can download the count matrices for RNA and HTO here, or the FASTQ files from GEO
Read in data
setwd("~/Downloads/HTODemuxFiles/")
# Load in the UMI matrix
pbmc_umi_sparse <- readRDS("pbmc_umi_mtx.rds")
# For generating a hashtag count matrix from FASTQ files, please refer to https://github.com/Hoohm/CITE-seq-Count.
# Load in the HTO count matrix
pbmc_hto <- readRDS("pbmc_hto_mtx.rds")
# Select cell barcodes detected by both RNA and HTO
# In the example datasets we have already filtered the cells for you, but perform this step for clarity.
joint_bcs <- intersect(colnames(pbmc_umi_sparse), colnames(pbmc_hto))
# Subset RNA and HTO counts by joint cell barcodes
pbmc_umi_sparse <- pbmc_umi_sparse[, joint_bcs]
pbmc_hto <- as.matrix(pbmc_hto[, joint_bcs])
# Confirm that the HTO have the correct names
print(rownames(pbmc_hto))
##  "HTO_A" "HTO_B" "HTO_C" "HTO_D" "HTO_E" "HTO_F" "HTO_G" "HTO_H"
Setup Seurat object and add in the HTO data
# Setup Seurat object
pbmc_hashtag <- CreateSeuratObject(raw.data = pbmc_umi_sparse)
# Normalize RNA data with log normalization
pbmc_hashtag <- NormalizeData(pbmc_hashtag, display.progress = FALSE)
# Find and scale variable genes
pbmc_hashtag <- FindVariableGenes(pbmc_hashtag, do.plot = FALSE, display.progress = FALSE)
pbmc_hashtag <- ScaleData(pbmc_hashtag, genes.use = pbmc_hashtag@var.genes, display.progress = FALSE)
Adding HTO data as an independent assay
You can read more about working with multi-modal data here
# Add HTO data as a new assay independent from RNA
pbmc_hashtag <- SetAssayData(pbmc_hashtag, assay.type = "HTO", slot = "raw.data", new.data = pbmc_hto)
# Normalize HTO data, here we use centered log-ratio (CLR) transformation
pbmc_hashtag <- NormalizeData(pbmc_hashtag, assay.type = "HTO", normalization.method = "genesCLR", display.progress = FALSE)
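To make the centered log-ratio (CLR) transformation concrete, here is a minimal base R sketch. It assumes the transform is applied to each HTO across cells and uses one common `log1p`-based form; the exact formula used internally by NormalizeData() may differ in detail. The `clr()` helper and the example counts are hypothetical.

```r
# CLR-style normalization of one HTO across cells (assumed form): each count
# is divided by the geometric mean of (count + 1), then log1p-transformed.
clr <- function(x) {
  geometric_mean <- exp(mean(log1p(x)))
  log1p(x / geometric_mean)
}

# Hypothetical raw counts for one HTO in six cells: four background, two stained.
hto_counts <- c(3, 8, 5, 12, 480, 650)
round(clr(hto_counts), 2)
```

The transform preserves the ordering of cells while compressing the large gap between background and stained cells onto a log scale.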
Demultiplex cells based on HTO enrichment
Here we use the Seurat function HTODemux() to assign single cells back to their sample origins.
# If you have a very large dataset we suggest using k_function = "clara". This is a k-medoid clustering function for large applications
# You can also play with additional parameters (see documentation for HTODemux()) to adjust the threshold for classification
# Here we are using the default settings
pbmc_hashtag <- HTODemux(pbmc_hashtag, assay.type = "HTO", positive_quantile = 0.99, print.output = FALSE)
Visualize demultiplexing results
Output from running HTODemux() is saved in the object metadata. We can visualize how many cells are classified as singlets, doublets and negative/ambiguous cells.
# Global classification results
print(table(pbmc_hashtag@meta.data$hto_classification_global))
##
##  Doublet Negative  Singlet
##     2598      346    13972
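As a quick sanity check, the counts reported above can be converted to proportions. The snippet below is a small base R sketch that re-enters the three numbers by hand; the variable names are hypothetical.

```r
# Counts copied from the HTODemux() classification table above.
hto_global <- c(Doublet = 2598, Negative = 346, Singlet = 13972)

total_cells <- sum(hto_global)              # 16916 cells in total
round(100 * hto_global / total_cells, 1)    # percentage of cells per class
```

This gives roughly 15.4% doublets, 2.0% negative cells, and 82.6% singlets.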
Visualize enrichment for selected HTOs with ridge plots
# Group cells based on the max HTO signal
pbmc_hashtag <- SetAllIdent(pbmc_hashtag, id = "hash_maxID")
RidgePlot(pbmc_hashtag, features.plot = rownames(GetAssayData(pbmc_hashtag, assay.type = "HTO"))[1:2], nCol = 2)
Visualize pairs of HTO signals to confirm mutual exclusivity in singlets
GenePlot(pbmc_hashtag,"HTO_A","HTO_B",use.raw=FALSE,cex.use = 0.6,col.use = "black")