Developed in collaboration with the Technology Innovation Group at NYGC, Cell Hashing uses oligo-tagged antibodies against ubuquitously expressed surface proteins to place a “sample barcode” on each single cell, enabling different samples to be multiplexed together and run in a single experiment. For more information, please refer to this paper.

This vignette will give a brief demonstration on how to work with data produced with Cell Hashing in Seurat. Applied to two datasets, we can successfully demultiplex cells to their the original sample-of-origin, and identify cross-sample doublets.

The demultiplexing function HTODemux() implements the following procedure:

8-HTO dataset from human PBMCs

Dataset description:
  • Data represent peripheral blood mononuclear cells (PBMCs) from eight different donors.
  • Cells from each donor are uniquely labeled, using CD45 as a hashing antibody.
  • Samples were subsequently pooled, and run on a single lane of the the 10X Chromium v2 system.
  • You can download the count matrices for RNA and HTO here, or the FASTQ files from GEO

Basic setup

Load packages

library(Seurat)

Read in data

# Load in the UMI matrix
pbmc.umis <- readRDS("../data/pbmc_umi_mtx.rds")

# For generating a hashtag count matrix from FASTQ files, please refer to
# https://github.com/Hoohm/CITE-seq-Count.  Load in the HTO count matrix
pbmc.htos <- readRDS("../data/pbmc_hto_mtx.rds")

# Select cell barcodes detected by both RNA and HTO In the example datasets we have already
# filtered the cells for you, but perform this step for clarity.
joint.bcs <- intersect(colnames(pbmc.umis), colnames(pbmc.htos))

# Subset RNA and HTO counts by joint cell barcodes
pbmc.umis <- pbmc.umis[, joint.bcs]
pbmc.htos <- as.matrix(pbmc.htos[, joint.bcs])

# Confirm that the HTO have the correct names
rownames(pbmc.htos)
## [1] "HTO_A" "HTO_B" "HTO_C" "HTO_D" "HTO_E" "HTO_F" "HTO_G" "HTO_H"

Setup Seurat object and add in the HTO data

# Setup Seurat object
pbmc.hashtag <- CreateSeuratObject(counts = pbmc.umis)

# Normalize RNA data with log normalization
pbmc.hashtag <- NormalizeData(pbmc.hashtag)
# Find and scale variable features
pbmc.hashtag <- FindVariableFeatures(pbmc.hashtag, selection.method = "mean.var.plot")
pbmc.hashtag <- ScaleData(pbmc.hashtag, features = VariableFeatures(pbmc.hashtag))

Adding HTO data as an independent assay

You can read more about working with multi-modal data here

# Add HTO data as a new assay independent from RNA
pbmc.hashtag[["HTO"]] <- CreateAssayObject(counts = pbmc.htos)
# Normalize HTO data, here we use centered log-ratio (CLR) transformation
pbmc.hashtag <- NormalizeData(pbmc.hashtag, assay = "HTO", normalization.method = "CLR")

Demultiplex cells based on HTO enrichment

Here we use the Seurat function HTODemux() to assign single cells back to their sample origins.

# If you have a very large dataset we suggest using k_function = 'clara'. This is a k-medoid
# clustering function for large applications You can also play with additional parameters (see
# documentation for HTODemux()) to adjust the threshold for classification Here we are using the
# default settings
pbmc.hashtag <- HTODemux(pbmc.hashtag, assay = "HTO", positive.quantile = 0.99)

Visualize demultiplexing results

Output from running HTODemux() is saved in the object metadata. We can visualize how many cells are classified as singlets, doublets and negative/ambiguous cells.

# Global classification results
table(pbmc.hashtag$HTO_classification.global)
## 
##  Doublet Negative  Singlet 
##     2598      346    13972

Visualize enrichment for selected HTOs with ridge plots

# Group cells based on the max HTO signal
Idents(pbmc.hashtag) <- "HTO_maxID"
RidgePlot(pbmc.hashtag, assay = "HTO", features = rownames(pbmc.hashtag[["HTO"]])[1:2], ncol = 2)

Visualize pairs of HTO signals to confirm mutual exclusivity in singlets

FeatureScatter(pbmc.hashtag, feature1 = "hto_HTO-A", feature2 = "hto_HTO-B")

Compare number of UMIs for singlets, doublets and negative cells

Idents(pbmc.hashtag) <- "HTO_classification.global"
VlnPlot(pbmc.hashtag, features = "nCount_RNA", pt.size = 0.1, log = TRUE)

Generate a two dimensional tSNE embedding for HTOs.Here we are grouping cells by singlets and doublets for simplicity.

# First, we will remove negative cells from the object
pbmc.hashtag.subset <- subset(pbmc.hashtag, idents = "Negative", invert = TRUE)

# Calculate a distance matrix using HTO
hto.dist.mtx <- as.matrix(dist(t(GetAssayData(object = pbmc.hashtag.subset, assay = "HTO"))))

# Calculate tSNE embeddings with a distance matrix
pbmc.hashtag.subset <- RunTSNE(pbmc.hashtag.subset, distance.matrix = hto.dist.mtx, perplexity = 100)
DimPlot(pbmc.hashtag.subset)

# You can also visualize the more detailed classification result by running Idents(object) <-
# 'HTO_classification' beofre plotting. Here, you can see that each of the small clouds on the
# tSNE plot corresponds to one of the 28 possible doublet combinations.

Create an HTO heatmap, based on Figure 1C in the Cell Hashing paper.

# To increase the efficiency of plotting, you can subsample cells using the num.cells argument
HTOHeatmap(pbmc.hashtag, assay = "HTO", ncells = 5000)

Cluster and visualize cells using the usual scRNA-seq workflow, and examine for the potential presence of batch effects.

# Extract the singlets
pbmc.singlet <- subset(pbmc.hashtag, idents = "Singlet")

# Select the top 1000 most variable features
pbmc.singlet <- FindVariableFeatures(pbmc.singlet, selection.method = "mean.var.plot")

# Scaling RNA data, we only scale the variable features here for efficiency
pbmc.singlet <- ScaleData(pbmc.singlet, features = VariableFeatures(pbmc.singlet))

# Run PCA
pbmc.singlet <- RunPCA(pbmc.singlet, features = VariableFeatures(pbmc.singlet))
## PC_ 1 
## Positive:  LYZ, S100A9, S100A8, FOS, CST3, AIF1, TYROBP, LST1, RP11-1143G9.4, FCER1G 
##     LGALS1, FTL, FTH1, VCAN, LGALS2, CFD, MS4A6A, CSTA, CD14, CYBB 
##     HLA-DRA, CD36, RP11-290F20.3, GCA, CD74, RBP7, SGK1, JUND, G0S2, IFITM3 
## Negative:  CCL5, NKG7, GNLY, B2M, RPL3, KLRB1, PRF1, KLRD1, GZMB, KLRF1 
##     GZMM, FGFBP2, CCL4, CMC1, CD2, TRDC, RPL21, CD247, CD69, SPON2 
##     GZMK, CLIC3, KLRG1, TRGC2, HOPX, C12orf75, ZAP70, RPS4X, TRGC1, MATK 
## PC_ 2 
## Positive:  IGHM, RPS8, CD79A, RPL21, IGHD, MS4A1, LINC00926, HLA-DRA, CD74, TCL1A 
##     IGLC3, IGLC2, RPS4X, VPREB3, RPL3, RP11-693J15.5, IGJ, FCER2, MEF2C, AFF3 
##     BCL11A, AC096579.7, IGHA1, IGLL5, AL928768.3, PIM2, IGHG1, MOUSE-mt-Nd1, MOUSE-mt-Cytb, MOUSE-mt-Nd4 
## Negative:  GNLY, NKG7, CCL5, GZMB, KLRF1, PRF1, KLRD1, FGFBP2, SPON2, CLIC3 
##     B2M, TYROBP, CCL4, FCER1G, KLRB1, CMC1, TRDC, HOPX, MYOM2, LGALS1 
##     FCGR3A, GZMM, FCRL6, CD247, KLRC1, S1PR5, MATK, TRGC1, APMAP, TTC38 
## PC_ 3 
## Positive:  MOUSE-Rpl41, MOUSE-Actb, MOUSE-Rpl35, MOUSE-S100a6, MOUSE-mt-Cytb, MOUSE-Lgals1, MOUSE-mt-Nd4, MOUSE-Rps8, MOUSE-Malat1, MOUSE-mt-Nd1 
##     MOUSE-Rplp1, MOUSE-Fth1, MOUSE-Rpl37, MOUSE-Rpl37a, MOUSE-Rpl18a, MOUSE-Rps29, MOUSE-Rpl13, MOUSE-Rps2, MOUSE-Ftl1, MOUSE-Rps28 
##     MOUSE-Tmsb10, MOUSE-Rps27, MOUSE-S100a4, MOUSE-Rps18, MOUSE-Rps5, MOUSE-Rpl13a, MOUSE-Gm10260, MOUSE-Fau, MOUSE-Rpl3, MOUSE-Rplp0 
## Negative:  RPL21, RPS8, B2M, RPL3, RPS4X, CD74, HLA-DRA, IGHM, CD79A, LINC00926 
##     IGJ, IGHD, MS4A1, IGLC2, TCL1A, MEF2C, VPREB3, FTH1, IGLC3, FTL 
##     RP11-693J15.5, PPP3CA, RP11-138A9.2, AFF3, PIM2, FCER2, IGHA1, BCL11A, CD69, RNU1-59P 
## PC_ 4 
## Positive:  RPL21, RPS8, RPL3, RPS4X, CD8B, CD2, GZMK, PIM2, DUSP2, B2M 
##     RP11-138A9.2, LYAR, S100A8, ZAP70, AIF1, MOUSE-Rpl18a, RBP7, MOUSE-S100a6, VCAN, MOUSE-Rps23 
##     PRDM1, S100A9, MOUSE-Lgals1, STAT4, MOUSE-Rps16, MOUSE-Rps3a1, MOUSE-Rps5, MOUSE-Rps8, G0S2, MOUSE-Rps19 
## Negative:  CD74, IGHM, HLA-DRA, IGHD, CD79A, MS4A1, TCL1A, LINC00926, IGLC3, IGLC2 
##     MEF2C, VPREB3, GNLY, IGJ, GZMB, RP11-693J15.5, FCER2, KLRF1, NKG7, BCL11A 
##     CLIC3, PRF1, SPON2, FGFBP2, AFF3, KLRD1, AC096579.7, CCL4, MYOM2, PLEK 
## PC_ 5 
## Positive:  VCAN, S100A8, S100A9, FOS, RP11-1143G9.4, CD14, CD36, LYZ, MS4A6A, LGALS2 
##     RBP7, CCL5, GZMK, DUSP2, G0S2, JUND, SGK1, TRGC2, RNU1-59P, KLRG1 
##     RNVU1-19, ALOX5AP, LYAR, KLRB1, NKG7, CMC1, GCA, CD69, IGHM, TRGC1 
## Negative:  FCGR3A, RP11-290F20.3, IFITM3, LYPD2, LST1, FCER1G, AIF1, CFD, FTH1, CST3 
##     FTL, TYROBP, SOX4, RPL21, MYOM2, RPS4X, B2M, LGALS1, PLEK, MEF2C 
##     HLA-DRA, SPON2, CLIC3, RPL3, APOBEC3C, PTGDS, CD74, S100B, GZMB, MOUSE-Rps29
# We select the top 10 PCs for clustering and tSNE based on PCElbowPlot
pbmc.singlet <- FindNeighbors(pbmc.singlet, reduction = "pca", dims = 1:10)
pbmc.singlet <- FindClusters(pbmc.singlet, resolution = 0.6)
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
## 
## Number of nodes: 13972
## Number of edges: 450324
## 
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.8983
## Number of communities: 18
## Elapsed time: 2 seconds
pbmc.singlet <- RunTSNE(pbmc.singlet, reduction = "pca", dims = 1:10)

# Projecting singlet identities on TSNE visualization
DimPlot(pbmc.singlet, group.by = "HTO_classification")

12-HTO dataset from four human cell lines

Dataset description:
  • Data represent single cells collected from four cell lines: HEK, K562, KG1 and THP1
  • Each cell line was further split into three samples (12 samples in total).
  • Each sample was labeled with a hashing antibody mixture (CD29 and CD45), pooled, and run on a single lane of 10X.
  • Based on this design, we should be able to detect doublets both across and within cell types
  • You can download the count matrices for RNA and HTO here, and we are currently uploading data to GEO

Create Seurat object, add HTO data and perform normalization

# Read in UMI count matrix for RNA
hto12.umis <- readRDS("../data/hto12_umi_mtx.rds")

# Read in HTO count matrix
hto12.htos <- readRDS("../data/hto12_hto_mtx.rds")

# Select cell barcodes detected in both RNA and HTO
cells.use <- intersect(rownames(hto12.htos), colnames(hto12.umis))

# Create Seurat object and add HTO data
hto12 <- CreateSeuratObject(counts = hto12.umis[, cells.use], min.features = 300)
hto12[["HTO"]] <- CreateAssayObject(counts = t(x = hto12.htos[colnames(hto12), 1:12]))

# Normalize data
hto12 <- NormalizeData(hto12)
hto12 <- NormalizeData(hto12, assay = "HTO", normalization.method = "CLR")

Demultiplex data

hto12 <- HTODemux(hto12, assay = "HTO", positive.quantile = 0.99)

Visualize demultiplexing results

Distribution of selected HTOs grouped by classification, displayed by ridge plots

RidgePlot(hto12, assay = "HTO", features = c("HEK-A", "K562-B", "KG1-A", "THP1-C"), ncol = 2)

Visualize HTO signals in a heatmap

HTOHeatmap(hto12, assay = "HTO")

Visualize RNA clustering

  • Below, we cluster the cells using our standard scRNA-seq workflow. As expected we see four major clusters, corresponding to the cell lines
  • In addition, we see small clusters in between, representing mixed transcriptomes that are correctly annotated as doublets.
  • We also see within-cell type doublets, that are (perhaps unsurprisingly) intermixed with singlets of the same cell type
  • # Remove the negative cells
    hto12 <- subset(hto12, idents = "Negative", invert = TRUE)
    
    # Run PCA on most variable features
    hto12 <- FindVariableFeatures(hto12, selection.method = "mean.var.plot")
    hto12 <- ScaleData(hto12, features = VariableFeatures(hto12))
    hto12 <- RunPCA(hto12)
    ## PC_ 1 
    ## Positive:  MDK, ID3, XIST, CKB, HES4, YDJC, HBG1, HIST1H2BK, HBG2, HSPA1A 
    ##     HMBS, NMU, GYPA, HBE1, HBZ, BLVRB, HBA1, HIST1H1C, MYL4, RFESD 
    ##     HBA2, RP11-462G2.1, PRAME, GYPB, SDF2L1, PHF19, HIST1H2BJ, HEMGN, HIST1H2AC, UROD 
    ## Negative:  GMFG, RPS4Y1, AIF1, ARHGDIB, TMSB4X, VAMP8, B2M, S100A4, CD74, IGFBP7 
    ##     LST1, SRGN, CLEC11A, NUCB2, EIF1AY, CAPG, HLA-DRA, HLA-DRB1, PSMB8, ALOX5AP 
    ##     LGALS1, CYP1B1, PRSS57, HPGDS, PLAC8, RAC2, CORO1A, PYCARD, CD37, NCF4 
    ## PC_ 2 
    ## Positive:  ITM2A, HLA-DRA, PRSS57, HPGDS, PTPRCAP, HLA-DPA1, HOPX, CD74, HLA-DPB1, DSCC1 
    ##     CD34, CYP1B1, SPINT2, C19orf77, CTSW, IL2RG, IFI16, RP11-354E11.2, RAB27B, SLC9A3R1 
    ##     HLA-DMA, CLEC2B, IQGAP2, UBE2L6, KIAA0125, SAMSN1, TSTD1, C12orf75, ICAM3, GATA2 
    ## Negative:  AZU1, ELANE, PRTN3, CHI3L1, CITED4, CAP1, PPT1, CFD, CTSG, LYST 
    ##     S100A6, CSTA, MS4A3, SLPI, MNDA, PPCS, NFYC, RNASE2, PPP1R27, SMAP2 
    ##     BMP8B, TREM2, APLP2, MS4A6A, IRF8, FCGRT, DHRS9, SLC44A1, AGT, ZMPSTE24 
    ## PC_ 3 
    ## Positive:  HBG1, HBG2, GYPA, HBE1, MYL4, RP11-462G2.1, PRAME, HBA1, HBZ, GTSF1 
    ##     GYPB, HBA2, BLVRB, HEMGN, HIST1H1C, RFESD, HIST1H2AC, HIST1H2BK, NMU, HIST1H2BJ 
    ##     RHAG, ALAS2, HBD, PHF19, PAGE1, SNCG, TNNI3, SLC25A37, JUNB, RP11-301G19.1 
    ## Negative:  NGFRAP1, S100A10, CDKN2A, VIM, UCHL1, CNN3, CALD1, ZNF503, DMKN, XIST 
    ##     PLS3, RP6-24A23.7, TSC22D3, ID4, SOX4, HOTAIRM1, TUBB2B, NEFL, UBE2C, ID3 
    ##     CSRP2, HSPA1A, PMAIP1, HES4, ID2, HSPA1B, HAND1, C12orf75, SLIT2, EIF4A1 
    ## PC_ 4 
    ## Positive:  HSD17B4, PLEK, HPGDS, HGF, PRSS57, ADCYAP1, CA2, CST7, MT-ND3, ALS2 
    ##     CSF2RB, LINC01088, DBI, MT-ND2, MT-RNR2, CALR, MT-CYB, DTWD2, DSCC1, IKZF2 
    ##     SNTB1, BACE2, AE000661.37, SLA, MT-CO1, CD84, RP11-317J10.2, RP11-643A5.2, RAB32, CLEC11A 
    ## Negative:  REG4, LTB, CYTIP, NRP2, ALDH1A1, DUSP6, S100B, SLC9A3R1, RAMP1, HOPX 
    ##     ABI3, VCAN, CYTL1, CYP1B1, GNG11, SQLE, SLC2A5, SAMSN1, PTPRC, CALCRL 
    ##     LAPTM5, HPGD, C10orf54, MTSS1, LY96, ANGPT2, GSN, C6orf141, C10orf128, TIMP1 
    ## PC_ 5 
    ## Positive:  CST3, TYROBP, S100A9, FCER1A, IFI6, IFI30, GRN, FCER1G, NPC2, S100A8 
    ##     APOE, NUPR1, ASAH1, SH3BGRL3, DBI, SPP1, HLA-C, ALOX5AP, CTSD, ANXA1 
    ##     GLIPR1, CTSS, AP1S2, B2M, TIMP1, S100A11, APOC1, LYZ, RNF130, RP11-354E11.2 
    ## Negative:  C6orf141, CTSG, HOPX, CLDN10, RASGRP2, REG4, MS4A3, NRP2, NUCB2, LTB 
    ##     SLC2A5, PRTN3, DNAJC15, MLC1, PPP1R27, MT-ND5, MYC, SPINK2, BMP8B, PTPRCAP 
    ##     GNG11, SATB1, SERPINB2, MTSS1, RAMP1, CYTIP, STOM, KIAA0125, GLYATL2, ALDH1A1
    hto12 <- RunTSNE(hto12, dims = 1:5, perplexity = 100)
    DimPlot(hto12) + NoLegend()