Differential expression testing

Install development version of Seurat

Our updates to differential expression will appear in the CRAN version soon, but are currently available only in the development version of Seurat. Please install it using the commands below.

# Install the devtools package from Hadley Wickham
install.packages("devtools")
library(devtools)

install_github("satijalab/seurat", ref = "develop")

Load in the data

This vignette highlights some example workflows for performing differential expression in Seurat. For demonstration purposes, we will be using the 2,700 PBMC object that is created in the first guided tutorial. You can download the pre-computed object here.

library(Seurat)
load(file = "~/Projects/datasets/pbmc3k_final.Rda")

Perform default differential expression tests

The bulk of Seurat’s differential expression features can be accessed through the FindMarkers function. By default, Seurat performs differential expression testing based on the non-parametric Wilcoxon rank sum test. This replaces the previous default test (‘bimod’). To test for differential expression between two specific groups of cells, specify the ident.1 and ident.2 parameters.

# list options for groups to perform differential expression on
levels(pbmc@ident)
## [1] "B cells"           "CD4 T cells"       "CD8 T cells"      
## [4] "CD14+ Monocytes"   "Dendritic cells"   "FCGR3A+ Monocytes"
## [7] "Megakaryocytes"    "NK cells"
# Find differentially expressed genes between CD14+ and FCGR3A+ Monocytes
monocyte_de_genes <- FindMarkers(pbmc, ident.1 = "CD14+ Monocytes", ident.2 = "FCGR3A+ Monocytes")

# view results
head(monocyte_de_genes)
##               p_val avg_logFC pct.1 pct.2    p_val_adj
## FCGR3A 1.955129e-96 -2.564467 0.134 0.962 2.681263e-92
## LYZ    3.454141e-71  1.743081 1.000 0.987 4.737009e-67
## RHOC   3.740947e-65 -1.599961 0.165 0.854 5.130335e-61
## S100A8 2.167846e-64  2.663478 0.973 0.500 2.972984e-60
## IFITM2 2.657528e-62 -1.427412 0.676 1.000 3.644533e-58
## S100A9 6.597364e-62  2.258291 0.996 0.873 9.047626e-58

The results data frame has the following columns:

  • p_val : p-value (unadjusted)
  • avg_logFC : log fold-change of the average expression between the two groups. Positive values indicate that the gene is more highly expressed in the first group.
  • pct.1 : The fraction of cells (between 0 and 1) in the first group in which the gene is detected
  • pct.2 : The fraction of cells (between 0 and 1) in the second group in which the gene is detected
  • p_val_adj : Adjusted p-value, based on Bonferroni correction using all genes in the dataset.
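The relationship between p_val and p_val_adj can be checked by hand. The sketch below applies base R's p.adjust to three unadjusted p-values copied from the output above. The gene count of 13,714 is an assumption inferred from the ratio of the two columns; it is not stated in this vignette.

```r
# Unadjusted p-values copied from the FindMarkers output above
p_val <- c(FCGR3A = 1.955129e-96, LYZ = 3.454141e-71, RHOC = 3.740947e-65)

# Assumed number of genes in the dataset (inferred from p_val_adj / p_val)
n_genes <- 13714

# Bonferroni: multiply each p-value by the number of tests, capped at 1
p_val_adj <- p.adjust(p_val, method = "bonferroni", n = n_genes)
print(p_val_adj)
```

Under that assumption, the adjusted values agree with the p_val_adj column shown above to the printed precision.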

If the ident.2 parameter is omitted or set to NULL, FindMarkers will test for differentially expressed genes between the group specified by ident.1 and all other cells.

# Find differentially expressed genes between CD14+ Monocytes and all other
# cells, only search for positive markers
monocyte_de_genes <- FindMarkers(pbmc, ident.1 = "CD14+ Monocytes", ident.2 = NULL, 
    only.pos = TRUE)

# view results
head(monocyte_de_genes)
##                p_val avg_logFC pct.1 pct.2    p_val_adj
## S100A9  0.000000e+00  3.827593 0.996 0.216  0.00000e+00
## S100A8  0.000000e+00  3.786535 0.973 0.123  0.00000e+00
## LGALS2  0.000000e+00  2.634722 0.908 0.060  0.00000e+00
## FCN1    0.000000e+00  2.369524 0.956 0.150  0.00000e+00
## CD14   8.129864e-290  1.949317 0.663 0.029 1.11493e-285
## TYROBP 1.197623e-282  2.106174 0.994 0.266 1.64242e-278
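Because the result is an ordinary data frame, it can be filtered and sorted with base R. The following is a minimal sketch that rebuilds the six rows printed above and keeps genes that are rarely detected outside the cluster. The cutoffs max_pct_2 and min_logfc are illustrative assumptions, not Seurat defaults.

```r
# Rows copied from the FindMarkers output above
monocyte_markers <- data.frame(
  p_val     = c(0, 0, 0, 0, 8.129864e-290, 1.197623e-282),
  avg_logFC = c(3.827593, 3.786535, 2.634722, 2.369524, 1.949317, 2.106174),
  pct.1     = c(0.996, 0.973, 0.908, 0.956, 0.663, 0.994),
  pct.2     = c(0.216, 0.123, 0.060, 0.150, 0.029, 0.266),
  p_val_adj = c(0, 0, 0, 0, 1.11493e-285, 1.64242e-278),
  row.names = c("S100A9", "S100A8", "LGALS2", "FCN1", "CD14", "TYROBP")
)

# Illustrative cutoffs (assumptions, not Seurat defaults)
max_pct_2 <- 0.2   # detected in at most 20% of all other cells
min_logfc <- 2     # average log fold-change of at least 2

# Keep genes passing both cutoffs, then sort by decreasing fold-change
specific <- subset(monocyte_markers, pct.2 <= max_pct_2 & avg_logFC >= min_logfc)
specific <- specific[order(-specific$avg_logFC), ]
print(rownames(specific))
```

With these cutoffs, S100A9 and TYROBP are dropped because they are detected in more than 20% of the remaining cells, and CD14 is dropped by the fold-change cutoff.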

Prefilter genes or cells to increase the speed of DE testing

To increase the speed of marker discovery, particularly for large datasets, Seurat allows for pre-filtering of genes or cells. For example, genes that are very infrequently detected in either group of cells, or genes that are expressed at similar average levels, are unlikely to be differentially expressed. Example use cases of the min.pct, logfc.threshold, min.diff.pct, and max.cells.per.ident parameters are demonstrated below.

# Only test genes that are detected in at least 50% of cells in either
# CD14+ Monocytes or FCGR3A+ Monocytes
monocyte_de_genes <- FindMarkers(pbmc, ident.1 = "CD14+ Monocytes", ident.2 = "FCGR3A+ Monocytes", 
    min.pct = 0.5)

# Pre-filter genes that have less than a two-fold change between the average
# expression of CD14+ Monocytes vs FCGR3A+ Monocytes
monocyte_de_genes <- FindMarkers(pbmc, ident.1 = "CD14+ Monocytes", ident.2 = "FCGR3A+ Monocytes", 
    logfc.threshold = log(2))

# Pre-filter genes whose detection percentages across the two groups are
# similar (within 0.25)
monocyte_de_genes <- FindMarkers(pbmc, ident.1 = "CD14+ Monocytes", ident.2 = "FCGR3A+ Monocytes", 
    min.diff.pct = 0.25)

# Increasing min.pct, logfc.threshold, and min.diff.pct will speed up DE
# testing, but genes removed by the pre-filter are never tested and may be missed

# Subsample each group to a maximum of 200 cells. Can be very useful for
# large clusters, or computationally-intensive DE tests
monocyte_de_genes <- FindMarkers(pbmc, ident.1 = "CD14+ Monocytes", ident.2 = "FCGR3A+ Monocytes", 
    max.cells.per.ident = 200)
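To make the effect of these filters concrete, the sketch below applies comparable thresholds to the per-gene summaries printed in the first FindMarkers output. It is an illustration in base R only: passes_prefilter is a hypothetical helper, not a Seurat function, and Seurat's internal comparisons (for example strict versus non-strict inequalities) may differ.

```r
# Per-gene summaries copied from the first FindMarkers output above
stats <- data.frame(
  avg_logFC = c(-2.564467, 1.743081, -1.599961, 2.663478, -1.427412, 2.258291),
  pct.1     = c(0.134, 1.000, 0.165, 0.973, 0.676, 0.996),
  pct.2     = c(0.962, 0.987, 0.854, 0.500, 1.000, 0.873),
  row.names = c("FCGR3A", "LYZ", "RHOC", "S100A8", "IFITM2", "S100A9")
)

# Hypothetical helper: TRUE for genes that would pass all three filters
passes_prefilter <- function(stats, min.pct, logfc.threshold, min.diff.pct) {
  detected    <- pmax(stats$pct.1, stats$pct.2) >= min.pct        # seen often enough in either group
  changed     <- abs(stats$avg_logFC) >= logfc.threshold          # large enough average change
  pct_differs <- abs(stats$pct.1 - stats$pct.2) >= min.diff.pct   # detection rates differ enough
  detected & changed & pct_differs
}

keep <- passes_prefilter(stats, min.pct = 0.5, logfc.threshold = log(2),
                         min.diff.pct = 0.25)
kept_genes <- rownames(stats)[keep]
print(kept_genes)

# Subsampling in the spirit of max.cells.per.ident: cap a group at 200 cells
set.seed(1)
cell_ids <- paste0("cell_", seq_len(480))   # toy identifiers, not real barcodes
subsampled <- if (length(cell_ids) > 200) sample(cell_ids, 200) else cell_ids
print(length(subsampled))
```

Here LYZ and S100A9 are removed only by the detection-difference filter, which shows how a strongly expressed gene can be skipped when it is detected at similar rates in both groups.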

Perform DE analysis using alternative tests

The following differential expression tests are currently supported:

  • “wilcox” : Wilcoxon rank sum test (default)
  • “bimod” : Likelihood-ratio test for single cell gene expression (McDavid et al., Bioinformatics, 2013)
  • “roc” : Standard AUC classifier
  • “t” : Student’s t-test
  • “tobit” : Tobit-test for differential gene expression (Trapnell et al., Nature Biotech, 2014)
  • “poisson” : Likelihood ratio test assuming an underlying Poisson distribution. Use only for UMI-based datasets
  • “negbinom” : Likelihood ratio test assuming an underlying negative binomial distribution. Use only for UMI-based datasets
  • “MAST” : GLM framework that treats cellular detection rate as a covariate (Finak et al., Genome Biology, 2015)
  • “DESeq2” : DE based on a model using the negative binomial distribution (Love et al., Genome Biology, 2014)

MAST and DESeq2 are not installed with Seurat; please install these packages separately before using them as part of Seurat. Once they are installed, use the test.use parameter to specify which DE test to run.
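One way to check in advance whether the optional packages are available is base R's requireNamespace. The sketch below is a suggestion rather than part of the Seurat workflow; has_pkg is a hypothetical helper introduced only for this example.

```r
# Hypothetical helper (not part of Seurat): TRUE if a package can be loaded
has_pkg <- function(pkg) requireNamespace(pkg, quietly = TRUE)

# Optional DE backends named in the list above
backends <- c("MAST", "DESeq2")

# Named logical vector: which backends are installed on this machine
installed <- vapply(backends, has_pkg, logical(1))
print(installed)

if (!all(installed)) {
  message("Not installed: ", paste(backends[!installed], collapse = ", "))
}
```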

# Test for DE genes using the MAST package
monocyte_de_genes <- FindMarkers(pbmc, ident.1 = "CD14+ Monocytes", ident.2 = "FCGR3A+ Monocytes", 
    test.use = "MAST")

# Test for DE genes using the DESeq2 package. Throws an error if DESeq2 has
# not already been installed. Note that the DESeq2 workflows can be
# computationally intensive for large datasets and are incompatible with
# some gene pre-filtering options. We therefore suggest initially limiting
# the number of cells used for testing.
monocyte_de_genes <- FindMarkers(pbmc, ident.1 = "CD14+ Monocytes", ident.2 = "FCGR3A+ Monocytes", 
    test.use = "DESeq2", max.cells.per.ident = 50)
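For intuition about how a rank-based test and a t-test differ on a single gene, the sketch below runs base R's wilcox.test and t.test on two small vectors. The numbers are made up for illustration, and the code does not reproduce how Seurat calls these tests internally. Note that base R's t.test defaults to the Welch (unequal variance) form.

```r
# Toy log-normalized expression values for one gene (made-up numbers)
group1 <- c(2.1, 3.4, 2.8, 3.9, 3.0, 2.5)
group2 <- c(0.1, 0.4, 1.1, 0.0, 0.7, 0.2)

# Rank-based, non-parametric comparison of the two groups
w_res <- wilcox.test(group1, group2)

# Comparison of means (Welch's t-test is the base R default)
t_res <- t.test(group1, group2)

print(c(wilcox = w_res$p.value, t = t_res$p.value))
```

Because every value in group1 exceeds every value in group2, the Wilcoxon p-value is the smallest attainable for two groups of six cells, regardless of how far apart the groups are. The t-test p-value, by contrast, depends on the size of the difference relative to the within-group variance.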

Acknowledgements

We thank the authors of the MAST and DESeq2 packages for their kind assistance and advice. We also point users to the following study by Charlotte Soneson and Mark Robinson, which performs careful and extensive evaluation of methods for single cell differential expression testing.