This vignette highlights some example workflows for performing differential expression in Seurat. For demonstration purposes, we will be using the 2,700 PBMC object that is available via the SeuratData package).
library(Seurat)
library(SeuratData)
pbmc <- LoadData("pbmc3k", type = "pbmc3k.final")
The bulk of Seurat’s differential expression features can be accessed through the FindMarkers()
function. As a default, Seurat performs differential expression based on the non-parametric Wilcoxon rank sum test. This replaces the previous default test (‘bimod’). To test for differential expression between two specific groups of cells, specify the ident.1
and ident.2
parameters.
# list options for groups to perform differential expression on
levels(pbmc)
## [1] "Naive CD4 T" "Memory CD4 T" "CD14+ Mono" "B" "CD8 T"
## [6] "FCGR3A+ Mono" "NK" "DC" "Platelet"
# Find differentially expressed features between CD14+ and FCGR3A+ Monocytes
monocyte.de.markers <- FindMarkers(pbmc, ident.1 = "CD14+ Mono", ident.2 = "FCGR3A+ Mono")
# view results
head(monocyte.de.markers)
## p_val avg_log2FC pct.1 pct.2 p_val_adj
## FCGR3A 1.193617e-101 -3.776553 0.131 0.975 1.636926e-97
## LYZ 8.134552e-75 2.614275 1.000 0.988 1.115572e-70
## RHOC 4.479768e-68 -2.325013 0.162 0.864 6.143554e-64
## S100A8 7.471811e-65 3.766437 0.975 0.500 1.024684e-60
## S100A9 1.318422e-64 3.299060 0.996 0.870 1.808084e-60
## IFITM2 4.821669e-64 -2.085807 0.677 1.000 6.612437e-60
The results data frame has the following columns :
If the ident.2
parameter is omitted or set to NULL, FindMarkers()
will test for differentially expressed features between the group specified by ident.1
and all other cells.
# Find differentially expressed features between CD14+ Monocytes and all other cells, only
# search for positive markers
monocyte.de.markers <- FindMarkers(pbmc, ident.1 = "CD14+ Mono", ident.2 = NULL, only.pos = TRUE)
# view results
head(monocyte.de.markers)
## p_val avg_log2FC pct.1 pct.2 p_val_adj
## S100A9 0.000000e+00 5.570063 0.996 0.215 0.000000e+00
## S100A8 0.000000e+00 5.477394 0.975 0.121 0.000000e+00
## FCN1 0.000000e+00 3.394219 0.952 0.151 0.000000e+00
## LGALS2 0.000000e+00 3.800484 0.908 0.059 0.000000e+00
## CD14 2.856582e-294 2.815626 0.667 0.028 3.917516e-290
## TYROBP 3.190467e-284 3.046798 0.994 0.265 4.375406e-280
To increase the speed of marker discovery, particularly for large datasets, Seurat allows for pre-filtering of features or cells. For example, features that are very infrequently detected in either group of cells, or features that are expressed at similar average levels, are unlikely to be differentially expressed. Example use cases of the min.pct
, logfc.threshold
, min.diff.pct
, and max.cells.per.ident
parameters are demonstrated below.
# Pre-filter features that are detected at <50% frequency in either CD14+ Monocytes or FCGR3A+
# Monocytes
head(FindMarkers(pbmc, ident.1 = "CD14+ Mono", ident.2 = "FCGR3A+ Mono", min.pct = 0.5))
## p_val avg_log2FC pct.1 pct.2 p_val_adj
## FCGR3A 1.193617e-101 -3.776553 0.131 0.975 1.636926e-97
## LYZ 8.134552e-75 2.614275 1.000 0.988 1.115572e-70
## RHOC 4.479768e-68 -2.325013 0.162 0.864 6.143554e-64
## S100A8 7.471811e-65 3.766437 0.975 0.500 1.024684e-60
## S100A9 1.318422e-64 3.299060 0.996 0.870 1.808084e-60
## IFITM2 4.821669e-64 -2.085807 0.677 1.000 6.612437e-60
# Pre-filter features that have less than a two-fold change between the average expression of
# CD14+ Monocytes vs FCGR3A+ Monocytes
head(FindMarkers(pbmc, ident.1 = "CD14+ Mono", ident.2 = "FCGR3A+ Mono", logfc.threshold = log(2)))
## p_val avg_log2FC pct.1 pct.2 p_val_adj
## FCGR3A 1.193617e-101 -3.776553 0.131 0.975 1.636926e-97
## LYZ 8.134552e-75 2.614275 1.000 0.988 1.115572e-70
## RHOC 4.479768e-68 -2.325013 0.162 0.864 6.143554e-64
## S100A8 7.471811e-65 3.766437 0.975 0.500 1.024684e-60
## S100A9 1.318422e-64 3.299060 0.996 0.870 1.808084e-60
## IFITM2 4.821669e-64 -2.085807 0.677 1.000 6.612437e-60
# Pre-filter features whose detection percentages across the two groups are similar (within
# 0.25)
head(FindMarkers(pbmc, ident.1 = "CD14+ Mono", ident.2 = "FCGR3A+ Mono", min.diff.pct = 0.25))
## p_val avg_log2FC pct.1 pct.2 p_val_adj
## FCGR3A 1.193617e-101 -3.776553 0.131 0.975 1.636926e-97
## RHOC 4.479768e-68 -2.325013 0.162 0.864 6.143554e-64
## S100A8 7.471811e-65 3.766437 0.975 0.500 1.024684e-60
## IFITM2 4.821669e-64 -2.085807 0.677 1.000 6.612437e-60
## LGALS2 1.034540e-57 2.956704 0.908 0.265 1.418768e-53
## CDKN1C 2.886353e-48 -1.453845 0.029 0.506 3.958345e-44
# Increasing min.pct, logfc.threshold, and min.diff.pct, will increase the speed of DE
# testing, but could also miss features that are prefiltered
# Subsample each group to a maximum of 200 cells. Can be very useful for large clusters, or
# computationally-intensive DE tests
head(FindMarkers(pbmc, ident.1 = "CD14+ Mono", ident.2 = "FCGR3A+ Mono", max.cells.per.ident = 200))
## p_val avg_log2FC pct.1 pct.2 p_val_adj
## FCGR3A 4.725441e-61 -3.776553 0.131 0.975 6.480470e-57
## LYZ 6.442891e-56 2.614275 1.000 0.988 8.835781e-52
## S100A8 8.983226e-49 3.766437 0.975 0.500 1.231960e-44
## S100A9 1.812278e-47 3.299060 0.996 0.870 2.485358e-43
## IFITM2 1.185202e-45 -2.085807 0.677 1.000 1.625386e-41
## RPS19 1.685374e-44 -1.091150 0.990 1.000 2.311321e-40
The following differential expression tests are currently supported:
For MAST and DESeq2, please ensure that these packages are installed separately in order to use them as part of Seurat. Once installed, use the test.use
parameter can be used to specify which DE test to use.
# Test for DE features using the MAST package
head(FindMarkers(pbmc, ident.1 = "CD14+ Mono", ident.2 = "FCGR3A+ Mono", test.use = "MAST"))
## p_val avg_log2FC pct.1 pct.2 p_val_adj
## LYZ 7.653066e-145 2.614275 1.000 0.988 1.049541e-140
## FCGR3A 2.897172e-119 -3.776553 0.131 0.975 3.973182e-115
## S100A9 2.123928e-95 3.299060 0.996 0.870 2.912755e-91
## S100A8 3.279521e-92 3.766437 0.975 0.500 4.497535e-88
## IFITM2 5.591175e-87 -2.085807 0.677 1.000 7.667737e-83
## LGALS2 1.132854e-75 2.956704 0.908 0.265 1.553596e-71
# Test for DE features using the DESeq2 package. Throws an error if DESeq2 has not already
# been installed Note that the DESeq2 workflows can be computationally intensive for large
# datasets, but are incompatible with some feature pre-filtering options We therefore suggest
# initially limiting the number of cells used for testing
head(FindMarkers(pbmc, ident.1 = "CD14+ Mono", ident.2 = "FCGR3A+ Mono", test.use = "DESeq2", max.cells.per.ident = 50))
## p_val avg_log2FC pct.1 pct.2 p_val_adj
## S100A9 1.887262e-58 2.538360 0.996 0.870 2.588191e-54
## LYZ 4.198394e-52 1.987962 1.000 0.988 5.757678e-48
## S100A8 5.747352e-49 2.784248 0.975 0.500 7.881918e-45
## FCGR3A 1.315842e-35 -2.949992 0.131 0.975 1.804546e-31
## RPS19 1.514561e-33 -1.614892 0.990 1.000 2.077068e-29
## IFITM2 1.227042e-26 -2.212583 0.677 1.000 1.682765e-22
We thank the authors of the MAST and DESeq2 packages for their kind assistance and advice. We also point users to the following study by Charlotte Soneson and Mark Robinson, which performs careful and extensive evaluation of methods for single cell differential expression testing.Session Info
## R version 4.2.0 (2022-04-22)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 20.04.5 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/liblapack.so.3
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] stats4 stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] SingleCellExperiment_1.20.0 SummarizedExperiment_1.28.0
## [3] Biobase_2.58.0 GenomicRanges_1.50.2
## [5] GenomeInfoDb_1.35.16 IRanges_2.32.0
## [7] S4Vectors_0.36.2 BiocGenerics_0.44.0
## [9] MatrixGenerics_1.10.0 matrixStats_0.63.0
## [11] thp1.eccite.SeuratData_3.1.5 stxBrain.SeuratData_0.1.1
## [13] ssHippo.SeuratData_3.1.4 pbmcsca.SeuratData_3.0.0
## [15] pbmcMultiome.SeuratData_0.1.2 pbmc3k.SeuratData_3.1.4
## [17] panc8.SeuratData_3.0.2 ifnb.SeuratData_3.1.0
## [19] hcabm40k.SeuratData_3.0.0 bmcite.SeuratData_0.3.0
## [21] SeuratData_0.2.2 SeuratObject_4.1.3
## [23] Seurat_4.3.0
##
## loaded via a namespace (and not attached):
## [1] utf8_1.2.3 spatstat.explore_3.0-6 reticulate_1.28
## [4] tidyselect_1.2.0 RSQLite_2.3.0 AnnotationDbi_1.60.0
## [7] htmlwidgets_1.6.1 grid_4.2.0 BiocParallel_1.32.5
## [10] Rtsne_0.16 munsell_0.5.0 codetools_0.2-18
## [13] ragg_1.2.5 ica_1.0-3 future_1.31.0
## [16] miniUI_0.1.1.1 spatstat.random_3.1-3 colorspace_2.1-0
## [19] progressr_0.13.0 knitr_1.42 ROCR_1.0-11
## [22] tensor_1.5 listenv_0.9.0 GenomeInfoDbData_1.2.9
## [25] polyclip_1.10-4 bit64_4.0.5 rprojroot_2.0.3
## [28] parallelly_1.34.0 vctrs_0.5.2 generics_0.1.3
## [31] xfun_0.37 R6_2.5.1 locfit_1.5-9.7
## [34] bitops_1.0-7 spatstat.utils_3.0-1 cachem_1.0.7
## [37] DelayedArray_0.24.0 promises_1.2.0.1 scales_1.2.1
## [40] gtable_0.3.1 globals_0.16.2 goftest_1.2-3
## [43] rlang_1.0.6 systemfonts_1.0.4 splines_4.2.0
## [46] lazyeval_0.2.2 spatstat.geom_3.0-6 yaml_2.3.7
## [49] reshape2_1.4.4 abind_1.4-5 httpuv_1.6.9
## [52] tools_4.2.0 ggplot2_3.4.1 ellipsis_0.3.2
## [55] jquerylib_0.1.4 RColorBrewer_1.1-3 ggridges_0.5.4
## [58] Rcpp_1.0.10 plyr_1.8.8 progress_1.2.2
## [61] zlibbioc_1.44.0 purrr_1.0.1 RCurl_1.98-1.12
## [64] prettyunits_1.1.1 deldir_1.0-6 pbapply_1.7-0
## [67] cowplot_1.1.1 zoo_1.8-11 ggrepel_0.9.3
## [70] cluster_2.1.3 fs_1.6.1 magrittr_2.0.3
## [73] data.table_1.14.8 scattermore_0.8 lmtest_0.9-40
## [76] RANN_2.6.1 fitdistrplus_1.1-8 hms_1.1.2
## [79] patchwork_1.1.2 mime_0.12 evaluate_0.20
## [82] xtable_1.8-4 XML_3.99-0.13 gridExtra_2.3
## [85] compiler_4.2.0 tibble_3.1.8 KernSmooth_2.23-20
## [88] crayon_1.5.2 htmltools_0.5.4 later_1.3.0
## [91] tidyr_1.3.0 geneplotter_1.76.0 DBI_1.1.3
## [94] formatR_1.14 MASS_7.3-56 rappdirs_0.3.3
## [97] MAST_1.24.1 Matrix_1.5-3 cli_3.6.0
## [100] parallel_4.2.0 igraph_1.4.1 pkgconfig_2.0.3
## [103] pkgdown_2.0.7 sp_1.6-0 plotly_4.10.1
## [106] spatstat.sparse_3.0-0 annotate_1.76.0 bslib_0.4.2
## [109] XVector_0.38.0 stringr_1.5.0 digest_0.6.31
## [112] sctransform_0.3.5 RcppAnnoy_0.0.20 Biostrings_2.66.0
## [115] spatstat.data_3.0-0 rmarkdown_2.20 leiden_0.4.3
## [118] uwot_0.1.14 shiny_1.7.4 lifecycle_1.0.3
## [121] nlme_3.1-157 jsonlite_1.8.4 desc_1.4.2
## [124] viridisLite_0.4.1 limma_3.54.1 fansi_1.0.4
## [127] pillar_1.8.1 lattice_0.20-45 KEGGREST_1.38.0
## [130] fastmap_1.1.1 httr_1.4.5 survival_3.3-1
## [133] glue_1.6.2 png_0.1-8 bit_4.0.5
## [136] stringi_1.7.12 sass_0.4.5 blob_1.2.3
## [139] textshaping_0.3.6 DESeq2_1.38.3 memoise_2.0.1
## [142] dplyr_1.1.0 irlba_2.3.5.1 future.apply_1.10.0