In this vignette, we combine two 10X PBMC datasets: one containing 4K cells and one containing 8K cells. Both datasets are publicly available from 10x Genomics.

To start, we read in the data and create two Seurat objects.

library(Seurat)
pbmc4k.data <- Read10X(data.dir = "../data/pbmc4k/filtered_gene_bc_matrices/GRCh38/")
pbmc4k <- CreateSeuratObject(counts = pbmc4k.data, project = "PBMC4K")
pbmc4k
## An object of class Seurat 
## 33694 features across 4340 samples within 1 assay 
## Active assay: RNA (33694 features, 0 variable features)
pbmc8k.data <- Read10X(data.dir = "../data/pbmc8k/filtered_gene_bc_matrices/GRCh38/")
pbmc8k <- CreateSeuratObject(counts = pbmc8k.data, project = "PBMC8K")
pbmc8k
## An object of class Seurat 
## 33694 features across 8381 samples within 1 assay 
## Active assay: RNA (33694 features, 0 variable features)

Merging Two Seurat Objects

merge() combines the raw count matrices of two Seurat objects and creates a new Seurat object from the resulting combined matrix. To track which original object each cell came from, set the add.cell.ids parameter to a c(x, y) vector; each identifier is prepended to the cell names of the corresponding object. The original project ID remains stored in the object metadata under orig.ident.

pbmc.combined <- merge(pbmc4k, y = pbmc8k, add.cell.ids = c("4K", "8K"), project = "PBMC12K")
pbmc.combined
## An object of class Seurat 
## 33694 features across 12721 samples within 1 assay 
## Active assay: RNA (33694 features, 0 variable features)
# notice the cell names now have an added identifier
head(colnames(pbmc.combined))
## [1] "4K_AAACCTGAGAAGGCCT-1" "4K_AAACCTGAGACAGACC-1" "4K_AAACCTGAGATAGTCA-1"
## [4] "4K_AAACCTGAGCGCCTCA-1" "4K_AAACCTGAGGCATGGT-1" "4K_AAACCTGCAAGGTTCT-1"
table(pbmc.combined$orig.ident)
## 
## PBMC4K PBMC8K 
##   4340   8381
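
As a rough mental model of what add.cell.ids does, the sketch below uses two small base R matrices in place of Seurat objects. The matrices, gene names, and barcodes are invented for illustration, and merge() itself does considerably more (metadata, assays, sparse storage); this only mimics the cell-name prefixing and the column-wise concatenation of cells.

```r
# Toy stand-ins for two count matrices (genes x cells); all values are invented.
counts.a <- matrix(c(0, 1, 2, 0), nrow = 2,
                   dimnames = list(c("GeneA", "GeneB"), c("AAAC-1", "AAAG-1")))
counts.b <- matrix(c(3, 0, 0, 4), nrow = 2,
                   dimnames = list(c("GeneA", "GeneB"), c("AAAC-1", "TTTG-1")))

# The barcode "AAAC-1" occurs in both matrices, so raw names would collide.
intersect(colnames(counts.a), colnames(counts.b))

# Prefix each cell name with a dataset identifier, as add.cell.ids does.
colnames(counts.a) <- paste("4K", colnames(counts.a), sep = "_")
colnames(counts.b) <- paste("8K", colnames(counts.b), sep = "_")

# Concatenate cells column-wise; names are now unique across datasets.
counts.merged <- cbind(counts.a, counts.b)
colnames(counts.merged)
```

Because barcodes are drawn from a shared pool, the same barcode can appear in independent runs; the prefix keeps such cells distinguishable after merging.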

Merging More Than Two Seurat Objects

To merge more than two Seurat objects, pass a vector of Seurat objects to the y parameter of merge(). We demonstrate this using the 4K and 8K PBMC datasets together with the previously computed Seurat object from the 2,700 PBMC tutorial (loaded via the SeuratData package).

library(SeuratData)
InstallData("pbmc3k")
pbmc3k <- LoadData("pbmc3k", type = "pbmc3k.final")
pbmc3k
## An object of class Seurat 
## 13714 features across 2638 samples within 1 assay 
## Active assay: RNA (13714 features, 2000 variable features)
##  2 dimensional reductions calculated: pca, umap
pbmc.big <- merge(pbmc3k, y = c(pbmc4k, pbmc8k), add.cell.ids = c("3K", "4K", "8K"), project = "PBMC15K")
pbmc.big
## An object of class Seurat 
## 34230 features across 15359 samples within 1 assay 
## Active assay: RNA (34230 features, 0 variable features)
head(colnames(pbmc.big))
## [1] "3K_AAACATACAACCAC" "3K_AAACATTGAGCTAC" "3K_AAACATTGATCAGC"
## [4] "3K_AAACCGTGCTTCCG" "3K_AAACCGTGTATGCG" "3K_AAACGCACTGGTAC"
tail(colnames(pbmc.big))
## [1] "8K_TTTGTCAGTTACCGAT-1" "8K_TTTGTCATCATGTCCC-1" "8K_TTTGTCATCCGATATG-1"
## [4] "8K_TTTGTCATCGTCTGAA-1" "8K_TTTGTCATCTCGAGTA-1" "8K_TTTGTCATCTGCTTGC-1"
unique(sapply(X = strsplit(colnames(pbmc.big), split = "_"), FUN = "[", 1))
## [1] "3K" "4K" "8K"
table(pbmc.big$orig.ident)
## 
## pbmc3k PBMC4K PBMC8K 
##   2638   4340   8381
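
The merged object above reports 34230 features, more than any single input, which is consistent with merge() keeping the union of features across objects. The sketch below illustrates that idea on toy base R matrices. The helper pad_features() is hypothetical and written only for this example; it is not a Seurat function, and the assumption that unmeasured features are filled with zeros is stated here rather than taken from the text above.

```r
# Toy matrices with partly different gene sets; all values are invented.
m1 <- matrix(c(1, 2, 3, 4), nrow = 2,
             dimnames = list(c("GeneA", "GeneB"), c("3K_cell1", "3K_cell2")))
m2 <- matrix(c(5, 6, 7, 8), nrow = 2,
             dimnames = list(c("GeneB", "GeneC"), c("4K_cell1", "4K_cell2")))

# Hypothetical helper: expand a matrix to a given feature set, filling
# features that are absent from the input with zeros.
pad_features <- function(m, features) {
  out <- matrix(0, nrow = length(features), ncol = ncol(m),
                dimnames = list(features, colnames(m)))
  out[rownames(m), ] <- m
  out
}

# Take the union of features, pad each matrix to it, then bind the cells.
all.features <- union(rownames(m1), rownames(m2))
merged <- cbind(pad_features(m1, all.features), pad_features(m2, all.features))
dim(merged)
merged
```

One consequence worth keeping in mind: a zero in the merged matrix can mean either "measured, no counts" or "feature absent from that dataset", and the two are not distinguishable afterwards.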

Merge Based on Normalized Data

By default, merge() combines Seurat objects based on the raw count matrices and discards any previously normalized and scaled data matrices. To merge the normalized data matrices along with the raw counts, pass merge.data = TRUE. Do this only if the same normalization approach was applied to all objects.

pbmc4k <- NormalizeData(pbmc4k)
pbmc8k <- NormalizeData(pbmc8k)
pbmc.normalized <- merge(pbmc4k, y = pbmc8k, add.cell.ids = c("4K", "8K"), project = "PBMC12K",
    merge.data = TRUE)
# for comparison: the object merged earlier without merge.data still shows raw counts
GetAssayData(pbmc.combined)[1:10, 1:15]
## 10 x 15 sparse Matrix of class "dgCMatrix"
##                                            
## RP11-34P13.3  . . . . . . . . . . . . . . .
## FAM138A       . . . . . . . . . . . . . . .
## OR4F5         . . . . . . . . . . . . . . .
## RP11-34P13.7  . . . . . . . . . . . . . . .
## RP11-34P13.8  . . . . . . . . . . . . . . .
## RP11-34P13.14 . . . . . . . . . . . . . . .
## RP11-34P13.9  . . . . . . . . . . . . . . .
## FO538757.3    . . . . . . . . . . . . . . .
## FO538757.2    . . . . . . . . . 1 . . . . .
## AP006222.2    . . . . . . . . . . . 1 . . .
# the object merged with merge.data = TRUE carries over the normalized values
GetAssayData(pbmc.normalized)[1:10, 1:15]
## 10 x 15 sparse Matrix of class "dgCMatrix"
##                                                           
## RP11-34P13.3  . . . . . . . . . .         . .        . . .
## FAM138A       . . . . . . . . . .         . .        . . .
## OR4F5         . . . . . . . . . .         . .        . . .
## RP11-34P13.7  . . . . . . . . . .         . .        . . .
## RP11-34P13.8  . . . . . . . . . .         . .        . . .
## RP11-34P13.14 . . . . . . . . . .         . .        . . .
## RP11-34P13.9  . . . . . . . . . .         . .        . . .
## FO538757.3    . . . . . . . . . .         . .        . . .
## FO538757.2    . . . . . . . . . 0.7721503 . .        . . .
## AP006222.2    . . . . . . . . . .         . 1.087928 . . .
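
To see why matching normalization matters, the sketch below re-implements per-cell log-normalization in base R, assuming the default "LogNormalize" method (divide each cell by its total counts, multiply by a scale factor of 10000, then apply log1p). The function log_normalize() and the toy matrices are illustrative assumptions, not Seurat's own code. Because this normalization is computed cell by cell, normalizing before or after merging should give the same values, whereas mixing parameters across objects does not.

```r
# Per-cell log-normalization: scale each cell to a fixed total, then log1p.
# This mirrors the documented "LogNormalize" method; it is not Seurat's code.
log_normalize <- function(counts, scale.factor = 10000) {
  log1p(sweep(counts, 2, colSums(counts), "/") * scale.factor)
}

# Toy count matrices (genes x cells); all values are invented.
a <- matrix(c(1, 0, 3, 2, 5, 0), nrow = 3,
            dimnames = list(paste0("Gene", 1:3), c("4K_c1", "4K_c2")))
b <- matrix(c(0, 4, 1, 2, 2, 2), nrow = 3,
            dimnames = list(paste0("Gene", 1:3), c("8K_c1", "8K_c2")))

# Normalize first and then merge, versus merge first and then normalize.
norm.then.merge <- cbind(log_normalize(a), log_normalize(b))
merge.then.norm <- log_normalize(cbind(a, b))
all.equal(norm.then.merge, merge.then.norm)

# Mixing scale factors across objects puts the merged cells on different scales.
mixed <- cbind(log_normalize(a), log_normalize(b, scale.factor = 1e6))
isTRUE(all.equal(mixed, merge.then.norm))
```

The same reasoning does not automatically extend to normalization methods that pool information across cells, which is one reason to re-normalize after merging when in doubt.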

Session Info

## R version 4.1.0 (2021-05-18)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 20.04.2 LTS
## 
## Matrix products: default
## BLAS/LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.8.so
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=C             
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
##  [1] thp1.eccite.SeuratData_3.1.5  stxBrain.SeuratData_0.1.1    
##  [3] ssHippo.SeuratData_3.1.4      pbmcsca.SeuratData_3.0.0     
##  [5] pbmcMultiome.SeuratData_0.1.1 pbmc3k.SeuratData_3.1.4      
##  [7] panc8.SeuratData_3.0.2        ifnb.SeuratData_3.1.0        
##  [9] hcabm40k.SeuratData_3.0.0     bmcite.SeuratData_0.3.0      
## [11] SeuratData_0.2.1              SeuratObject_4.0.2           
## [13] Seurat_4.0.4                 
## 
## loaded via a namespace (and not attached):
##   [1] Rtsne_0.15            colorspace_2.0-2      deldir_0.2-10        
##   [4] ellipsis_0.3.2        ggridges_0.5.3        rprojroot_2.0.2      
##   [7] fs_1.5.0              spatstat.data_2.1-0   leiden_0.3.8         
##  [10] listenv_0.8.0         ggrepel_0.9.1         fansi_0.5.0          
##  [13] codetools_0.2-18      splines_4.1.0         cachem_1.0.5         
##  [16] knitr_1.33            polyclip_1.10-0       jsonlite_1.7.2       
##  [19] ica_1.0-2             cluster_2.1.2         png_0.1-7            
##  [22] uwot_0.1.10           shiny_1.6.0           sctransform_0.3.2    
##  [25] spatstat.sparse_2.0-0 compiler_4.1.0        httr_1.4.2           
##  [28] assertthat_0.2.1      Matrix_1.3-3          fastmap_1.1.0        
##  [31] lazyeval_0.2.2        cli_3.0.1             later_1.2.0          
##  [34] formatR_1.11          htmltools_0.5.1.1     tools_4.1.0          
##  [37] igraph_1.2.6          gtable_0.3.0          glue_1.4.2           
##  [40] RANN_2.6.1            reshape2_1.4.4        dplyr_1.0.7          
##  [43] rappdirs_0.3.3        Rcpp_1.0.7            scattermore_0.7      
##  [46] jquerylib_0.1.4       pkgdown_1.6.1         vctrs_0.3.8          
##  [49] nlme_3.1-152          lmtest_0.9-38         xfun_0.25            
##  [52] stringr_1.4.0         globals_0.14.0        mime_0.11            
##  [55] miniUI_0.1.1.1        lifecycle_1.0.0       irlba_2.3.3          
##  [58] goftest_1.2-2         future_1.21.0         MASS_7.3-54          
##  [61] zoo_1.8-9             scales_1.1.1          spatstat.core_2.1-2  
##  [64] ragg_1.1.3            promises_1.2.0.1      spatstat.utils_2.1-0 
##  [67] parallel_4.1.0        RColorBrewer_1.1-2    yaml_2.2.1           
##  [70] memoise_2.0.0         reticulate_1.20       pbapply_1.4-3        
##  [73] gridExtra_2.3         ggplot2_3.3.5         sass_0.4.0           
##  [76] rpart_4.1-15          stringi_1.7.3         desc_1.3.0           
##  [79] rlang_0.4.11          pkgconfig_2.0.3       systemfonts_1.0.2    
##  [82] matrixStats_0.60.0    evaluate_0.14         lattice_0.20-44      
##  [85] tensor_1.5            ROCR_1.0-11           purrr_0.3.4          
##  [88] patchwork_1.1.1       htmlwidgets_1.5.3     cowplot_1.1.1        
##  [91] tidyselect_1.1.1      parallelly_1.26.0     RcppAnnoy_0.0.18     
##  [94] plyr_1.8.6            magrittr_2.0.1        R6_2.5.0             
##  [97] generics_0.1.0        DBI_1.1.1             mgcv_1.8-35          
## [100] pillar_1.6.2          fitdistrplus_1.1-5    abind_1.4-5          
## [103] survival_3.2-11       tibble_3.1.3          future.apply_1.7.0   
## [106] crayon_1.4.1          KernSmooth_2.23-20    utf8_1.2.2           
## [109] spatstat.geom_2.1-0   plotly_4.9.4          rmarkdown_2.10       
## [112] grid_4.1.0            data.table_1.14.0     digest_0.6.27        
## [115] xtable_1.8-4          tidyr_1.1.3           httpuv_1.6.1         
## [118] textshaping_0.3.5     munsell_0.5.0         viridisLite_0.4.0    
## [121] bslib_0.2.5.1