Perform dataset integration using a precomputed AnchorSet.
IntegrateData(
  anchorset,
  new.assay.name = "integrated",
  normalization.method = c("LogNormalize", "SCT"),
  features = NULL,
  features.to.integrate = NULL,
  dims = 1:30,
  k.weight = 100,
  weight.reduction = NULL,
  sd.weight = 1,
  sample.tree = NULL,
  preserve.order = FALSE,
  eps = 0,
  verbose = TRUE
)
anchorset  An AnchorSet object generated by FindIntegrationAnchors 
new.assay.name  Name for the new assay containing the integrated data 
normalization.method  Name of normalization method used: LogNormalize or SCT 
features  Vector of features to use when computing the PCA to determine the weights. Only set if you want a different set from those used in the anchor finding process 
features.to.integrate  Vector of features to integrate. By default, will use the features used in anchor finding. 
dims  Number of dimensions to use in the anchor weighting procedure 
k.weight  Number of neighbors to consider when weighting anchors 
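weight.reduction  Dimension reduction to use when calculating anchor weights. This can be one of:
A string, specifying the name of a dimension reduction present in all objects to be integrated
A vector of strings, specifying the name of a dimension reduction to use for each object to be integrated
A vector of DimReduc objects, specifying the object to use for each object to be integrated
NULL, in which case a new PCA will be calculated and used to calculate anchor weights
Note that, if specified, the requested dimension reduction will only be used for calculating anchor weights in the first merge between reference and query, as the merged object will subsequently contain more cells than were in the query, and weights will need to be calculated for all cells in the object. 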
sd.weight  Controls the bandwidth of the Gaussian kernel for weighting 
sample.tree  Specify the order of integration. The order of integration should be encoded in a matrix, where each row represents one of the pairwise integration steps. Negative numbers specify a dataset, positive numbers specify the integration results from a given row (the format of the merge matrix included in the hclust function output). For example, matrix(c(-2, 1, -3, -1), ncol = 2) gives:
     [,1] [,2]
[1,]   -2   -3
[2,]    1   -1
which would cause datasets 2 and 3 to be integrated first, then the resulting object integrated with dataset 1. If NULL, the sample tree will be computed automatically. 
preserve.order  Do not reorder objects based on size for each pairwise integration. 
eps  Error bound on the neighbor finding algorithm (from RANN) 
verbose  Print progress bars and output 
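The sample.tree format described above can be sketched in base R. The distance matrix below is made up for illustration, and the variable names (sample_tree, toy_dist, toy_merge) are hypothetical; the second half only shows that hclust produces a merge matrix using the same negative/positive convention.

```r
# Three datasets: integrate datasets 2 and 3 first, then merge the
# result (row 1) with dataset 1. Negative entries index datasets,
# positive entries index earlier rows of this matrix.
sample_tree <- matrix(c(-2, 1, -3, -1), ncol = 2)
print(sample_tree)

# hclust() returns a merge matrix that follows the same convention.
# toy_dist is an invented distance matrix between three datasets in
# which datasets 2 and 3 are the closest pair.
toy_dist <- as.dist(matrix(c(0, 5, 6,
                             5, 0, 1,
                             6, 1, 0), nrow = 3))
toy_merge <- hclust(toy_dist)$merge
print(toy_merge)
```

Assuming an AnchorSet named anchors built from three datasets, the matrix would be supplied as IntegrateData(anchorset = anchors, sample.tree = sample_tree).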
Returns a Seurat object with a new integrated Assay. If normalization.method = "LogNormalize", the integrated data is returned to the data slot and can be treated as log-normalized, corrected data. If normalization.method = "SCT", the integrated data is returned to the scale.data slot and can be treated as centered, corrected Pearson residuals.
The main steps of this procedure are outlined below. For a more detailed description of the methodology, please see Stuart, Butler, et al., Cell 2019. doi: 10.1016/j.cell.2019.05.031; doi: 10.1101/460147
For pairwise integration:
Construct a weights matrix that defines the association between each query cell and each anchor. These weights are computed as 1 - the distance between the query cell and the anchor divided by the distance of the query cell to the k.weight-th anchor, multiplied by the anchor score computed in FindIntegrationAnchors. We then apply a Gaussian kernel with a bandwidth defined by sd.weight and normalize across all k.weight anchors.
Compute the anchor integration matrix as the difference between the two expression matrices for every pair of anchor cells.
Compute the transformation matrix as the product of the integration matrix and the weights matrix.
Subtract the transformation matrix from the original expression matrix.
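The four pairwise steps above can be sketched on toy data in base R. This is an illustration under stated assumptions, not Seurat's implementation: the anchors and scores are invented, distances are taken in the full expression space rather than in a dimension reduction, and the kernel form 1 - exp(-w / (2 / sd_weight)^2) is an assumption based on the cited paper rather than on this page. All variable names are hypothetical.

```r
set.seed(1)
n_features <- 4
n_ref <- 5
n_query <- 6

# Toy expression matrices (features x cells); the query carries a batch shift.
ref <- matrix(rnorm(n_features * n_ref), nrow = n_features)
query <- matrix(rnorm(n_features * n_query), nrow = n_features) + 10

# Invented anchors: pairs of (reference cell, query cell) with a score.
anchors <- data.frame(
  ref_cell   = c(1, 2, 4, 5),
  query_cell = c(1, 3, 5, 6),
  score      = c(0.9, 0.6, 0.8, 0.7)
)
k_weight <- 3
sd_weight <- 1

# Distance from every query cell to the query cell of every anchor
# (rows: query cells, columns: anchors).
anchor_dist <- sapply(anchors$query_cell, function(a) {
  sqrt(colSums((query - query[, a])^2))
})

# Step 1: weights matrix over the k_weight nearest anchors of each cell.
weights <- matrix(0, nrow = n_query, ncol = nrow(anchors))
for (cell in seq_len(n_query)) {
  nearest <- order(anchor_dist[cell, ])[seq_len(k_weight)]
  d_k <- anchor_dist[cell, nearest[k_weight]]
  w <- (1 - anchor_dist[cell, nearest] / d_k) * anchors$score[nearest]
  w <- 1 - exp(-w / (2 / sd_weight)^2)  # assumed kernel form
  weights[cell, nearest] <- w / sum(w)  # normalize across the anchors
}

# Step 2: integration matrix, one column per anchor pair.
integration <- query[, anchors$query_cell] - ref[, anchors$ref_cell]

# Step 3: transformation matrix (features x query cells).
transformation <- integration %*% t(weights)

# Step 4: corrected query expression.
corrected <- query - transformation
```

In this toy setting the corrected query matrix sits much closer to the reference than the shifted input does, which is the intended effect of subtracting the weighted anchor differences.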
For multiple dataset integration, we perform iterative pairwise integration.
To determine the order of integration (if not specified via sample.tree), we:
Define a distance between datasets as the total number of cells in the smaller dataset divided by the total number of anchors between the two datasets.
Compute all pairwise distances between datasets.
Cluster this distance matrix to determine a guide tree.
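The guide-tree steps above can be sketched with made-up numbers. The cell counts and anchor counts below are invented, the names (n_cells, n_anchors, dataset_dist, guide_tree) are hypothetical, and the use of hclust with its default linkage is an assumption; this page does not state which clustering method is applied.

```r
# Invented cell counts for three datasets.
n_cells <- c(A = 500, B = 800, C = 300)

# Invented (symmetric) anchor counts between each pair of datasets.
n_anchors <- matrix(0, 3, 3, dimnames = list(names(n_cells), names(n_cells)))
n_anchors["A", "B"] <- n_anchors["B", "A"] <- 400
n_anchors["A", "C"] <- n_anchors["C", "A"] <- 100
n_anchors["B", "C"] <- n_anchors["C", "B"] <- 60

# Distance = cells in the smaller dataset / anchors between the two datasets.
smaller <- outer(n_cells, n_cells, pmin)
dataset_dist <- smaller / n_anchors
diag(dataset_dist) <- 0

# Cluster the distance matrix; the merge matrix plays the role of a guide tree.
guide_tree <- hclust(as.dist(dataset_dist))
print(guide_tree$merge)
```

With these numbers, A and B share the most anchors relative to their size, so they are joined first and C is merged with the result.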
Stuart T, Butler A, et al. Comprehensive Integration of Single-Cell Data. Cell. 2019;177:1888-1902. doi: 10.1016/j.cell.2019.05.031
if (FALSE) {
  # to install the SeuratData package see https://github.com/satijalab/seuratdata
  library(Seurat)
  library(SeuratData)
  data("panc8")

  # panc8 is a merged Seurat object containing 8 separate pancreas datasets
  # split the object by dataset
  pancreas.list <- SplitObject(panc8, split.by = "tech")

  # perform standard preprocessing on each object
  for (i in seq_along(pancreas.list)) {
    pancreas.list[[i]] <- NormalizeData(pancreas.list[[i]], verbose = FALSE)
    pancreas.list[[i]] <- FindVariableFeatures(
      pancreas.list[[i]],
      selection.method = "vst",
      nfeatures = 2000,
      verbose = FALSE
    )
  }

  # find anchors
  anchors <- FindIntegrationAnchors(object.list = pancreas.list)

  # integrate data
  integrated <- IntegrateData(anchorset = anchors)
}