Transfer categorical or continuous data across single-cell datasets. For transferring categorical information, pass a vector from the reference dataset (e.g. refdata = reference$celltype). For transferring continuous information, pass a matrix from the reference dataset (e.g. refdata = GetAssayData(reference[['RNA']])).
TransferData(
  anchorset,
  refdata,
  reference = NULL,
  query = NULL,
  weight.reduction = "pcaproject",
  l2.norm = FALSE,
  dims = NULL,
  k.weight = 50,
  sd.weight = 1,
  eps = 0,
  n.trees = 50,
  verbose = TRUE,
  slot = "data",
  prediction.assay = FALSE,
  store.weights = TRUE
)
anchorset  An 

refdata  Data to transfer. This can be specified in one of two ways:

reference  Reference object from which to pull data to transfer 
query  Query object into which the data will be transferred. 
weight.reduction  Dimensional reduction to use for the weighting anchors. Options are:

l2.norm  Perform L2 normalization on the cell embeddings after dimensional reduction 
dims  Set of dimensions to use in the anchor weighting procedure 
k.weight  Number of neighbors to consider when weighting anchors 
sd.weight  Controls the bandwidth of the Gaussian kernel for weighting 
eps  Error bound on the neighbor finding algorithm (from 
n.trees  More trees gives higher precision when using annoy approximate nearest neighbor search 
verbose  Print progress bars and output 
slot  Slot to store the imputed data. Must be either "data" (default) or "counts" 
prediction.assay  Return an 
store.weights  Optionally store the weights matrix used for predictions in the returned query object. 
If query is not provided, for the categorical data in refdata, returns a data.frame with label predictions. If refdata is a matrix, returns an Assay object where the imputed data has been stored in the provided slot.
If query is provided, a modified query object is returned. For the categorical data in refdata, prediction scores are stored as Assays (prediction.score.NAME), and two metadata fields are added: predicted.NAME and predicted.NAME.score, which contain the class prediction and the score for that predicted class. For continuous data, an Assay called NAME is returned. NAME here corresponds to the name of the element in the refdata list.
The main steps of this procedure are outlined below. For a more detailed description of the methodology, please see Stuart, Butler, et al Cell 2019. doi: 10.1016/j.cell.2019.05.031 ; doi: 10.1101/460147
For both label transfer and feature imputation, we first compute the weights matrix.
Construct a weights matrix that defines the association between each query cell and each anchor. These weights are computed as 1 - the distance between the query cell and the anchor divided by the distance of the query cell to the k.weight-th anchor, multiplied by the anchor score computed in FindIntegrationAnchors. We then apply a Gaussian kernel with a bandwidth defined by sd.weight and normalize across all k.weight anchors.
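The weighting described above can be sketched in base R on toy data. This is a minimal illustration, not Seurat's implementation: the distances and anchor scores are randomly generated, the variable names are hypothetical, and the exact kernel form (1 - exp(-d / (2 / sd)^2)) is an assumption based on one reading of the cited paper.

```r
# Toy inputs; every value here is made up for illustration.
set.seed(1)
n_query   <- 4  # number of query cells
k_weight  <- 3  # anchors considered per query cell (stands in for k.weight)
sd_weight <- 1  # kernel bandwidth (stands in for sd.weight)

# dists[c, i]: distance from query cell c to its i-th nearest anchor.
# Rows are sorted, so column k_weight holds the farthest of the k anchors.
dists <- t(apply(matrix(runif(n_query * k_weight), n_query), 1, sort))

# anchor_score[c, i]: score of the i-th nearest anchor of query cell c.
anchor_score <- matrix(runif(n_query * k_weight, 0.5, 1), n_query)

# Step 1: 1 - (distance / distance to the k-th anchor), times the anchor score.
d <- (1 - dists / dists[, k_weight]) * anchor_score

# Step 2: Gaussian kernel (assumed form, see the note above).
d_kernel <- 1 - exp(-d / (2 / sd_weight)^2)

# Step 3: normalize so the weights of each query cell sum to 1.
weights <- d_kernel / rowSums(d_kernel)
```

In this sketch the k-th (farthest) anchor always receives weight 0, and closer or higher-scoring anchors receive more weight.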
The main difference between label transfer (classification) and feature imputation is what gets multiplied by the weights matrix. For label transfer, we perform the following steps:
Create a binary classification matrix, the rows corresponding to each possible class and the columns corresponding to the anchors. If the reference cell in the anchor pair is a member of a certain class, that matrix entry is filled with a 1, otherwise 0.
Multiply this classification matrix by the transpose of the weights matrix to compute a prediction score for each class for each cell in the query dataset.
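The two label-transfer steps can be illustrated with a small base R example. The labels, weights, and variable names below are hypothetical, and the sketch assumes the weights matrix has query cells in rows and anchors in columns; taking the highest-scoring class as the prediction is also an assumption of this sketch.

```r
# Hypothetical toy data: 3 anchors, 2 query cells, 2 classes.
# Class of the reference cell in each anchor pair:
anchor_labels <- c("B", "T", "T")
classes <- sort(unique(anchor_labels))

# Binary classification matrix: rows = classes, columns = anchors.
# Entry is 1 if the anchor's reference cell belongs to that class, else 0.
class_mat <- 1 * outer(classes, anchor_labels, "==")
rownames(class_mat) <- classes

# Weights matrix (assumed layout): rows = query cells, columns = anchors.
weights <- rbind(cell1 = c(0.7, 0.2, 0.1),
                 cell2 = c(0.1, 0.3, 0.6))

# Prediction scores: one row per class, one column per query cell.
scores <- class_mat %*% t(weights)

# Predicted class = highest-scoring class for each query cell.
predicted <- rownames(scores)[apply(scores, 2, which.max)]
names(predicted) <- colnames(scores)
# cell1 is predicted "B" (score 0.7), cell2 is predicted "T" (score 0.9)
```

Because each row of the weights matrix sums to 1, the scores for each query cell also sum to 1 across classes.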
For feature imputation, we perform the following step:
Multiply the expression matrix for the reference anchor cells by the weights matrix. This returns a predicted expression matrix for the specified features for each cell in the query dataset.
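Feature imputation follows the same pattern, with expression values in place of the binary class matrix. The base R sketch below uses made-up expression values and hypothetical names, and again assumes a weights matrix with query cells in rows and anchors in columns, which is why it is transposed before multiplying.

```r
# Hypothetical expression of 2 features in the reference cells of 3 anchors
# (rows = features, columns = anchors).
ref_expr <- rbind(geneA = c(1, 0, 2),
                  geneB = c(0, 4, 4))

# Weights matrix (assumed layout): rows = query cells, columns = anchors.
weights <- rbind(cell1 = c(0.7, 0.2, 0.1),
                 cell2 = c(0.1, 0.3, 0.6))

# Predicted expression: rows = features, columns = query cells.
# Each value is a weighted average of the anchors' reference expression.
imputed <- ref_expr %*% t(weights)
# geneA: cell1 = 0.9, cell2 = 1.3; geneB: cell1 = 1.2, cell2 = 3.6
```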
Stuart T, Butler A, et al. Comprehensive Integration of Single-Cell Data. Cell. 2019;177:1888-1902. doi: 10.1016/j.cell.2019.05.031
if (FALSE) {
  # to install the SeuratData package see https://github.com/satijalab/seuratdata
  library(Seurat)
  library(SeuratData)
  data("pbmc3k")

  # for demonstration, split the object into reference and query
  pbmc.reference <- pbmc3k[, 1:1350]
  pbmc.query <- pbmc3k[, 1351:2700]

  # perform standard preprocessing on each object
  pbmc.reference <- NormalizeData(pbmc.reference)
  pbmc.reference <- FindVariableFeatures(pbmc.reference)
  pbmc.reference <- ScaleData(pbmc.reference)

  pbmc.query <- NormalizeData(pbmc.query)
  pbmc.query <- FindVariableFeatures(pbmc.query)
  pbmc.query <- ScaleData(pbmc.query)

  # find anchors
  anchors <- FindTransferAnchors(reference = pbmc.reference, query = pbmc.query)

  # transfer labels
  predictions <- TransferData(anchorset = anchors, refdata = pbmc.reference$seurat_annotations)
  pbmc.query <- AddMetaData(object = pbmc.query, metadata = predictions)
}