Perform sctransform-based normalization

Perform a variance-stabilizing transformation on UMI counts using sctransform::vst (https://github.com/satijalab/sctransform). This replaces the NormalizeData → FindVariableFeatures → ScaleData workflow by fitting a regularized negative binomial model per gene and storing the results in a new assay (named SCT by default).

Usage

SCTransform(object, ...)

# Default S3 method
SCTransform(
  object,
  cell.attr,
  reference.SCT.model = NULL,
  do.correct.umi = TRUE,
  ncells = 5000,
  residual.features = NULL,
  variable.features.n = 3000,
  variable.features.rv.th = 1.3,
  vars.to.regress = NULL,
  latent.data = NULL,
  do.scale = FALSE,
  do.center = TRUE,
  clip.range = c(-sqrt(x = ncol(x = umi)/30), sqrt(x = ncol(x = umi)/30)),
  vst.flavor = "v2",
  conserve.memory = FALSE,
  return.only.var.genes = TRUE,
  seed.use = 1448145,
  verbose = TRUE,
  ...
)

# S3 method for class 'Assay'
SCTransform(
  object,
  cell.attr,
  reference.SCT.model = NULL,
  do.correct.umi = TRUE,
  ncells = 5000,
  residual.features = NULL,
  variable.features.n = 3000,
  variable.features.rv.th = 1.3,
  vars.to.regress = NULL,
  latent.data = NULL,
  do.scale = FALSE,
  do.center = TRUE,
  clip.range = c(-sqrt(x = ncol(x = object)/30), sqrt(x = ncol(x = object)/30)),
  vst.flavor = "v2",
  conserve.memory = FALSE,
  return.only.var.genes = TRUE,
  seed.use = 1448145,
  verbose = TRUE,
  ...
)

# S3 method for class 'Seurat'
SCTransform(
  object,
  assay = "RNA",
  new.assay.name = "SCT",
  reference.SCT.model = NULL,
  do.correct.umi = TRUE,
  ncells = 5000,
  residual.features = NULL,
  variable.features.n = 3000,
  variable.features.rv.th = 1.3,
  vars.to.regress = NULL,
  do.scale = FALSE,
  do.center = TRUE,
  clip.range = c(-sqrt(x = ncol(x = object[[assay]])/30),
    sqrt(x = ncol(x = object[[assay]])/30)),
  vst.flavor = "v2",
  conserve.memory = FALSE,
  return.only.var.genes = TRUE,
  seed.use = 1448145,
  verbose = TRUE,
  ...
)

# S3 method for class 'IterableMatrix'
SCTransform(
  object,
  cell.attr,
  reference.SCT.model = NULL,
  do.correct.umi = TRUE,
  ncells = 5000,
  residual.features = NULL,
  variable.features.n = 3000,
  variable.features.rv.th = 1.3,
  vars.to.regress = NULL,
  latent.data = NULL,
  do.scale = FALSE,
  do.center = TRUE,
  clip.range = c(-sqrt(x = ncol(x = object)/30), sqrt(x = ncol(x = object)/30)),
  vst.flavor = "v2",
  conserve.memory = FALSE,
  return.only.var.genes = TRUE,
  seed.use = 1448145,
  verbose = TRUE,
  ...
)

Arguments

object: A Seurat object or UMI count matrix.
...: Additional arguments passed to sctransform::vst.
cell.attr: Optional metadata frame (cells × attributes).
reference.SCT.model: Pre‐fitted SCT model (supports only log_umi as latent variable). If provided, computes residuals via that model. When residual.features is NULL, uses the model’s top variable.features.n; otherwise, sets the assay’s variable features to residual.features.
do.correct.umi: Logical; if TRUE (default), stores corrected UMIs in counts.
ncells: Integer; number of cells to subsample when fitting NB regression (default: 5000).
residual.features: Character vector of genes to compute residuals for. Default NULL (all genes). If set, these become the assay’s variable features.
variable.features.n: Integer; when residual.features is NULL, select this many top features by residual variance (default: 3000).
variable.features.rv.th: Numeric; if variable.features.n is NULL, select features exceeding this residual‐variance threshold (default: 1.3).
vars.to.regress: Character vector of metadata columns (e.g. percent.mito) to regress out in a second, non‐regularized model.
latent.data: Numeric matrix (cells × latent covariates) to regress out.
do.scale: Logical; if TRUE, scale residuals to unit variance (default: FALSE).
do.center: Logical; if TRUE, center residuals to mean zero (default: TRUE).
clip.range: Numeric vector of length 2; range to clip residuals (default c(-sqrt(n/30), sqrt(n/30)), with n = number of cells).
vst.flavor: Character; flavor of the vst procedure (default: "v2"). When set to "v2", sets method = "glmGamPoi_offset", n_cells = 2000, and exclude_poisson = TRUE, so that only theta and the intercept are fitted.
conserve.memory: Logical; if TRUE, never builds the full residual matrix (slower but memory‐efficient; forces return.only.var.genes=TRUE; default: FALSE).
return.only.var.genes: Logical; if TRUE (default), scale.data is subset to variable features only.
seed.use: Integer; random seed for reproducibility (default: 1448145). Set to NULL to skip setting a seed.
verbose: Logical; whether to print progress messages (default: TRUE).
assay: Character; name of the assay to pull the count data from (default: "RNA").
new.assay.name: Character; name of the new assay that will hold the normalized data (default: "SCT").
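
The default clip.range above depends only on the number of cells. The following base R sketch works through that arithmetic and shows what clipping does to a vector of residuals. The helper names default_clip_range and clip are hypothetical and are not part of Seurat.

```r
# Default clip range as a function of the number of cells n:
#   c(-sqrt(n / 30), sqrt(n / 30))
default_clip_range <- function(n_cells) {
  bound <- sqrt(n_cells / 30)
  c(-bound, bound)
}

# With 3000 cells the bounds are -10 and 10.
range_3000 <- default_clip_range(3000)

# Clip a vector of residuals to a given range (illustration only).
clip <- function(x, range) pmin(pmax(x, range[1]), range[2])

clipped <- clip(c(-25, -3, 0, 4, 40), range_3000)
print(range_3000)
print(clipped)
```

Because the bound grows with the square root of the cell count, larger datasets tolerate larger residuals before clipping applies.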

Value

A Seurat object with a new assay (named SCT by default) containing: counts (corrected UMIs), data (log1p of the corrected counts), and scale.data (Pearson residuals), plus misc for intermediate sctransform::vst outputs.
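
A minimal usage sketch, assuming the Seurat and sctransform packages are installed and using the small pbmc_small example object that ships with SeuratObject. The variable name pbmc_sct is only illustrative, and the exact messages or warnings printed may depend on which optional packages (such as glmGamPoi) are available.

```r
library(Seurat)

# pbmc_small is a tiny example dataset bundled with SeuratObject.
data("pbmc_small")

# Run SCTransform on the RNA assay; results are written to a new assay named "SCT".
pbmc_sct <- SCTransform(
  pbmc_small,
  assay = "RNA",
  new.assay.name = "SCT",
  verbose = FALSE
)

# Inspect the new assay and the variable features selected for it.
pbmc_sct[["SCT"]]
head(VariableFeatures(pbmc_sct, assay = "SCT"))
```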

Details

The results are stored in a new assay (default name “SCT”), in which:

- counts: depth-corrected UMI counts (as if each cell had uniform sequencing depth; controlled by do.correct.umi).
- data: log1p of corrected counts.
- scale.data: Pearson residuals from the fitted NB model (optionally centered and/or scaled).
- misc: intermediate outputs from sctransform::vst.
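
As a rough illustration of what the scale.data values represent: the Pearson residual of a count x under a negative binomial model with mean mu and inverse dispersion theta is (x - mu) / sqrt(mu + mu^2 / theta). The base R sketch below computes it for a single gene. The mu and theta values are made up for illustration, whereas sctransform estimates them from the regularized regression. The function name nb_pearson_residual is hypothetical.

```r
# Pearson residual under a negative binomial model:
#   (x - mu) / sqrt(mu + mu^2 / theta)
nb_pearson_residual <- function(x, mu, theta) {
  (x - mu) / sqrt(mu + mu^2 / theta)
}

# Hypothetical values for one gene across five cells.
counts <- c(0, 1, 2, 5, 12)
mu     <- c(1, 1, 2, 2, 4)   # expected counts given sequencing depth (made up)
theta  <- 10                 # inverse dispersion (made up)

res <- nb_pearson_residual(counts, mu, theta)

# Optional centering, analogous to do.center = TRUE.
res_centered <- res - mean(res)

print(round(res, 3))
print(round(res_centered, 3))
```

A cell whose count equals its expected value gets a residual of zero, and counts far above expectation yield large positive residuals, which is why clipping is applied.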

When multiple counts layers exist (e.g. after split()), each layer is modeled independently. A consensus variable‐feature set is then defined by ranking features by how often they’re called “variable” across different layers (ties broken by median rank).
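
The consensus ranking described above can be sketched in base R as follows: count how many layers call each feature variable, then break ties by the median of its ranks within those layers. The layer names, gene names, and the helper consensus_features are hypothetical, and Seurat's internal implementation may differ in detail.

```r
# Hypothetical per-layer variable features, each ordered from most to least variable.
layer_features <- list(
  layer1 = c("GeneA", "GeneB", "GeneC"),
  layer2 = c("GeneB", "GeneA", "GeneD"),
  layer3 = c("GeneB", "GeneC", "GeneE")
)

consensus_features <- function(feature_lists, n) {
  all_features <- unique(unlist(feature_lists))

  # Number of layers in which each feature is called variable.
  freq <- vapply(all_features, function(f) {
    sum(vapply(feature_lists, function(fl) f %in% fl, logical(1)))
  }, integer(1))

  # Median rank across the layers that include the feature.
  med_rank <- vapply(all_features, function(f) {
    ranks <- unlist(lapply(feature_lists, function(fl) match(f, fl)))
    as.numeric(median(ranks, na.rm = TRUE))
  }, numeric(1))

  # Most frequently selected first; ties broken by lower median rank.
  ordered <- all_features[order(-freq, med_rank)]
  head(ordered, n)
}

top3 <- consensus_features(layer_features, n = 3)
print(top3)
```

Here GeneB is selected in all three layers and ranks first. GeneA and GeneC are each selected in two layers, and GeneA wins the tie because its median rank (1.5) is lower than GeneC's (2.5).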

By default, sctransform::vst drops features expressed in fewer than five cells. In the multi-layer case, this can cause consensus variable features to be excluded from the output's scale.data when a feature is "variable" across many layers but sparsely expressed in at least one.
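
To make the effect of this filter concrete, the base R sketch below applies a five-cell minimum to two small, hypothetical count layers and reports which features survive in each. It only illustrates the per-layer filter; how Seurat combines the layers afterwards is not reproduced here.

```r
# Two hypothetical count layers (features x cells), eight cells each.
layer1 <- rbind(
  GeneA = c(1, 2, 0, 3, 1, 4, 2, 1),
  GeneB = c(0, 0, 1, 0, 0, 0, 2, 0)   # detected in only two cells
)
layer2 <- rbind(
  GeneA = c(2, 1, 1, 0, 3, 2, 1, 1),
  GeneB = c(1, 2, 1, 3, 0, 1, 2, 1)
)

min_cells <- 5  # threshold described above

# Features detected (count > 0) in at least min_cells cells of a layer.
features_kept <- function(counts, min_cells) {
  n_expressing <- rowSums(counts > 0)
  names(which(n_expressing >= min_cells))
}

kept <- lapply(
  list(layer1 = layer1, layer2 = layer2),
  features_kept,
  min_cells = min_cells
)

# Features that pass the filter in every layer.
kept_everywhere <- Reduce(intersect, kept)
print(kept)
print(kept_everywhere)
```

GeneB passes the filter in layer2 but not in layer1, so it is the kind of feature that could be called variable elsewhere yet be missing from part of the output.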
