Find variable features: FindVariableFeatures (Seurat)

Identifies features that are outliers on a 'mean variability plot'.

FindVariableFeatures(object, ...)

# S3 method for class 'V3Matrix'
FindVariableFeatures(
  object,
  selection.method = "vst",
  loess.span = 0.3,
  clip.max = "auto",
  mean.function = FastExpMean,
  dispersion.function = FastLogVMR,
  num.bin = 20,
  binning.method = "equal_width",
  verbose = TRUE,
  ...
)

# S3 method for class 'Assay'
FindVariableFeatures(
  object,
  selection.method = "vst",
  loess.span = 0.3,
  clip.max = "auto",
  mean.function = FastExpMean,
  dispersion.function = FastLogVMR,
  num.bin = 20,
  binning.method = "equal_width",
  nfeatures = 2000,
  mean.cutoff = c(0.1, 8),
  dispersion.cutoff = c(1, Inf),
  verbose = TRUE,
  ...
)

# S3 method for class 'SCTAssay'
FindVariableFeatures(object, nfeatures = 2000, ...)

# S3 method for class 'Seurat'
FindVariableFeatures(
  object,
  assay = NULL,
  selection.method = "vst",
  loess.span = 0.3,
  clip.max = "auto",
  mean.function = FastExpMean,
  dispersion.function = FastLogVMR,
  num.bin = 20,
  binning.method = "equal_width",
  nfeatures = 2000,
  mean.cutoff = c(0.1, 8),
  dispersion.cutoff = c(1, Inf),
  verbose = TRUE,
  ...
)

Arguments

object

An object: a Seurat object, an Assay or SCTAssay, or a feature-by-cell matrix (V3Matrix)

...

Arguments passed to other methods

selection.method

How to choose top variable features. Choose one of:

“vst”: First, fits a curve to the relationship between log(variance) and log(mean) using local polynomial regression (loess). Then standardizes the feature values using the observed mean and the expected variance (given by the fitted curve). Feature variance is then calculated on the standardized values after clipping them to a maximum (see the clip.max parameter).
“mean.var.plot” (mvp): First, uses mean.function to calculate the average expression and dispersion.function to calculate the dispersion of each feature. Next, divides features into num.bin (default 20) bins based on their average expression, and calculates z-scores for dispersion within each bin. This identifies variable features while controlling for the strong relationship between variability and average expression.
“dispersion” (disp): selects the features with the highest dispersion values.
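The vst steps above can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not Seurat's implementation: it assumes a dense features-by-cells matrix of raw counts, uses log10 for the "log" in the description (an assumption), skips constant features, and `vst_sketch` is a hypothetical helper name.

```r
# Minimal sketch of the "vst" steps (hypothetical helper, not Seurat's actual code).
# counts: dense numeric matrix of raw counts, features in rows, cells in columns.
vst_sketch <- function(counts, loess.span = 0.3, clip.max = "auto", nfeatures = 2000) {
  ncells <- ncol(counts)
  # 'auto' clipping: square root of the number of cells
  if (identical(clip.max, "auto")) clip.max <- sqrt(ncells)

  feature.mean <- rowMeans(counts)
  feature.var <- apply(counts, 1, var)

  # 1. Fit log10(variance) against log10(mean) with loess (non-constant features only)
  not.const <- feature.var > 0
  fit.data <- data.frame(x = log10(feature.mean[not.const]),
                         y = log10(feature.var[not.const]))
  fit <- loess(y ~ x, data = fit.data, span = loess.span)
  expected.var <- rep(NA_real_, nrow(counts))
  expected.var[not.const] <- 10^fitted(fit)

  # 2. Standardize with the observed mean and the expected SD, then clip large values
  z <- (counts[not.const, , drop = FALSE] - feature.mean[not.const]) /
    sqrt(expected.var[not.const])
  z[z > clip.max] <- clip.max

  # 3. Sum of squares / (n - 1); equals the variance of z when nothing was clipped
  std.var <- rep(0, nrow(counts))
  std.var[not.const] <- rowSums(z^2) / (ncells - 1)

  hvf.info <- data.frame(mean = feature.mean, variance = feature.var,
                         variance.expected = expected.var,
                         variance.standardized = std.var,
                         row.names = rownames(counts))
  top <- order(std.var, decreasing = TRUE)[seq_len(min(nfeatures, nrow(counts)))]
  list(hvf.info = hvf.info, variable = rownames(counts)[top])
}

# Simulated Poisson counts (variance ~ mean) plus five planted overdispersed features
set.seed(1)
lambda <- rexp(500, rate = 0.5)
counts <- matrix(rpois(500 * 100, lambda), nrow = 500,
                 dimnames = list(paste0("gene", 1:500), NULL))
for (i in 1:5) counts[i, ] <- c(rpois(20, 20), rep(0, 80))
res <- vst_sketch(counts, nfeatures = 50)
head(res$variable, 10)
```

Because the expected variance comes from the fitted trend, Poisson-like features score near 1 regardless of their mean, while the planted features stand out.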

loess.span

(vst method) Loess span parameter used when fitting the variance-mean relationship

clip.max

(vst method) After standardization, values larger than clip.max are set to clip.max; the default, 'auto', sets this value to the square root of the number of cells

mean.function

Function to compute the x-axis value (average expression). The default, FastExpMean, computes the mean expression in non-log space and reports it in log space (see Details)

dispersion.function

Function to compute the y-axis value (dispersion). The default, FastLogVMR, computes the variance-to-mean ratio in non-log space and reports its logarithm (see Details)
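The Details section states that the default x- and y-axis summaries are computed in non-log space and reported in log space. A minimal base-R sketch of that idea, assuming log1p-normalized input; `exp_mean_sketch` and `log_vmr_sketch` are hypothetical helpers, not FastExpMean or FastLogVMR themselves, and leaving zero-mean features as NA is a choice of this sketch.

```r
# Hypothetical base-R analogues of the default mean.var.plot summaries.
# mat: log-normalized matrix (features x cells) holding log1p(normalized values).
exp_mean_sketch <- function(mat) {
  # average in non-log space, then report on the log scale
  log1p(rowMeans(expm1(mat)))
}

log_vmr_sketch <- function(mat) {
  x <- expm1(mat)                        # back to non-log space
  m <- rowMeans(x)
  v <- apply(x, 1, var)
  out <- rep(NA_real_, nrow(x))          # zero-mean features stay NA (assumption of this sketch)
  out[m > 0] <- log(v[m > 0] / m[m > 0]) # log of the variance-to-mean ratio
  out
}

# One feature whose non-log values are 0, 1 and 3
mat <- matrix(log1p(c(0, 1, 3)), nrow = 1)
exp_mean_sketch(mat)  # log(7/3), about 0.847
log_vmr_sketch(mat)   # log((7/3) / (4/3)) = log(7/4), about 0.560
rowMeans(mat)         # mean of the logged values instead: log(8)/3, about 0.693
```

The last line shows why the distinction matters: averaging the logged values directly gives a noticeably different x-axis position.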

num.bin

Total number of bins to use in the scaled analysis (default is 20)

binning.method

Specifies how the bins should be computed. Available methods are:

“equal_width”: each bin is of equal width along the x-axis (default)
“equal_frequency”: each bin contains an equal number of features (can increase statistical power to detect overdispersed features at high expression values, at the cost of reduced resolution along the x-axis)
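The binning and within-bin z-scoring of the mean.var.plot method can be sketched as follows. This is a simplified illustration: `scale_dispersion_sketch` is a hypothetical helper, and computing equal_frequency breaks from quantiles of all feature means is an assumption that may differ from Seurat's own break computation.

```r
# Simplified sketch of mean.var.plot binning (hypothetical helper).
# feature.mean, feature.dispersion: numeric vectors with one value per feature.
scale_dispersion_sketch <- function(feature.mean, feature.dispersion,
                                    num.bin = 20, binning.method = "equal_width") {
  breaks <- switch(binning.method,
    # a single number tells cut() to make num.bin equal-width intervals
    equal_width = num.bin,
    # quantile breaks put roughly the same number of features in each bin
    equal_frequency = unique(quantile(feature.mean,
                                      probs = seq(0, 1, length.out = num.bin + 1))),
    stop("unknown binning.method: ", binning.method))
  bin <- cut(feature.mean, breaks = breaks, include.lowest = TRUE)

  # z-score each feature's dispersion against the features in the same bin
  bin.mean <- tapply(feature.dispersion, bin, mean)
  bin.sd <- tapply(feature.dispersion, bin, sd)
  idx <- as.integer(bin)
  scaled <- (feature.dispersion - bin.mean[idx]) / bin.sd[idx]
  scaled[is.na(scaled)] <- 0  # e.g. single-feature bins, where sd() is NA
  unname(scaled)
}

set.seed(2)
feature.mean <- runif(1000, 0, 5)
feature.dispersion <- rnorm(1000)
feature.dispersion[1] <- 10  # one clearly overdispersed feature
z.width <- scale_dispersion_sketch(feature.mean, feature.dispersion)
z.freq <- scale_dispersion_sketch(feature.mean, feature.dispersion,
                                  binning.method = "equal_frequency")
```

Either binning scheme flags feature 1 here; the schemes differ mainly where features are sparse along the x-axis, such as at high expression.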

verbose

Show a progress bar for calculations

nfeatures

Number of features to select as top variable features; only used when selection.method is set to 'dispersion' or 'vst'
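For the 'vst' and 'dispersion' methods, selection reduces to keeping the nfeatures highest-scoring features. A minimal sketch, assuming the score is a named numeric vector (the standardized variance for vst, the dispersion for disp); `top_features_sketch` is a hypothetical helper.

```r
# Hypothetical helper: keep the nfeatures features with the highest score.
top_features_sketch <- function(score, nfeatures = 2000) {
  stopifnot(!is.null(names(score)))
  ranked <- names(sort(score, decreasing = TRUE))  # sort() drops NA scores
  head(ranked, nfeatures)                          # fewer returned if too few features
}

score <- c(geneA = 0.8, geneB = 3.1, geneC = 1.7, geneD = 2.4)
top_features_sketch(score, nfeatures = 2)  # "geneB" "geneD"
```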

mean.cutoff

A length-two numeric vector with low and high cutoffs for feature means (mean.var.plot method)

dispersion.cutoff

A length-two numeric vector with low and high cutoffs for feature dispersions (mean.var.plot method)
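How the two cutoffs combine can be sketched as follows, assuming (as the Details section suggests) that dispersion.cutoff is compared against the within-bin z-scored dispersion, and assuming strict inequalities at both ends. `mvp_select_sketch` and the example values are hypothetical.

```r
# Hypothetical helper: apply mean.cutoff and dispersion.cutoff for mean.var.plot.
# feature.mean: named x-axis values; scaled.dispersion: within-bin z-scores.
mvp_select_sketch <- function(feature.mean, scaled.dispersion,
                              mean.cutoff = c(0.1, 8), dispersion.cutoff = c(1, Inf)) {
  keep <- feature.mean > mean.cutoff[1] & feature.mean < mean.cutoff[2] &
    scaled.dispersion > dispersion.cutoff[1] & scaled.dispersion < dispersion.cutoff[2]
  names(feature.mean)[keep]
}

feature.mean <- c(geneA = 0.05, geneB = 2.0, geneC = 3.5, geneD = 9.0, geneE = 1.2)
scaled.disp  <- c(geneA = 4.0,  geneB = 2.5, geneC = 0.4, geneD = 3.0, geneE = 1.6)
mvp_select_sketch(feature.mean, scaled.disp)                                # "geneB" "geneE"
mvp_select_sketch(feature.mean, scaled.disp, dispersion.cutoff = c(2, Inf)) # "geneB"
```

geneA and geneD fail the mean cutoffs despite high dispersion. The second call keeps only features more than two standard deviations above their bin's average dispersion.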

assay

Name of the assay to use; if NULL, the default assay is used

Details

For the mean.var.plot method: exact parameter settings may vary empirically from dataset to dataset and should be guided by visual inspection of the plot. Setting the lower bound of dispersion.cutoff to 2 (the older y.cutoff parameter), for example dispersion.cutoff = c(2, Inf), identifies features that are more than two standard deviations above the average dispersion within a bin. The default x-axis function is the mean expression level, and the default y-axis function is log(variance/mean). Mean and variance calculations are performed in non-log space, but the results are reported in log space; see the relevant functions for exact details.