This function calculates an adaptive inflection point ("knee") of the barcode distribution for each sample group. The inflection point can then serve as a threshold for removing low-quality barcodes (cells).

CalculateBarcodeInflections(
  object,
  barcode.column = "nCount_RNA",
  group.column = "orig.ident",
  threshold.low = NULL,
  threshold.high = NULL
)

Arguments

object

Seurat object

barcode.column

Column to use as proxy for barcodes ("nCount_RNA" by default)

group.column

Column to group by ("orig.ident" by default)

threshold.low

Ignore barcodes of rank below this threshold in inflection calculation

threshold.high

Ignore barcodes of rank above this threshold in inflection calculation

Value

Returns the Seurat object with a new list, `CalculateBarcodeInflections`, in the `tools` slot, containing the following values:

* `barcode_distribution` - contains the full barcode distribution across the entire dataset
* `inflection_points` - the calculated inflection points within the thresholds
* `threshold_values` - the provided (or default) threshold values to search within for inflections
* `cells_pass` - the cells that pass the inflection point calculation

Details

The function operates by calculating the slope of the barcode count vs. rank distribution and then finding the point at which the distribution changes most steeply (the "knee"). This calculation often needs to be restricted to a range of ranks, so the `threshold.low` and `threshold.high` parameters are provided to limit it based on the rank of the barcodes. [BarcodeInflectionsPlot()] is provided as a convenience function to visualize and test different thresholds and thus arrive at more sensible results.
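
The slope-based search described above can be illustrated with a minimal base R sketch. It is not the Seurat implementation: `find_knee()` and `toy_counts` are hypothetical names introduced here, the data are simulated, and using the slope of `log10(count + 1)` against rank is an assumption made for the illustration.

```r
# Hypothetical helper, not part of Seurat: locate the "knee" of a
# count-vs-rank curve within an optional window of ranks.
find_knee <- function(counts, threshold.low = NULL, threshold.high = NULL) {
  # Rank barcodes from highest to lowest count (rank 1 = largest count).
  sorted <- sort(counts, decreasing = TRUE)
  rank <- seq_along(sorted)

  # Slope of log10(count + 1) with respect to rank. The first barcode
  # has no predecessor, so its slope is set to 0.
  slope <- c(0, diff(log10(sorted + 1)) / diff(rank))

  # Fall back to the full range of ranks when no thresholds are given.
  if (is.null(threshold.low)) threshold.low <- 1
  if (is.null(threshold.high)) threshold.high <- length(sorted)
  in_window <- rank >= threshold.low & rank <= threshold.high

  # The knee is the in-window rank with the most negative slope.
  knee_rank <- rank[in_window][which.min(slope[in_window])]
  list(rank = knee_rank, count = sorted[knee_rank])
}

# Simulated data (made up for illustration): 50 high-count barcodes
# followed by 200 low-count barcodes.
set.seed(1)
toy_counts <- c(rpois(50, lambda = 5000), rpois(200, lambda = 20))

knee <- find_knee(toy_counts)
knee$rank   # rank at which the curve drops most steeply
knee$count  # count of the barcode at that rank
```

Passing a rank window, for example `find_knee(toy_counts, threshold.low = 100, threshold.high = 200)`, confines the search to those ranks, which mirrors the role that `threshold.low` and `threshold.high` play in the function documented here.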

See [BarcodeInflectionsPlot()] to visualize the calculated inflection points and [SubsetByBarcodeInflections()] to subsequently subset the Seurat object.

Author

Robert A. Amezquita, robert.amezquita@fredhutch.org

Examples

data("pbmc_small")
CalculateBarcodeInflections(pbmc_small, group.column = 'groups')
#> An object of class Seurat 
#> 230 features across 80 samples within 1 assay 
#> Active assay: RNA (230 features, 20 variable features)
#>  3 layers present: counts, data, scale.data
#>  2 dimensional reductions calculated: pca, tsne