## How many cells do we need to sample so that we see at least n cells of each type?

- This depends on the number of cell types present and on their diversity, i.e. the entropy of the population. When entropy is very low, one or a few cell types dominate the population and all other types are present at low fractions.
- In the default example below we assume that there are 10 rare cell types, each present at a fraction of 2% of the total population. If we want to be 95% confident that our sample contains at least 5 cells from each of those cell types, we need to sample at least 624 cells in total.
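
To make the notion of diversity above concrete, here is a minimal sketch that computes the Shannon entropy of two compositions. Both compositions are illustrative assumptions: the skewed one pairs the ten rare types at 2% with a single hypothetical dominant type at 80%, and the even one spreads the same eleven types uniformly. The function name `shannon_entropy` is likewise made up for this example.

```python
from math import log2


def shannon_entropy(fractions):
    """Shannon entropy in bits of a composition whose fractions sum to 1."""
    return -sum(f * log2(f) for f in fractions if f > 0)


# Hypothetical skewed population: one dominant type plus ten rare types at 2%.
skewed = [0.80] + [0.02] * 10
# Hypothetical even population with the same number of types.
even = [1 / 11] * 11

print(f"skewed: {shannon_entropy(skewed):.2f} bits")
print(f"even:   {shannon_entropy(even):.2f} bits")
```

The skewed composition comes out at roughly 1.39 bits against roughly 3.46 bits for the even one, which is the low-entropy situation where rare types are hardest to capture.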

### Underlying assumptions

- We assume the worst-case scenario: one cell type dominates the population.
- For every type, the number of cells in the population is much larger than the minimum number of cells desired in the sample.
- For a given type, the probability of seeing at least n cells in a sample of size k is given by the cumulative distribution function of a negative binomial distribution, NBcdf(k; n, p), where p is the relative abundance of that type. Here k is the total sample size; under the common parameterization that counts the cells of other types drawn before the n-th cell of the given type, the argument is k - n.
- For m cell types with the same parameter p, the overall probability of seeing each type at least n times is NBcdf(k; n, p)^m. The results below should be treated as a lower bound, since in reality the number of cells in the population is finite and the random draws are not independent per cell type.
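
The assumptions above can be turned into a short calculation. The following is a minimal sketch, not the calculator's own code: it assumes independent draws per cell type and uses the equivalent binomial tail, P(at least n of k cells belong to the type), in place of a negative binomial CDF. The function names `prob_at_least_n` and `min_cells_needed` are hypothetical.

```python
from math import comb


def prob_at_least_n(k: int, n: int, p: float) -> float:
    """P(a sample of k cells holds at least n cells of a type with abundance p)."""
    # Complement of the binomial lower tail P(X <= n - 1), X ~ Binomial(k, p).
    lower_tail = sum(comb(k, j) * p**j * (1 - p) ** (k - j) for j in range(n))
    return 1.0 - lower_tail


def min_cells_needed(m: int, p: float, n: int, confidence: float = 0.95) -> int:
    """Smallest k with prob_at_least_n(k, n, p) ** m >= confidence."""
    if not (0 < p <= 1 and 0 < confidence < 1 and m >= 1 and n >= 1):
        raise ValueError("need 0 < p <= 1, 0 < confidence < 1, m >= 1, n >= 1")
    k = n  # fewer than n cells can never contain n cells of one type
    # The probability is non-decreasing in k, so a linear scan finds the minimum.
    while prob_at_least_n(k, n, p) ** m < confidence:
        k += 1
    return k


# Default example: 10 rare types at 2% each, at least 5 cells per type, 95%.
print(min_cells_needed(m=10, p=0.02, n=5))
```

With the default example's parameters this sketch returns 624. Here k counts every sampled cell; a count that tracks only the cells not belonging to the target type, which is how the negative binomial CDF is parameterized in many libraries, comes out n smaller.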

Use the sliders or text boxes to change the following parameters:

- Assumed number of cell types
- Minimum fraction (of rarest cell type)
- Minimum desired cells per type


Update 7/19/2019: Fixed a bug that led to a slight underestimation of the number of cells needed (< 1% in most cases). Thanks to Alexander Davis (Navin Lab, The University of Texas MD Anderson Cancer Center) for pointing out the problem and suggesting a fix.

This website was created by Christoph Hafemeister in Rahul Satija's lab at the New York Genome Center. Technologies used: plotly, jStat, jQuery, jQuery UI

For questions or comments email chafemeister@nygenome.org