What is Signac?

Signac is an extension of Seurat for the analysis of single-cell chromatin data (DNA-based single-cell assays). We have extended the Seurat object to store information about the genome sequence and the genomic coordinates of the sequenced fragments for each cell, and added the functions needed to analyse single-cell chromatin data.

How do I interact with the object?

Signac uses the Seurat object structure, and so all the Seurat commands can be used when analysing data with Signac. See the Seurat documentation for more information: https://satijalab.org/seurat/

How do I merge objects with Signac?

There are two important points to consider when merging Seurat objects containing single-cell chromatin data. First, if peak calling was performed separately on the different objects, the peaks will not overlap perfectly, and Seurat treats non-overlapping peaks as completely different features. Second, we do not currently support the use of multiple fragment files per assay in Signac (support will be added in the future).

To deal with the first issue of finding corresponding peaks across objects, you can merge objects in a coordinate-aware way in Signac using the MergeWithRegions function. This will consider overlapping peaks as equivalent. Alternatively, you could create a unified peak set and quantify reads in each peak in each cell using the FeatureMatrix function.
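
As a rough illustration of the second option, the sketch below builds a unified peak set on the command line by pooling two peak files and fusing intervals that overlap or touch. The file names and toy coordinates are hypothetical, and this is a generic interval merge using sort and awk, not a function provided by Signac. The resulting regions could then be read into R and quantified with FeatureMatrix.

```shell
# toy peak sets standing in for two separate peak calls (hypothetical data)
printf 'chr1\t100\t200\nchr1\t500\t600\nchr2\t50\t150\n' > peaks_1.bed
printf 'chr1\t150\t250\nchr1\t700\t800\nchr2\t140\t300\n' > peaks_2.bed

# pool the peaks, sort by chromosome then start,
# and fuse intervals that overlap or are directly adjacent
cat peaks_1.bed peaks_2.bed \
  | sort -k1,1 -k2,2n \
  | awk 'BEGIN { OFS = "\t" }
         NR == 1 { chr = $1; start = $2; stop = $3; next }
         $1 == chr && $2 <= stop { if ($3 > stop) stop = $3; next }
         { print chr, start, stop; chr = $1; start = $2; stop = $3 }
         END { if (NR > 0) print chr, start, stop }' \
  > unified_peaks.bed

cat unified_peaks.bed
```

Here chr1:100-200 and chr1:150-250 collapse into a single region, while peaks that appear in only one input are carried over unchanged.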

To deal with the second issue of consolidating fragment files on disk, you can follow this example code on the command line. You will need to have tabix and bgzip installed.

# decompress files
gzip -d frag_1.tsv.gz frag_2.tsv.gz

# merge the already-sorted files (avoids having to re-sort)
sort -m -k 1,1V -k2,2n frag_1.tsv frag_2.tsv > fragments.tsv

# block gzip compress the merged file
bgzip -@ 4 fragments.tsv # -@ 4 uses 4 threads

# index the bgzipped file
tabix --preset=bed fragments.tsv.gz
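
Because sort -m only interleaves inputs that are already sorted in the same order, it can be worth checking the uncompressed merged file before compressing and indexing it. The sketch below shows one way to do that on toy data; the toy_*.tsv file names and their contents are hypothetical, and the sort keys are assumed to match those used in the merge step.

```shell
# toy fragment files (hypothetical data): chrom, start, end, barcode, read count
printf 'chr1\t100\t300\tAAAC-1\t1\nchr1\t900\t1100\tAAAG-1\t2\nchr2\t50\t250\tAAAC-1\t1\n' > toy_1.tsv
printf 'chr1\t400\t650\tTTTG-2\t1\nchr2\t10\t200\tTTTC-2\t3\nchr10\t5\t120\tTTTG-2\t1\n' > toy_2.tsv

# interleave the two sorted inputs using the same keys as the merge step
sort -m -k1,1V -k2,2n toy_1.tsv toy_2.tsv > toy_merged.tsv

# -c checks the order without rewriting the file;
# -s restricts the comparison to the listed keys
if sort -c -s -k1,1V -k2,2n toy_merged.tsv; then
  echo "toy_merged.tsv is sorted by chromosome and start"
else
  echo "toy_merged.tsv is NOT sorted; re-sort it before running bgzip and tabix" >&2
fi

# the merged file should contain every line from both inputs
wc -l toy_1.tsv toy_2.tsv toy_merged.tsv
```

If the check fails on real data, a full sort with the same keys (without -m) should restore the expected order, at the cost of a longer run time.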

Once the fragment files for the merged object are consolidated, you can set the path to the merged fragment file in R:

fragment.path <- "path.to.new.file"
merged.obj <- SetFragments(object = merged.obj, file = fragment.path)

How should I decide on the number of dimensions to use?

Choosing the dimensionality is a general problem in single-cell analysis for which there is no simple solution. There has been discussion about this for scRNA-seq, and you can read our recommendations for scRNA-seq in the Seurat vignettes: https://satijalab.org/seurat/v3.1/pbmc3k_tutorial.html (see “Determine the ‘dimensionality’ of the dataset”).

Here are some general tips that might help guide your choice of the number of dimensions:

  • the number of dimensions needed will generally scale with the size and complexity of the dataset
  • you can try varying the number of dimensions used and observing how the resulting clusters or UMAP change
  • it is usually better to err on the side of too many dimensions than too few
  • having a good understanding of the biology will help a lot in knowing whether the clusters make sense, or if the dimensionality might be too high/low

An annotation or genome sequence for my organism is not available on Bioconductor. What do I do?

If you are studying an organism that does not have a BSgenome genome package or EnsDb annotation package available on Bioconductor, you can still use your own GTF file or FASTA files with Signac.

To create your own BSgenome data package, see the “How to forge a BSgenome data package” vignette in the BSgenome package documentation.

To use a GTF file, you can import it using rtracklayer, for example:

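library(GenomeInfoDb)  # provides seqlevelsStyle and keepStandardChromosomes

# read the GTF as a GRanges object and keep only gene-level records
gtf <- rtracklayer::import('genes.gtf')
gene.coords <- gtf[gtf$type == 'gene']
# use UCSC-style chromosome names (chr1, chr2, ...) and drop non-standard sequences
seqlevelsStyle(gene.coords) <- 'UCSC'
gene.coords <- keepStandardChromosomes(gene.coords, pruning.mode = 'coarse')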

How should I cite Signac?

Signac is currently unpublished, so we ask that you simply list the version of the package that you used and link to the GitHub page (https://github.com/timoast/signac).

Signac is an extension of Seurat, and uses the Seurat object structure, so you should consider citing the Seurat paper if you have used Signac: https://doi.org/10.1016/j.cell.2019.05.031