Signac is an extension of Seurat for the analysis of single-cell chromatin data (DNA-based single-cell assays). We have extended the Seurat object to include information about the genome sequence and the genomic coordinates of sequenced fragments in each cell, and added the functions needed to analyse single-cell chromatin data.
Signac uses the Seurat object structure, and so all the Seurat commands can be used when analysing data with Signac. See the Seurat documentation for more information: https://satijalab.org/seurat/
There are two important points to consider when merging Seurat objects containing single-cell chromatin data. First, if peak calling was performed separately on the different objects, the peaks will not overlap perfectly, and Seurat treats peaks with different coordinates as completely different features, even when they partially overlap. Second, we do not currently support the use of multiple fragment files per assay in Signac (support will be added in the future).
To deal with the first issue of finding corresponding peaks across objects, you can merge objects in a coordinate-aware way in Signac using the MergeWithRegions function, which treats overlapping peaks as equivalent. Alternatively, you could create a unified peak set and quantify reads in each peak in each cell using the
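If you take the unified-peak-set route, the peak set itself can be built by pooling the peaks called on each object and collapsing any that overlap. The sketch below does this with standard sort and awk; the function name merge_peaks and the file names are hypothetical, only the first three BED columns are kept, and book-ended peaks (one ending exactly where the next starts) are merged along with overlapping ones, which matches the default behaviour of bedtools merge if you prefer to use that tool instead.

```shell
# merge_peaks FILE...: pool BED peak files, sort them by chromosome and start,
# and collapse overlapping or book-ended intervals into a single peak.
# Only the first three BED columns (chrom, start, end) are kept.
merge_peaks() {
  cat "$@" \
    | cut -f1-3 \
    | LC_ALL=C sort -k1,1 -k2,2n \
    | awk 'BEGIN { OFS = "\t"; have = 0 }
      {
        s = $2 + 0; e = $3 + 0
        if (have && $1 == chr && s <= end) {
          # overlaps (or touches) the current peak: extend it if needed
          if (e > end) end = e
        } else {
          # new chromosome or a gap: emit the finished peak, start a new one
          if (have) print chr, start, end
          chr = $1; start = s; end = e; have = 1
        }
      }
      END { if (have) print chr, start, end }'
}

# usage (hypothetical file names):
# merge_peaks peaks_1.bed peaks_2.bed > unified_peaks.bed
```

The resulting BED file could then be imported into R, for example with rtracklayer::import(), to serve as the shared feature set.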
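To deal with the second issue, the fragment files can be combined on the command line into a single coordinate-sorted file, which is then compressed and indexed:

```shell
# decompress the fragment files
gzip -d frag_1.tsv.gz frag_2.tsv.gz

# merge the already-sorted files (-m interleaves them without a full re-sort)
sort -m -k1,1V -k2,2n frag_1.tsv frag_2.tsv > fragments.tsv

# block-gzip compress the merged file, using 4 threads (-@ 4)
bgzip -@ 4 fragments.tsv

# index the bgzipped file
tabix --preset=bed fragments.tsv.gz
```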
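Because sort -m only interleaves its inputs and never reorders lines within a file, the merged output is correctly sorted only if each input file is already sorted on the same keys. A minimal check, assuming a sort that supports -V and -s (such as GNU coreutils, which the -V option in the merge command already implies) and using a hypothetical helper name, could look like this:

```shell
# is_sorted_fragments FILE: exit 0 if the (decompressed) fragment file is ordered
# by chromosome (version sort, so chr2 precedes chr10) and then by start,
# i.e. the same keys used by the `sort -m` merge command.
# -C checks the order quietly; -s limits the comparison to those keys only.
is_sorted_fragments() {
  sort -C -s -k1,1V -k2,2n "$1"
}

# usage:
# is_sorted_fragments frag_1.tsv && is_sorted_fragments frag_2.tsv
```

If either file fails the check, sort it with the same keys (sort -k1,1V -k2,2n) before merging.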
Once the fragment files for the merged object are consolidated, you can set the path to the merged fragment file in R:
```r
# path to the merged, bgzip-compressed and indexed fragment file
fragment.path <- "path.to.new.file"
merged.obj <- SetFragments(object = merged.obj, file = fragment.path)
```
Choosing the dimensionality is a general problem in single-cell analysis for which there is no simple solution. There has been discussion about this for scRNA-seq, and you can read our recommendations for scRNA-seq in the Seurat vignettes: https://satijalab.org/seurat/v3.1/pbmc3k_tutorial.html (see “Determine the ‘dimensionality’ of the dataset”).
Here are some general tips that might help guide your choice of the number of dimensions:
If you are studying an organism that does not have a BSgenome genome package or EnsDb annotation package available on Bioconductor, you can still use your own GTF file or FASTA files with Signac.
To create your own BSgenome data package, see this vignette.
To use a GTF file, you can import it using rtracklayer, for example:
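```r
library(GenomeInfoDb)  # provides seqlevelsStyle() and keepStandardChromosomes()
gtf <- rtracklayer::import('genes.gtf')
gene.coords <- gtf[gtf$type == 'gene']  # keep gene-level records only
seqlevelsStyle(gene.coords) <- 'UCSC'  # e.g. rename "1" to "chr1"
gene.coords <- keepStandardChromosomes(gene.coords, pruning.mode = 'coarse')  # drop scaffolds and other non-standard contigs
```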
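When you supply your own GTF and FASTA files, the chromosome names in the annotation need to match those in the genome sequence, otherwise features on mismatched chromosomes cannot be related to the sequence. The sketch below is a quick command-line check using a hypothetical helper name; it assumes the FASTA sequence name is the text after '>' up to the first whitespace. If you rename chromosomes in R (as with seqlevelsStyle above), apply the same convention to the FASTA names before comparing.

```shell
# seqnames_missing_from_fasta GTF FASTA
# Prints each sequence name found in column 1 of the GTF that does not appear in
# the FASTA headers (taken as the text after '>' up to the first whitespace).
# Empty output means every GTF sequence name has a matching FASTA entry.
seqnames_missing_from_fasta() {
  gtf_names=$(mktemp)
  fa_names=$(mktemp)
  # skip comment/header lines and blank lines, keep the seqname column
  grep -v -e '^#' -e '^$' "$1" | cut -f1 | LC_ALL=C sort -u > "$gtf_names"
  # strip '>' and anything after the first whitespace from each header
  grep '^>' "$2" | sed -e 's/^>//' -e 's/[[:space:]].*//' | LC_ALL=C sort -u > "$fa_names"
  # comm -23 keeps lines that occur only in the first (GTF) list
  LC_ALL=C comm -23 "$gtf_names" "$fa_names"
  rm -f "$gtf_names" "$fa_names"
}

# usage (hypothetical file names):
# seqnames_missing_from_fasta genes.gtf genome.fa
```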
Signac is currently unpublished, so we ask that you simply list the version of the package you used and link to the GitHub page (https://github.com/timoast/signac).