In this tutorial, we will perform DNA sequence motif analysis in Signac. We will explore two complementary options for performing motif analysis: one by finding overrepresented motifs in a set of differentially accessible peaks, one method performing differential motif activity analysis between groups of cells.
In this demonstration we use data from the adult mouse brain. See our vignette for the code used to generate this object, and links to the raw data. First, load the required packages and the pre-computed Seurat object:
mouse_brain <- readRDS("../vignette_data/adult_mouse_brain.rds") mouse_brain
## An object of class Seurat ## 178653 features across 3882 samples within 2 assays ## Active assay: peaks (157203 features) ## 1 other assay present: RNA ## 2 dimensional reductions calculated: lsi, umap
To facilitate motif analysis in Signac, we have create the
Motif class to store all the required information, including a list of position weight matrices and a motif occurrence matrix. Here, we construct a
Motif object and add it to our mouse brain dataset. A motif object can be added to any Seurat assay using the
# Get a list of motif position weight matrices from the JASPAR database pwm <- getMatrixSet( x = JASPAR2018, opts = list(species = 9606, all_versions = FALSE) ) # Scan the DNA sequence of each peak for the presence of each motif motif.matrix <- CreateMotifMatrix( features = StringToGRanges(rownames(mouse_brain), sep = c(":", "-")), pwm = pwm, genome = 'mm10', sep = c(":", "-") ) # Create a new Mofif object to store the results motif <- CreateMotifObject( data = motif.matrix, pwm = pwm ) # Add the Motif object to the assay mouse_brain[['peaks']] <- AddMotifObject( object = mouse_brain[['peaks']], motif.object = motif )
In order to test for overrepresented motifs, we also need to compute some sequence characteristics of the peaks, such as GC content, sequence length, and dinucleotide frequency. The
RegionStats function computes this information for us and stores the results in the feature metadata in the Seurat object.
To identify potentially important cell-type-specific regulatory sequences, we can search for DNA motifs that are overrepresented in a set of peaks that are differentially accessible between cell types.
Here, we find differentially accessible peaks between Pvalb and Sst inhibitory interneurons. We then perform a hypergeometric test to test the probability of observing the motif at the given frequency by chance, comparing with a background set of peaks matched for GC content.
da_peaks <- FindMarkers( object = mouse_brain, ident.1 = 'Pvalb', ident.2 = 'Sst', only.pos = TRUE, test.use = 'LR', latent.vars = 'nCount_peaks' ) # Test the differentially accessible peaks for overrepresented motifs enriched.motifs <- FindMotifs( object = mouse_brain, features = head(rownames(da_peaks), 1000) )
## motif observed background percent.observed percent.background ## MA0497.1 MA0497.1 445 9070 44.5 22.6750 ## MA0773.1 MA0773.1 313 5366 31.3 13.4150 ## MA0072.1 MA0072.1 226 3148 22.6 7.8700 ## MA1151.1 MA1151.1 230 3305 23.0 8.2625 ## MA0660.1 MA0660.1 268 4356 26.8 10.8900 ## MA0052.3 MA0052.3 318 5898 31.8 14.7450 ## fold.enrichment pvalue motif.name ## MA0497.1 1.962514 5.180470e-54 MEF2C ## MA0773.1 2.333209 8.571232e-50 MEF2D ## MA0072.1 2.871665 9.249701e-49 RORA(var.2) ## MA1151.1 2.783661 2.090626e-47 RORC ## MA0660.1 2.460973 7.449838e-46 MEF2B ## MA0052.3 2.156663 2.008640e-43 MEF2A
We can also plot the position weight matrices for the motifs, so we can visualize the different motif sequences.
We and others have previously shown that Mef-family motifs, particularly Mef2c, are enriched in Pvalb-specific peaks in scATAC-seq data (https://doi.org/10.1016/j.cell.2019.05.031; https://doi.org/10.1101/615179), and further shown that Mef2c is required for the development of Pvalb interneurons (https://www.nature.com/articles/nature25999). Here our results are consistent with these findings, and we observe a strong enrichment of Mef-family motifs in the top results from
We can also compute a per-cell motif activity score by running chromVAR. This allows us to visualize motif activities per cell, and also provides an alternative method of identifying differentially-active motifs between cell types.
ChromVAR identifies motifs associated with variability in chromatin accessibility between cells. See the chromVAR paper for a complete description of the method.
mouse_brain <- RunChromVAR( object = mouse_brain, genome = BSgenome.Mmusculus.UCSC.mm10 ) DefaultAssay(mouse_brain) <- 'chromvar' # look at the activity of Mef2c, the top hit from the overrepresentation testing p2 <- FeaturePlot( object = mouse_brain, features = rownames(enriched.motifs)[], min.cutoff = 'q10', max.cutoff = 'q90', pt.size = 0.1 ) CombinePlots(list(p1, p2), ncol = 2)
We can also directly test for differential activity scores between cell types. This tends to give similar results as performing an enrichment test on differentially accessible peaks between the cell types (shown above)
differential.activity <- FindMarkers( object = mouse_brain, ident.1 = 'Pvalb', ident.2 = 'Sst', only.pos = TRUE, test.use = 'LR', latent.vars = 'nCount_peaks' ) MotifPlot( object = mouse_brain, motifs = head(rownames(differential.activity)), assay = 'peaks' )