In this tutorial, we will perform DNA sequence motif analysis in Signac. We will explore two complementary options for performing motif analysis: one by finding overrepresented motifs in a set of differentially accessible peaks, one method performing differential motif activity analysis between groups of cells.

In this demonstration we use data from the adult mouse brain. See our vignette for the code used to generate this object, and links to the raw data. First, load the required packages and the pre-computed Seurat object:

library(Signac)
library(Seurat)
library(JASPAR2018)
library(TFBSTools)
library(BSgenome.Mmusculus.UCSC.mm10)
set.seed(1234)
mouse_brain <- readRDS("../vignette_data/adult_mouse_brain.rds")
mouse_brain
## An object of class Seurat
## 178653 features across 3882 samples within 2 assays
## Active assay: peaks (157203 features)
##  1 other assay present: RNA
##  2 dimensional reductions calculated: lsi, umap
p1 <- DimPlot(mouse_brain, label = TRUE, pt.size = 0.1) + NoLegend()
p1

## The Motif class

To facilitate motif analysis in Signac, we have create the Motif class to store all the required information, including a list of position weight matrices and a motif occurrence matrix. Here, we construct a Motif object and add it to our mouse brain dataset. A motif object can be added to any Seurat assay using the AddMotifObject function:

# Get a list of motif position weight matrices from the JASPAR database
pwm <- getMatrixSet(
x = JASPAR2018,
opts = list(species = 9606, all_versions = FALSE)
)

# Scan the DNA sequence of each peak for the presence of each motif
motif.matrix <- CreateMotifMatrix(
features = StringToGRanges(rownames(mouse_brain), sep = c(":", "-")),
pwm = pwm,
genome = 'mm10',
sep = c(":", "-")
)

# Create a new Mofif object to store the results
motif <- CreateMotifObject(
data = motif.matrix,
pwm = pwm
)

# Add the Motif object to the assay
object = mouse_brain[['peaks']],
motif.object = motif
)

In order to test for overrepresented motifs, we also need to compute some sequence characteristics of the peaks, such as GC content, sequence length, and dinucleotide frequency. The RegionStats function computes this information for us and stores the results in the feature metadata in the Seurat object.

mouse_brain <- RegionStats(
object = mouse_brain,
genome = BSgenome.Mmusculus.UCSC.mm10,
sep = c(":", "-")
)

## Finding overrepresented motifs

To identify potentially important cell-type-specific regulatory sequences, we can search for DNA motifs that are overrepresented in a set of peaks that are differentially accessible between cell types.

Here, we find differentially accessible peaks between Pvalb and Sst inhibitory interneurons. We then perform a hypergeometric test to test the probability of observing the motif at the given frequency by chance, comparing with a background set of peaks matched for GC content.

da_peaks <- FindMarkers(
object = mouse_brain,
ident.1 = 'Pvalb',
ident.2 = 'Sst',
only.pos = TRUE,
test.use = 'LR',
latent.vars = 'nCount_peaks'
)

# Test the differentially accessible peaks for overrepresented motifs
enriched.motifs <- FindMotifs(
object = mouse_brain,
)
head(enriched.motifs)
##             motif observed background percent.observed percent.background
## MA0497.1 MA0497.1      445       9070             44.5            22.6750
## MA0773.1 MA0773.1      313       5366             31.3            13.4150
## MA0072.1 MA0072.1      226       3148             22.6             7.8700
## MA1151.1 MA1151.1      230       3305             23.0             8.2625
## MA0660.1 MA0660.1      268       4356             26.8            10.8900
## MA0052.3 MA0052.3      318       5898             31.8            14.7450
##          fold.enrichment       pvalue  motif.name
## MA0497.1        1.962514 5.180470e-54       MEF2C
## MA0773.1        2.333209 8.571232e-50       MEF2D
## MA0072.1        2.871665 9.249701e-49 RORA(var.2)
## MA1151.1        2.783661 2.090626e-47        RORC
## MA0660.1        2.460973 7.449838e-46       MEF2B
## MA0052.3        2.156663 2.008640e-43       MEF2A

We can also plot the position weight matrices for the motifs, so we can visualize the different motif sequences.

MotifPlot(
object = mouse_brain,
)

We and others have previously shown that Mef-family motifs, particularly Mef2c, are enriched in Pvalb-specific peaks in scATAC-seq data (https://doi.org/10.1016/j.cell.2019.05.031; https://doi.org/10.1101/615179), and further shown that Mef2c is required for the development of Pvalb interneurons (https://www.nature.com/articles/nature25999). Here our results are consistent with these findings, and we observe a strong enrichment of Mef-family motifs in the top results from FindMotifs.

## Computing motif activities

We can also compute a per-cell motif activity score by running chromVAR. This allows us to visualize motif activities per cell, and also provides an alternative method of identifying differentially-active motifs between cell types.

ChromVAR identifies motifs associated with variability in chromatin accessibility between cells. See the chromVAR paper for a complete description of the method.

mouse_brain <- RunChromVAR(
object = mouse_brain,
genome = BSgenome.Mmusculus.UCSC.mm10
)

DefaultAssay(mouse_brain) <- 'chromvar'

# look at the activity of Mef2c, the top hit from the overrepresentation testing
p2 <- FeaturePlot(
object = mouse_brain,
features = rownames(enriched.motifs)[[1]],
min.cutoff = 'q10',
max.cutoff = 'q90',
pt.size = 0.1
)
CombinePlots(list(p1, p2), ncol = 2)

We can also directly test for differential activity scores between cell types. This tends to give similar results as performing an enrichment test on differentially accessible peaks between the cell types (shown above)

differential.activity <- FindMarkers(
object = mouse_brain,
ident.1 = 'Pvalb',
ident.2 = 'Sst',
only.pos = TRUE,
test.use = 'LR',
latent.vars = 'nCount_peaks'
)

MotifPlot(
object = mouse_brain,
)