All software associated with our algorithms can be freely downloaded from our GitHub page.


cisDIVERSITY detects cis-regulatory modules in data from most high-throughput sequencing experiments that measure regulation-related biochemical activity (e.g., ATAC-seq, ChIP-seq, GRO-seq, CAGE). It models DNA regions as diverse modules, each characterized by a combination of motifs, all of which are learned de novo. The only input it needs is the set of reported regions in FASTA format.


exoDIVERSITY resolves protein-DNA footprints from exonuclease-based ChIP experiments. It learns a joint distribution over footprints and motifs to divide the dataset into diverse binding modes. It can identify small nucleotide-level variations, within and outside canonical motifs, that co-occur with variations in footprints. As input, it needs FASTA files of the sequences around the summit regions and the corresponding positive- and negative-strand reads.
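As an illustration of the kind of input described above, the following minimal Python sketch cuts fixed-width windows around peak summits and formats them as FASTA. It is not part of exoDIVERSITY. The function name `summit_windows_to_fasta`, the `half_width` parameter, the header layout and the toy genome are all hypothetical choices made for this example; the window size and file layout the tool actually expects should be taken from its own documentation.

```python
from typing import Dict, Iterable, List, Tuple


def summit_windows_to_fasta(
    genome: Dict[str, str],
    summits: Iterable[Tuple[str, int]],
    half_width: int = 50,  # hypothetical default, not a value prescribed by the tool
) -> str:
    """Return FASTA text with one record per summit.

    `summits` holds (chromosome, 0-based summit position) pairs. Windows that
    would run past either end of a chromosome are skipped, so every record
    has the same length (2 * half_width + 1).
    """
    records: List[str] = []
    for chrom, pos in summits:
        seq = genome[chrom]
        start, end = pos - half_width, pos + half_width + 1
        if start < 0 or end > len(seq):
            continue  # incomplete window, leave it out
        # Hypothetical header layout: chromosome and half-open 0-based interval.
        records.append(f">{chrom}:{start}-{end}\n{seq[start:end].upper()}")
    return "\n".join(records) + ("\n" if records else "")


# Toy data for illustration only.
toy_genome = {"chrT": "ACGTACGTACGTACGTACGT"}
fasta_text = summit_windows_to_fasta(
    toy_genome, [("chrT", 10), ("chrT", 1)], half_width=3
)
print(fasta_text, end="")
```

In this toy run the second summit lies too close to the chromosome start, so only one record is written.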


DIVERSITY partitions ChIP-seq data into distinct protein-DNA binding modes. Each mode is characterized by its own de novo motif. DIVERSITY also determines the optimal number of modes from the data.

No Promoter Left Behind (NPLB)

NPLB is an unsupervised learning algorithm with feature selection that partitions TSS-aligned promoter sequences into distinct promoter architectures. Each architecture is characterized by its own set of promoter elements, all of which are learned de novo. NPLB can therefore be applied to high-resolution promoter data from any organism. We recommend applying it to the full dataset, leaving out no promoter, rather than presorting or preselecting promoters by any criterion. The resulting architectures can then be examined for specific regulatory or evolutionary features.
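To make "TSS-aligned" concrete, here is a minimal Python sketch of one way such sequences could be prepared before running a tool like NPLB. It is a preprocessing illustration only, not NPLB's own code. The function names, the `upstream` and `downstream` parameters and the toy genome are hypothetical. The idea is that every promoter is cut relative to its TSS and minus-strand promoters are reverse-complemented, so the TSS falls in the same column of every sequence.

```python
from typing import Dict, Iterable, List, Tuple

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Reverse-complement an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def tss_aligned_sequences(
    genome: Dict[str, str],
    tss_list: Iterable[Tuple[str, int, str]],
    upstream: int = 3,    # hypothetical window sizes, chosen small for the toy data
    downstream: int = 2,
) -> List[str]:
    """Cut one window per (chromosome, 0-based TSS position, strand) entry.

    Every returned sequence has length upstream + 1 + downstream, is read
    5' to 3' on the promoter's own strand, and has its TSS at index `upstream`.
    Windows that run past a chromosome end are skipped.
    """
    aligned: List[str] = []
    for chrom, tss, strand in tss_list:
        seq = genome[chrom]
        if strand == "+":
            start, end = tss - upstream, tss + downstream + 1
        elif strand == "-":
            # On the minus strand, "upstream" lies at higher coordinates.
            start, end = tss - downstream, tss + upstream + 1
        else:
            raise ValueError(f"unknown strand: {strand!r}")
        if start < 0 or end > len(seq):
            continue
        window = seq[start:end].upper()
        aligned.append(window if strand == "+" else reverse_complement(window))
    return aligned


# Toy data for illustration only.
toy_genome = {"chrT": "AACCGGTTAACC"}
promoters = tss_aligned_sequences(toy_genome, [("chrT", 5, "+"), ("chrT", 5, "-")])
print(promoters)
```

Both toy promoters come out with the same length and with the TSS at index 3, which is the property an alignment-dependent method relies on.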