Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA.
Genome Res. 2018 Jun;28(6):891-900. doi: 10.1101/gr.226852.117. Epub 2018 Apr 13.
The representation and discovery of transcription factor (TF) sequence binding specificities is critical for understanding gene regulatory networks and interpreting the impact of disease-associated noncoding genetic variants. We present a novel TF binding motif representation, the k-mer set memory (KSM), which consists of a set of aligned k-mers that are overrepresented at TF binding sites, and a new method called KMAC for de novo discovery of KSMs. We find that KSMs more accurately predict in vivo binding sites than position weight matrix (PWM) models and other more complex motif models across a large set of ChIP-seq experiments. Furthermore, KSMs outperform PWMs and more complex motif models in predicting in vitro binding sites. KMAC also identifies correct motifs in more experiments than five state-of-the-art motif discovery methods. In addition, KSM-derived features outperform both PWM and deep learning model derived sequence features in predicting differential regulatory activities of expression quantitative trait loci (eQTL) alleles. Finally, we have applied KMAC to 1600 ENCODE TF ChIP-seq data sets and created a public resource of KSM and PWM motifs. We expect that the KSM representation and KMAC method will be valuable in characterizing TF binding specificities and in interpreting the effects of noncoding genetic variations.
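To make the idea of k-mers "overrepresented at TF binding sites" concrete, the following is a minimal toy sketch in Python. It is not the KMAC algorithm, which the abstract describes only at a high level, and it does not align the k-mers or build a KSM. The function names, the fold-enrichment score, the pseudocount, and the default thresholds are all illustrative assumptions, not taken from the paper.

```python
from collections import Counter

# Translation table for complementing DNA bases.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def canonical(kmer):
    """Collapse a k-mer and its reverse complement into one key."""
    return min(kmer, reverse_complement(kmer))


def kmer_presence(seqs, k):
    """For each canonical k-mer, count how many sequences contain it."""
    counts = Counter()
    for seq in seqs:
        seq = seq.upper()
        # A set ensures each sequence contributes at most once per k-mer.
        seen = {canonical(seq[i:i + k]) for i in range(len(seq) - k + 1)}
        counts.update(seen)
    return counts


def enriched_kmers(bound, background, k=6, min_fold=2.0, pseudocount=1.0):
    """Return {k-mer: fold enrichment} for k-mers that are more frequent
    in `bound` than in `background`.

    Hypothetical scoring: ratio of pseudocount-smoothed presence rates.
    This is an assumption for illustration, not the paper's statistic.
    """
    pos = kmer_presence(bound, k)
    neg = kmer_presence(background, k)
    result = {}
    for kmer, n_pos in pos.items():
        pos_rate = (n_pos + pseudocount) / (len(bound) + pseudocount)
        neg_rate = (neg.get(kmer, 0) + pseudocount) / (len(background) + pseudocount)
        fold = pos_rate / neg_rate
        if fold >= min_fold:
            result[kmer] = fold
    return result


# Made-up example sequences: the bound set shares the 6-mer GACGTC.
bound = ["TTGACGTCAAT", "GGTGACGTCACC", "ATGACGTCATA"]
background = ["AAAAAAAAAAAA", "CCCCCCCCCCCC", "ACACACACACAC"]
hits = enriched_kmers(bound, background, k=6)
```

In this made-up example, the shared 6-mer GACGTC appears in every bound sequence and in no background sequence, so it is among the k-mers returned in `hits`. A real analysis would use ChIP-seq peak sequences, a matched background, and a proper statistical test rather than a raw fold threshold.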