Department of Molecular Genetics, University of Toronto, Toronto, Ontario, Canada.
Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, Ontario, Canada.
Nat Genet. 2019 Jun;51(6):981-989. doi: 10.1038/s41588-019-0411-1. Epub 2019 May 27.
Transcription factor (TF) binding specificities (motifs) are essential for the analysis of gene regulation. Accurate prediction of TF motifs is critical, because it is infeasible to assay all TFs in all sequenced eukaryotic genomes. The degree of motif diversification among related species remains controversial, in part because of uncertainty in motif prediction methods. Here we describe similarity regression, a significantly improved method for predicting motifs, which we use to update and expand the Cis-BP database. Similarity regression inherently quantifies TF motif evolution, and shows that previous claims of near-complete conservation of motifs between human and Drosophila are inflated, with nearly half of the motifs in each species absent from the other, largely due to extensive divergence in C2H2 zinc finger proteins. We conclude that diversification in DNA-binding motifs is pervasive, and present a new tool and updated resource to study TF diversity and gene regulation across eukaryotes.