Jolma Arttu, Laverty Kaitlin U, Fathi Ali, Yang Ally W H, Yellan Isaac, Vorontsov Ilya E, Inukai Sachi, Kribelbauer-Swietek Judith F, Gralak Antoni J, Razavi Rozita, Albu Mihai, Brechalov Alexander, Patel Zain M, Nozdrin Vladimir, Meshcheryakov Georgy, Kozin Ivan, Abramov Sergey, Boytsov Alexandr, Fornes Oriol, Makeev Vsevolod J, Grau Jan, Grosse Ivo, Bucher Philipp, Deplancke Bart, Kulakovskiy Ivan V, Hughes Timothy R
Donnelly Centre, University of Toronto, Toronto, ON M5S 3E1, Canada.
Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA.
bioRxiv. 2024 Nov 12:2024.11.11.622097. doi: 10.1101/2024.11.11.622097.
We describe an effort ("Codebook") to determine the sequence specificity of 332 putative and largely uncharacterized human transcription factors (TFs), as well as 61 control TFs. Nearly 5,000 independent experiments across multiple in vitro and in vivo assays produced motifs for just over half of the putative TFs analyzed (177, or 53%), of which most are unique to a single TF. The data highlight the extensive contribution of transposable elements to TF evolution, both in cis and trans, and identify tens of thousands of conserved, base-level binding sites in the human genome. The use of multiple assays provides an unprecedented opportunity to benchmark and analyze TF sequence specificity, function, and evolution, as further explored in accompanying manuscripts. 1,421 human TFs are now associated with a DNA binding motif. Extrapolation from the Codebook benchmarking, however, suggests that many of the currently known binding motifs for well-studied TFs may inaccurately describe the TF's true sequence preferences.