Vavilov Institute of General Genetics, Russian Academy of Sciences, 119991 Moscow, Russia.
Institute of Protein Research, Russian Academy of Sciences, 142290 Pushchino, Russia.
Nucleic Acids Res. 2024 Jan 5;52(D1):D154-D163. doi: 10.1093/nar/gkad1077.
We present a major update of the HOCOMOCO collection that provides DNA binding specificity patterns of 949 human transcription factors and 720 mouse orthologs. To make this release, we performed motif discovery in peak sets that originated from 14 183 ChIP-Seq experiments and reads from 2554 HT-SELEX experiments yielding more than 400 thousand candidate motifs. The candidate motifs were annotated according to their similarity to known motifs and the hierarchy of DNA-binding domains of the respective transcription factors. Next, the motifs underwent human expert curation to stratify distinct motif subtypes and remove non-informative patterns and common artifacts. Finally, the curated subset of 100 thousand motifs was subjected to automated benchmarking to select the best-performing motifs for each transcription factor. The resulting HOCOMOCO v12 core collection contains 1443 verified position weight matrices, including distinct subtypes of DNA binding motifs for particular transcription factors. In addition to the core collection, HOCOMOCO v12 provides motif sets optimized for the recognition of binding sites in vivo and in vitro, and for annotation of regulatory sequence variants. HOCOMOCO is available at https://hocomoco12.autosome.org and https://hocomoco.autosome.org.
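A position weight matrix (PWM) scores a candidate binding site by summing per-position weights for the observed nucleotides. The sketch below is a generic, minimal illustration of log-odds PWM scanning, assuming a uniform nucleotide background, a fixed pseudocount, and forward-strand scanning only. It is not HOCOMOCO's own tooling, scoring thresholds, or file format; the toy count matrix and the function names (pfm_to_pwm, score_window, best_hit) are hypothetical.

```python
import math

# Hypothetical toy position count matrix for a 4-bp motif (NOT from HOCOMOCO).
# Rows are motif positions; columns are counts of A, C, G, T.
TOY_PFM = [
    [8, 0, 1, 1],
    [0, 9, 0, 1],
    [1, 0, 9, 0],
    [0, 1, 0, 9],
]
ALPHABET = "ACGT"


def pfm_to_pwm(pfm, pseudocount=0.25, background=0.25):
    """Convert counts to natural-log log-odds weights.

    Assumption: uniform background (0.25 per nucleotide) and a fixed
    pseudocount added to every cell; real collections may differ.
    """
    pwm = []
    for row in pfm:
        total = sum(row) + pseudocount * len(row)
        pwm.append([math.log(((c + pseudocount) / total) / background) for c in row])
    return pwm


def score_window(pwm, window):
    """Sum the weights of the observed bases; expects uppercase A/C/G/T only."""
    return sum(pwm[i][ALPHABET.index(base)] for i, base in enumerate(window))


def best_hit(pwm, sequence):
    """Return (start, score) of the best-scoring window on the forward strand,
    or None if the sequence is shorter than the motif."""
    width = len(pwm)
    best = None
    for start in range(len(sequence) - width + 1):
        s = score_window(pwm, sequence[start:start + width])
        if best is None or s > best[1]:
            best = (start, s)
    return best
```

For example, best_hit(pfm_to_pwm(TOY_PFM), "TTACGTGG") reports the window starting at index 2, where the sequence matches the toy consensus ACGT. A practical scanner would also score the reverse complement and compare scores against a threshold or P-value rather than taking only the top hit.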