School of Energy and Chemical Engineering, Ulsan National Institute of Science and Technology, Ulsan 44919, Republic of Korea.
Department of Bioengineering, University of California San Diego, La Jolla CA 92093, USA.
Brief Bioinform. 2023 Mar 19;24(2). doi: 10.1093/bib/bbad024.
Recognizing the binding sites of DNA-binding proteins is key to elucidating transcriptional regulation in organisms. ChIP-exo enables researchers to delineate the genome-wide binding landscapes of DNA-binding proteins at near single-base-pair resolution. However, peak calling remains a bottleneck for ChIP-exo applications because published algorithms tend to generate false-positive and false-negative predictions. Here, we report the development of DEOCSU (DEep-learning Optimized ChIP-exo peak calling SUite), a novel machine learning-based ChIP-exo peak calling suite. DEOCSU incorporates a deep convolutional neural network model, trained on curated ChIP-exo peak data, that distinguishes the visualized data of bona fide peaks from those of false ones. Performance validation showed that the trained deep-learning model achieves accuracy, precision and recall all above 95%. Applying the suite to both in-house and publicly available ChIP-exo datasets from bacteria, eukaryotes and archaea yielded accurate predictions of peaks containing canonical motifs, highlighting the versatility and efficiency of DEOCSU. Furthermore, DEOCSU can be run either on a cloud computing platform or in a local environment. With its built-in visualization software, adjustable options such as the peak probability threshold, and iterative updating of the pre-trained model, DEOCSU can be tailored to users' specific needs.
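The abstract reports accuracy, precision and recall above 95% and mentions a user-adjustable peak probability threshold. The sketch below illustrates, under stated assumptions, how such a threshold might turn per-peak probabilities into bona fide/false calls and how the three metrics follow from curated labels. It is not DEOCSU's code: the helper names `classify_peaks` and `evaluate`, the default threshold of 0.5, and the toy probabilities and labels are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Metrics:
    accuracy: float
    precision: float
    recall: float


def classify_peaks(probabilities, threshold=0.5):
    """Hypothetical helper (not part of DEOCSU): call a candidate peak
    bona fide when its predicted probability is at or above the threshold."""
    return [p >= threshold for p in probabilities]


def evaluate(predicted, actual):
    """Compare predicted calls against curated labels (True = bona fide peak)."""
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p and a)          # true peaks kept
    tn = sum(1 for p, a in pairs if not p and not a)  # false peaks rejected
    fp = sum(1 for p, a in pairs if p and not a)      # false peaks kept
    fn = sum(1 for p, a in pairs if not p and a)      # true peaks missed
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return Metrics(accuracy, precision, recall)


if __name__ == "__main__":
    # Toy values for illustration only; these are not data from the study.
    probs = [0.97, 0.88, 0.42, 0.10, 0.65, 0.05]
    labels = [True, True, True, False, False, False]
    for t in (0.4, 0.5, 0.7):
        m = evaluate(classify_peaks(probs, t), labels)
        print(f"threshold={t:.1f} accuracy={m.accuracy:.3f} "
              f"precision={m.precision:.3f} recall={m.recall:.3f}")
```

On this toy data, lowering the threshold from 0.5 to 0.4 raises recall from 0.667 to 1.000, while raising it to 0.7 lifts precision to 1.000. This is the kind of trade-off an adjustable probability threshold lets users tune.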