MLSNet: a deep learning model for predicting transcription factor binding sites.

Affiliations

School of Computer Science and Engineering, Nanjing University of Science and Technology, 200 Xiaolingwei, Nanjing 210094, China.

Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Wellington Rd, Clayton, Melbourne, VIC 3800, Australia.

Publication information

Brief Bioinform. 2024 Sep 23;25(6). doi: 10.1093/bib/bbae489.

Abstract

Accurate prediction of transcription factor binding sites (TFBSs) is essential for understanding gene regulation mechanisms and the etiology of diseases. Despite numerous advances in deep learning methods for predicting TFBSs, the performance of these methods can still be improved. In this study, we propose MLSNet, a novel deep learning architecture designed specifically to predict TFBSs. MLSNet integrates multi-size convolutional fusion with long short-term memory (LSTM) networks to capture sparse higher-order sequence features of DNA. Further, MLSNet incorporates super token attention and bidirectional LSTM (Bi-LSTM) to systematically extract and integrate higher-order DNA shape features. Experimental results on 165 ChIP-seq (chromatin immunoprecipitation followed by sequencing) datasets indicate that MLSNet consistently outperforms several state-of-the-art algorithms in the prediction of TFBSs. Specifically, MLSNet achieves average values of 0.8306 for accuracy (ACC), 0.8992 for the area under the receiver operating characteristic curve (AUROC), and 0.9035 for the area under the precision-recall curve (AUPRC), surpassing the second-best methods by 1.82%, 1.68%, and 1.54%, respectively. These results demonstrate the effectiveness of combining multi-size convolutional layers with LSTM and DNA shape-based features in enhancing predictive accuracy. Moreover, this study comprehensively assesses the variability in model performance across different cell lines and transcription factors. The source code of MLSNet is available at https://github.com/minghaidea/MLSNet.
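
To make the idea of multi-size convolution over a DNA sequence more concrete, the following is a minimal, dependency-free sketch and not the MLSNet implementation. The function names (`one_hot`, `conv1d_valid`, `multisize_features`), the kernel sizes (3, 5, 7), the random untrained kernels, and the fusion by global max pooling and concatenation are all assumptions made for illustration. The abstract does not specify these details, and the actual model passes convolutional features on to LSTM layers rather than pooling them directly.

```python
import random

BASES = "ACGT"


def one_hot(seq):
    """Encode a DNA string as rows of 4 values (A, C, G, T).

    Bases outside ACGT (for example N) become all-zero rows.
    """
    return [[1.0 if base == b else 0.0 for b in BASES] for base in seq.upper()]


def conv1d_valid(x, kernel):
    """Slide one kernel (k rows x 4 columns) over x without padding.

    Returns the ReLU-activated response at each valid position.
    """
    k = len(kernel)
    out = []
    for start in range(len(x) - k + 1):
        s = sum(x[start + i][j] * kernel[i][j] for i in range(k) for j in range(4))
        out.append(max(0.0, s))  # ReLU
    return out


def multisize_features(seq, kernel_sizes=(3, 5, 7), filters_per_size=2, seed=0):
    """Illustrative multi-size convolution with a simple fusion step.

    Kernels of several widths scan the same one-hot input, each response map
    is reduced by global max pooling, and the pooled values are concatenated.
    Kernel weights are random here (hypothetical, untrained).
    """
    if len(seq) < max(kernel_sizes):
        raise ValueError("sequence is shorter than the largest kernel")
    rng = random.Random(seed)
    x = one_hot(seq)
    fused = []
    for k in kernel_sizes:
        for _ in range(filters_per_size):
            kernel = [[rng.uniform(-1.0, 1.0) for _ in range(4)] for _ in range(k)]
            fused.append(max(conv1d_valid(x, kernel)))
    return fused
```

Calling `multisize_features("ACGTACGTACGT")` returns one pooled value per filter, here six values (three kernel sizes times two filters). Using several kernel widths lets short and longer sequence patterns contribute to the same feature vector; in a trained model the kernel weights would be learned rather than sampled.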

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b9c6/11442149/d5126ebefe2a/bbae489f1.jpg
