iDHS-RGME: Identification of DNase I hypersensitive sites by integrating information on nucleotide composition and physicochemical properties.

Affiliations

School of Science, Minzu University of China, Beijing, 100081, China.

Publication information

Biochem Biophys Res Commun. 2024 Nov 19;734:150618. doi: 10.1016/j.bbrc.2024.150618. Epub 2024 Aug 29.

Abstract

As pivotal markers of chromatin accessibility, DNase I hypersensitive sites (DHSs) are closely linked to fundamental biological processes, including gene expression regulation and disease pathogenesis. Efficient and precise algorithms for identifying DHSs are therefore important for understanding genome function and elucidating disease mechanisms. This study presents iDHS-RGME, an algorithm based on Extremely Randomized Trees (Extra-Trees) that integrates complementary feature extraction techniques to improve DHS prediction. Specifically, iDHS-RGME uses two feature extraction approaches, Reverse Complementary Kmer (RCKmer) and Geary Spatial Autocorrelation (GSA), which capture nucleotide composition and physicochemical properties of the sequences, respectively, increasing the richness and accuracy of the encoded information. To address data imbalance, Borderline-SMOTE is applied, followed by the Maximal Information Coefficient (MIC) for feature selection. Comparative evaluations showed that the Extra-Trees classifier outperformed the other classifiers tested, so it was adopted for the final model. Under five-fold cross-validation, iDHS-RGME achieved accuracies of 94.71% and 95.07% on two independent datasets, outperforming previous models in both precision and overall effectiveness.
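The abstract names the two feature encodings but does not define them. The following is a minimal sketch, assuming the common definitions used in sequence-encoding toolkits: RCKmer as relative k-mer frequencies in which each k-mer is merged with its reverse complement, and GSA as Geary's autocorrelation coefficient C(d) computed over one dinucleotide physicochemical property at lags d = 1..max_lag. The function names (`rckmer`, `geary`), the defaults k = 2 and max_lag = 3, and `PLACEHOLDER_PROPERTY` are hypothetical; the property values are arbitrary placeholders, not the physicochemical indices used by iDHS-RGME, and this is not the authors' implementation.

```python
from itertools import product
from statistics import mean

# Complement table for the four canonical bases (A<->T, C<->G).
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def canonical_kmers(k: int) -> list[str]:
    """All k-mers over ACGT, each merged with its reverse complement.

    The lexicographically smaller member of each (k-mer, reverse complement)
    pair is the representative, so 'AC' and 'GT' share one feature.
    """
    reps = {min(km, reverse_complement(km))
            for km in ("".join(p) for p in product("ACGT", repeat=k))}
    return sorted(reps)


def rckmer(seq: str, k: int = 2) -> list[float]:
    """Relative frequencies of reverse-complement-merged k-mers in seq."""
    seq = seq.upper()
    index = {km: i for i, km in enumerate(canonical_kmers(k))}
    counts = [0] * len(index)
    total = 0
    for i in range(len(seq) - k + 1):
        km = seq[i:i + k]
        if set(km) <= set("ACGT"):  # skip windows containing N or other symbols
            counts[index[min(km, reverse_complement(km))]] += 1
            total += 1
    return [c / total if total else 0.0 for c in counts]


def geary(seq: str, prop: dict[str, float], max_lag: int = 3) -> list[float]:
    """Geary autocorrelation C(d), d = 1..max_lag, of a dinucleotide property.

    C(d) = [sum_i (P_i - P_{i+d})^2 / (2(N - d))] / [sum_i (P_i - mean)^2 / (N - 1)]
    where P_i is the property value of the i-th overlapping dinucleotide and
    N is the number of dinucleotides. Assumes seq contains only A, C, G, T
    and that N > max_lag.
    """
    seq = seq.upper()
    values = [prop[seq[i:i + 2]] for i in range(len(seq) - 1)]
    n = len(values)
    p_bar = mean(values)
    denom = sum((v - p_bar) ** 2 for v in values) / (n - 1)
    feats = []
    for d in range(1, max_lag + 1):
        num = sum((values[i] - values[i + d]) ** 2 for i in range(n - d)) / (2 * (n - d))
        feats.append(num / denom if denom else 0.0)  # constant property -> 0.0
    return feats


# HYPOTHETICAL placeholder: arbitrary numbers, not real physicochemical indices.
PLACEHOLDER_PROPERTY = {"".join(p): float(i)
                        for i, p in enumerate(product("ACGT", repeat=2))}

if __name__ == "__main__":
    seq = "ACGTTGCAACGT"
    features = rckmer(seq, k=2) + geary(seq, PLACEHOLDER_PROPERTY, max_lag=3)
    print(len(features))  # 10 RCKmer features + 3 Geary lags = 13
```

Under these assumptions each sequence becomes a fixed-length vector (10 RCKmer values for k = 2, since the four palindromic dinucleotides stand alone and the other twelve pair up, plus one Geary value per lag and property), and an RCKmer vector is identical for a sequence and its reverse complement. The later steps named in the abstract, Borderline-SMOTE oversampling, MIC-based feature selection and the Extra-Trees classifier, are not sketched here; off-the-shelf implementations exist, for example `BorderlineSMOTE` in imbalanced-learn and `ExtraTreesClassifier` in scikit-learn.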