iDHS-DSAMS: Identifying DNase I hypersensitive sites based on the dinucleotide property matrix and ensemble bagged tree.

Authors

Zhang Shengli, Yu Qianhao, He Haoran, Zhu Fu, Wu Panjing, Gu Lingzhi, Jiang Sijie

Affiliations

School of Mathematics and Statistics, Xidian University, Xi'an 710071, PR China.

School of Artificial Intelligence, Xidian University, Xi'an 710071, PR China.

Publication information

Genomics. 2020 Mar;112(2):1282-1289. doi: 10.1016/j.ygeno.2019.07.017. Epub 2019 Aug 1.

Abstract

DNase I hypersensitive sites (DHSs) are associated with DNA regulatory elements, so understanding them is of great significance for biomedical research. However, traditional experimental methods are poorly suited to identifying recombinant sites in the large number of DNA sequences emerging from sequencing. Several machine learning methods have been proposed to identify DHSs, but most ignore the spatial autocorrelation of the DNA sequence. In this paper, we propose a predictor called iDHS-DSAMS to identify DHSs on benchmark datasets. We develop a feature extraction method called dinucleotide-based spatial autocorrelation (DSA). We then use Min-Redundancy-Max-Relevance (mRMR) to remove irrelevant and redundant features, selecting a 100-dimensional feature vector. Finally, we use an ensemble bagged tree as the classifier, trained on datasets oversampled with SMOTE. Five-fold cross-validation tests on two benchmark datasets indicate that the proposed method outperforms its existing counterparts in accuracy (Acc), Matthews correlation coefficient (MCC), sensitivity (Sn), and specificity (Sp).
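
The abstract does not give the DSA formula or the evaluation code, so the following is only a minimal Python sketch under stated assumptions. It shows one common form of sequence autocorrelation (a lagged auto-covariance over dinucleotide property values) and the standard confusion-matrix definitions of Sn, Sp, Acc, and MCC. The property table `TOY_PROPERTY` and both function names are hypothetical, and the auto-covariance is a stand-in for illustration, not the authors' DSA definition.

```python
import math
from itertools import product

# Hypothetical property values for illustration only (AA=0.0, AC=1.0, ..., TT=15.0).
# They are NOT the dinucleotide property matrix used in the paper.
DINUCLEOTIDES = ["".join(pair) for pair in product("ACGT", repeat=2)]
TOY_PROPERTY = {dinuc: float(index) for index, dinuc in enumerate(DINUCLEOTIDES)}


def dinucleotide_autocovariance(seq, prop, lag):
    """Lagged auto-covariance of one property along a DNA sequence.

    Assumes `seq` contains only uppercase A, C, G, T. This is a generic
    autocorrelation descriptor, used here as a stand-in for DSA.
    """
    values = [prop[seq[i:i + 2]] for i in range(len(seq) - 1)]
    n_pairs = len(values) - lag
    if lag < 1 or n_pairs < 1:
        raise ValueError("lag must be >= 1 and shorter than the dinucleotide count")
    mean = sum(values) / len(values)
    total = sum((values[i] - mean) * (values[i + lag] - mean) for i in range(n_pairs))
    return total / n_pairs


def classification_metrics(tp, tn, fp, fn):
    """Sn, Sp, Acc, and MCC from confusion-matrix counts (standard definitions)."""
    sn = tp / (tp + fn)                      # sensitivity: recall on true DHSs
    sp = tn / (tn + fp)                      # specificity: recall on non-DHSs
    acc = (tp + tn) / (tp + tn + fp + fn)    # overall accuracy
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"Sn": sn, "Sp": sp, "Acc": acc, "MCC": mcc}


if __name__ == "__main__":
    print(dinucleotide_autocovariance("ACGTACGTAC", TOY_PROPERTY, lag=2))
    print(classification_metrics(tp=40, tn=45, fp=5, fn=10))
```

Computing such a descriptor for several properties and several lags yields a feature vector of the kind that a selection step such as mRMR could then reduce. MCC is reported alongside Acc because it stays informative when the two classes are imbalanced, which is also the situation that SMOTE oversampling is meant to address.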
