DHSpred: support-vector-machine-based human DNase I hypersensitive sites prediction using the optimal features selected by random forest.

Author information

Manavalan Balachandran, Shin Tae Hwan, Lee Gwang

Affiliations

Department of Physiology, Ajou University School of Medicine, Suwon, Republic of Korea.

Institute of Molecular Science and Technology, Ajou University, Suwon, Republic of Korea.

Publication information

Oncotarget. 2017 Dec 8;9(2):1944-1956. doi: 10.18632/oncotarget.23099. eCollection 2018 Jan 5.

Abstract

DNase I hypersensitive sites (DHSs) are genomic regions that provide important information regarding the presence of transcriptional regulatory elements and the state of chromatin. Therefore, identifying DHSs in uncharacterized DNA sequences is crucial for understanding their biological functions and mechanisms. Although many experimental methods have been proposed to identify DHSs, they have proven to be expensive for genome-wide application. Therefore, it is necessary to develop computational methods for DHS prediction. In this study, we proposed a support vector machine (SVM)-based method for predicting DHSs, called DHSpred (DNase I Hypersensitive Site predictor in human DNA sequences), which was trained with 174 optimal features. The optimal combination of features was identified from a large set that included nucleotide composition and di- and trinucleotide physicochemical properties, using a random forest algorithm. DHSpred achieved a Matthews correlation coefficient and accuracy of 0.660 and 0.871, respectively, which were 3% higher than those of control SVM predictors trained with non-optimized features, indicating the efficiency of the feature selection method. Furthermore, the performance of DHSpred was superior to that of state-of-the-art predictors. An online prediction server has been developed to assist the scientific community, and is freely available at: http://www.thegleelab.org/DHSpred.html.
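The abstract reports performance as a Matthews correlation coefficient (MCC) and accuracy. As a minimal sketch of how these two metrics are conventionally computed from binary predictions, the snippet below uses the standard textbook definitions. The function names and the toy labels are illustrative assumptions, not taken from the paper or from the DHSpred server.

```python
import math


def confusion_counts(y_true, y_pred):
    """Count (TP, TN, FP, FN) for binary labels: 1 = DHS, 0 = non-DHS.

    Assumes both sequences contain only 0 and 1 (hypothetical encoding).
    """
    tp = tn = fp = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 0 and pred == 0:
            tn += 1
        elif truth == 0 and pred == 1:
            fp += 1
        else:  # truth == 1 and pred == 0
            fn += 1
    return tp, tn, fp, fn


def accuracy(tp, tn, fp, fn):
    """Fraction of all samples that were classified correctly."""
    total = tp + tn + fp + fn
    return (tp + tn) / total if total else 0.0


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; returns 0.0 when undefined."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Toy example (made-up labels, not data from the study).
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 1]
counts = confusion_counts(y_true, y_pred)
print(counts, accuracy(*counts), mcc(*counts))
```

MCC uses all four cells of the confusion matrix, so it tends to be more informative than accuracy when the positive and negative classes differ in size, which is likely why the two are reported together.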
