Prediction of DNase I hypersensitive sites in plant genome using multiple modes of pseudo components.

Author information

Zhang Shanxin, Zhuang Weichao, Xu Zhenghong

Affiliations

Engineering Research Center of Internet of Things Technology Applications (Ministry of Education), School of Internet of Things Engineering, Jiangnan University, Wuxi, Jiangsu 214122, China; School of Pharmaceutical Sciences, Jiangnan University, Wuxi, Jiangsu 214122, China.

Engineering Research Center of Internet of Things Technology Applications (Ministry of Education), School of Internet of Things Engineering, Jiangnan University, Wuxi, Jiangsu 214122, China.

Publication information

Anal Biochem. 2018 May 15;549:149-156. doi: 10.1016/j.ab.2018.03.025. Epub 2018 Mar 28.

Abstract

DNase I hypersensitive sites (DHSs) are accessible chromatin regions in plant genomes that are hypersensitive to cleavage by the endonuclease DNase I. DHSs have been used as markers for the presence of transcriptional regulatory elements, so developing computational methods to identify DHSs is an important complement to experimental approaches for discovering potential regulatory elements. Several machine learning approaches have been proposed for DHS prediction, but there is still room for improvement. In this work, a new predictor called pDHS-WE is proposed for predicting DHSs in plant genomes using a weighted ensemble learning framework. Five classes of heterogeneous features were used to represent the sequences, and a random forest (RF) classifier was built on each class. pDHS-WE was formed by fusing these five individual RF classifiers into an ensemble predictor, with a genetic algorithm used to obtain the weights of the different feature classes. In the experiments, pDHS-WE achieved an accuracy of 88.5%, sensitivity of 89.1%, specificity of 88.0%, and AUC of 0.958, exceeding state-of-the-art methods by more than 2.7%, 2%, 3.5%, and 2.6%, respectively. These results suggest that pDHS-WE may become a useful tool for analyzing transcriptional regulatory elements in plant genomes.
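The abstract does not specify the fusion rule or the genetic algorithm settings, so the following is only a minimal pure-Python sketch of the general idea under stated assumptions. It assumes weighted soft voting over each RF's predicted probability that a sequence is a DHS, and a simple GA (elitism, one-point crossover, Gaussian mutation) that searches non-negative per-feature-class weights to maximise accuracy on a validation set. It also shows how the reported metrics are defined: accuracy = (TP + TN) / N, sensitivity = TP / (TP + FN), specificity = TN / (TN + FP), and AUC as the probability that a positive outscores a negative. The functions fuse, metrics, auc and ga_weights, every hyperparameter, and the toy scores are hypothetical illustrations, not the pDHS-WE implementation; the five trained RF models are replaced by precomputed score lists.

```python
import random

def fuse(prob_lists, weights):
    """Weighted soft vote: weighted mean of each sample's P(DHS) across classifiers."""
    total = sum(weights)
    return [sum(w * probs[i] for w, probs in zip(weights, prob_lists)) / total
            for i in range(len(prob_lists[0]))]

def metrics(y_true, scores, threshold=0.5):
    """Return (accuracy, sensitivity, specificity) at a fixed decision threshold."""
    tp = tn = fp = fn = 0
    for y, s in zip(y_true, scores):
        pred = 1 if s >= threshold else 0
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 0 and y == 0:
            tn += 1
        elif pred == 1:
            fp += 1
        else:
            fn += 1
    acc = (tp + tn) / len(y_true)
    sn = tp / (tp + fn) if tp + fn else 0.0
    sp = tn / (tn + fp) if tn + fp else 0.0
    return acc, sn, sp

def auc(y_true, scores):
    """AUC as P(positive score > negative score), counting ties as 1/2."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def ga_weights(prob_lists, y_true, pop_size=20, generations=30,
               mut_rate=0.2, seed=0):
    """Hypothetical GA: search positive feature-class weights maximising fused accuracy."""
    rng = random.Random(seed)
    k = len(prob_lists)

    def fitness(w):
        return metrics(y_true, fuse(prob_lists, w))[0]

    # Initial population: the uniform (unweighted) ensemble plus random weights.
    pop = [[1.0] * k] + [[rng.random() + 1e-6 for _ in range(k)]
                         for _ in range(pop_size - 1)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        elite = pop[: pop_size // 2]  # elitism: the best half survives unchanged
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            cut = rng.randrange(1, k) if k > 1 else 0  # one-point crossover
            child = a[:cut] + b[cut:]
            # Gaussian mutation, clipped so every weight stays positive.
            child = [max(1e-6, g + rng.gauss(0, 0.1)) if rng.random() < mut_rate
                     else g for g in child]
            children.append(child)
        pop = elite + children
    best = max(pop, key=fitness)
    s = sum(best)
    return [w / s for w in best]  # normalise so the weights sum to 1
```

A usage example on toy data, where classifier 0 is informative and classifiers 1 to 4 output noise:

```python
rng = random.Random(1)
y = [1] * 50 + [0] * 50
scores = [[0.9 if t else 0.1 for t in y]] + [[rng.random() for _ in y] for _ in range(4)]
w = ga_weights(scores, y)
fused = fuse(scores, w)
print(w, metrics(y, fused), auc(y, fused))
```

Seeding the population with uniform weights and keeping elites means the returned weighting is never worse on the fitting data than a plain unweighted average; in practice the weights would be tuned on held-out data to avoid overfitting.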
