

Using extreme gradient boosting to identify origin of replication in Saccharomyces cerevisiae via hybrid features.

Affiliations

Toxicology and Biomedicine Research Group, Faculty of Applied Sciences, Ton Duc Thang University, Ho Chi Minh City, Viet Nam.

Professional Master Program in Artificial Intelligence in Medicine, College of Medicine, Taipei Medical University, Taipei City 106, Taiwan; Research Center of Artificial Intelligence in Medicine, Taipei Medical University, Taipei City 106, Taiwan.

Publication information

Genomics. 2020 May;112(3):2445-2451. doi: 10.1016/j.ygeno.2020.01.017. Epub 2020 Jan 24.

Abstract

DNA replication is a fundamental process that plays a crucial role in the propagation of all living organisms. Accurately identifying origins of replication could therefore be key to understanding the regulatory mechanisms of gene expression. With the rapid development of computational techniques and the abundance of biological sequencing data, it has become possible to identify origins of replication accurately and promptly, and the problem has drawn considerable attention from experts in the field. However, better predictive performance still requires more work. This study therefore explores the combination of state-of-the-art features with the extreme gradient boosting learning system for classifying DNA sequences. Our hybrid approach identifies origins of DNA replication with a sensitivity of 85.19%, a specificity of 93.83%, an accuracy of 89.51%, and a Matthews correlation coefficient (MCC) of 0.7931. Evidence is presented to show that the proposed method outperforms state-of-the-art methods on the same benchmark dataset. Moreover, the results represent a further step towards developing prediction models for DNA replication in particular and DNA sequences in general.

