usDSM: a novel method for deleterious synonymous mutation prediction using undersampling scheme.

Affiliations

GMU-GIBH Joint School of Life Sciences, Guangzhou Medical University and the Institutes of Physical Science and Information Technology, Anhui University, China.

School of Computer Science and Technology, Anhui University, China.

Publication information

Brief Bioinform. 2021 Sep 2;22(5). doi: 10.1093/bib/bbab123.

Abstract

Although synonymous mutations do not alter the encoded amino acids, they may impact protein function by interfering with the regulation of RNA splicing or altering transcript splicing. Recent progress in next-generation sequencing technologies has put the exploration of synonymous mutations at the forefront of precision medicine. Several approaches have been proposed specifically for predicting deleterious synonymous mutations, but their performance is limited by the imbalance between positive and negative samples. In this study, we first greatly expanded the number of samples using various data sources and compared six undersampling strategies to address the imbalanced datasets. The results suggested that cluster centroid is the most effective scheme. Second, we presented a computational model, the undersampling scheme based method for deleterious synonymous mutation (usDSM) prediction, which uses 14-dimensional biological features and a random forest classifier to detect deleterious synonymous mutations. The results on the test datasets indicated that the proposed usDSM model attains superior performance in comparison with other state-of-the-art machine learning methods. Lastly, extensive experiments showed that deep learning models did not play a substantial role in deleterious synonymous mutation prediction, although they achieve superior results in other fields. In conclusion, we hope our work will contribute to the future development of computational methods for more accurate prediction of the deleterious effects of human synonymous mutations. The web server of usDSM is freely accessible at http://usdsm.xialab.info/.
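To illustrate the cluster centroid idea mentioned in the abstract, the following is a minimal sketch and not the authors' implementation. It assumes binary labels (1 for the minority deleterious class, 0 for the majority benign class), synthetic 14-dimensional feature vectors, and a plain k-means written with the Python standard library. The function names `kmeans` and `cluster_centroid_undersample` are hypothetical. The majority class is replaced by as many k-means centroids as there are minority samples, which yields a balanced training set.

```python
import random


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns k centroids as lists of floats."""
    rng = random.Random(seed)
    # Initialise centroids from k distinct input points.
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            # Assign each point to its nearest centroid (squared Euclidean distance).
            nearest = min(
                range(k),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])),
            )
            clusters[nearest].append(p)
        for j, members in enumerate(clusters):
            if members:  # keep the previous centroid if a cluster is empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def cluster_centroid_undersample(X, y, majority_label, seed=0):
    """Replace the majority class with k-means centroids, one per minority sample."""
    minority = [x for x, lab in zip(X, y) if lab != majority_label]
    minority_labels = [lab for lab in y if lab != majority_label]
    majority = [x for x, lab in zip(X, y) if lab == majority_label]
    centroids = kmeans(majority, k=len(minority), seed=seed)
    X_bal = minority + centroids
    y_bal = minority_labels + [majority_label] * len(centroids)
    return X_bal, y_bal


# Synthetic, purely illustrative data: 100 benign (0) vs 10 deleterious (1),
# each with 14 features to mirror the feature count stated in the abstract.
rng = random.Random(42)
negatives = [[rng.gauss(0.0, 1.0) for _ in range(14)] for _ in range(100)]
positives = [[rng.gauss(2.0, 1.0) for _ in range(14)] for _ in range(10)]
X = negatives + positives
y = [0] * 100 + [1] * 10

X_bal, y_bal = cluster_centroid_undersample(X, y, majority_label=0)
print(len(X_bal), y_bal.count(0), y_bal.count(1))
```

The balanced `X_bal` and `y_bal` could then be passed to a classifier; the abstract names a random forest for that step. In practice, a library implementation such as `ClusterCentroids` from imbalanced-learn would likely be preferable to this hand-written k-means.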
