PRCFX-DT: a new graph-based approach for feature selection and classification of genomic sequences.

Authors

Khodaei Amin, Eskandari Sania, Sharifi Hadi, Mozaffari-Tazehkand Behzad

Affiliations

Faculty of Electrical & Computer Engineering, University of Tabriz, Tabriz, Iran.

Department of Electrical and Computer Engineering, University of Kentucky, Lexington, KY, USA.

Publication

BMC Bioinformatics. 2025 Jun 17;26(1):159. doi: 10.1186/s12859-025-06183-4.

Abstract

BACKGROUND

In recent years, viral diseases have caused large numbers of infections and deaths. Analyzing viral genomic sequences can help assess the current state of a virus and anticipate how it may develop. Given the importance of the cell's internal structure and of the nucleotide sequences within it, analyzing those sequences can yield a range of informative features. Graph algorithms and machine learning have also been shown to produce useful results in the analysis of virus samples and even of viral variants.

RESULTS

This study proposes a novel approach that uses complex networks and probabilistic graph modeling to extract features from viral genomic sequences. The approach relies on the PageRank centrality algorithm and operates on the codons derived from the nucleotide sequences. Machine learning experiments were conducted on multiple virus datasets, including various variants of coronavirus and influenza viruses. A decision tree classifier applied to the extracted features distinguished coronavirus samples from other samples. The high discriminative power of the graph node centrality feature played a significant role in these experiments and also connected meaningfully with genetic concepts. Applied to 173,228 genomic sequence samples from 30 distinct virus types, the decision tree classifier achieved an accuracy of 99.73%.
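The abstract does not spell out how the codon graph is built, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes non-overlapping codons read in frame 0, a directed edge for each pair of consecutive codons weighted by how often that transition occurs, and a standard PageRank with damping 0.85 computed by power iteration. The names `codon_transitions`, `pagerank`, and `pagerank_features` are hypothetical.

```python
from collections import defaultdict
from itertools import product

# All 64 codons over the DNA alphabet, in a fixed order (feature positions).
CODONS = ["".join(p) for p in product("ACGT", repeat=3)]
CODON_SET = set(CODONS)


def codon_transitions(seq):
    """Count transitions between consecutive non-overlapping codons.

    Assumption: reading frame 0; pairs that involve an ambiguous codon
    (for example one containing 'N') are skipped.
    """
    seq = seq.upper()
    codons = [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]
    weights = defaultdict(float)
    for a, b in zip(codons, codons[1:]):
        if a in CODON_SET and b in CODON_SET:
            weights[(a, b)] += 1.0
    return weights


def pagerank(weights, nodes, damping=0.85, tol=1e-10, max_iter=200):
    """Weighted PageRank by power iteration over a fixed node list."""
    n = len(nodes)
    out_sum = defaultdict(float)
    for (a, _), w in weights.items():
        out_sum[a] += w
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        # Rank held by nodes without outgoing edges is spread uniformly.
        dangling = sum(rank[v] for v in nodes if out_sum[v] == 0.0)
        new = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        for (a, b), w in weights.items():
            new[b] += damping * rank[a] * w / out_sum[a]
        delta = sum(abs(new[v] - rank[v]) for v in nodes)
        rank = new
        if delta < tol:
            break
    return rank


def pagerank_features(seq):
    """Return a 64-element vector of codon PageRank scores (CODONS order)."""
    rank = pagerank(codon_transitions(seq), CODONS)
    return [rank[c] for c in CODONS]


if __name__ == "__main__":
    # Toy sequence for illustration only, not data from the study.
    features = pagerank_features("ATGGCCATGGCCTAA")
    top = max(range(len(CODONS)), key=lambda i: features[i])
    print(CODONS[top], features[top])
```

Each sequence then maps to a fixed-length vector of 64 centrality scores, which is the kind of input a decision tree classifier can consume directly.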

CONCLUSION

The proposed algorithm was tested successfully on several types of viruses, and the interpretability of the extracted features also supported structural analysis. Applying a graph-based approach to genetic features that carry information about the internal structure of nucleotides yielded results that could be significant for identifying any type of virus or a specific viral variant.
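To illustrate why tree-based rules over centrality features are easy to interpret, here is a small sketch that fits a one-level decision tree (a decision stump), which is a deliberate simplification and not the classifier used in the study. The feature values and labels are synthetic placeholders invented for illustration, and `fit_stump` and `predict_stump` are hypothetical helper names.

```python
def fit_stump(X, y):
    """Fit a depth-1 decision tree by exhaustive search.

    Returns (feature_index, threshold, left_label, right_label), chosen to
    minimise the number of training errors. Rows with value <= threshold
    receive left_label, all others right_label.
    """
    best = None
    labels = sorted(set(y))
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        # Candidate thresholds: midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2.0
            for left in labels:
                for right in labels:
                    errors = sum(
                        (left if row[j] <= t else right) != label
                        for row, label in zip(X, y)
                    )
                    if best is None or errors < best[0]:
                        best = (errors, j, t, left, right)
    if best is None:
        raise ValueError("need at least two distinct values in some feature")
    return best[1:]


def predict_stump(stump, row):
    """Apply the single learned rule to one feature vector."""
    j, t, left, right = stump
    return left if row[j] <= t else right


if __name__ == "__main__":
    # Synthetic placeholder values: two hypothetical codon centrality scores
    # per sample. These are NOT results from the paper.
    X = [[0.010, 0.030], [0.012, 0.031], [0.030, 0.029], [0.033, 0.030]]
    y = ["other", "other", "corona", "corona"]
    j, t, left, right = fit_stump(X, y)
    print(f"if feature[{j}] <= {t:.4f}: {left}, else: {right}")
```

The learned rule names a single feature and a single threshold, so it can be read back as a statement about one codon's centrality. A full decision tree extends the same idea to a sequence of such tests.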

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1daa/12172359/abecfa5edee9/12859_2025_6183_Fig1_HTML.jpg
