
Exploring the Potential of GANs in Biological Sequence Analysis.

Author information

Taslim Murad, Sarwan Ali, Murray Patterson

Affiliation

Department of Computer Science, Georgia State University, Atlanta, GA 30302, USA.

Publication information

Biology (Basel). 2023 Jun 14;12(6):854. doi: 10.3390/biology12060854.

DOI: 10.3390/biology12060854
PMID: 37372139
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10295061/

Abstract

Biological sequence analysis is an essential step toward a deeper understanding of the functions, structures, and behaviors of sequences. It can help identify the characteristics of the associated organisms, such as viruses, and support mechanisms that limit their spread and impact, since viruses are known to cause epidemics that can grow into global pandemics. Machine learning (ML) technologies provide new tools for effectively analyzing the functions and structures of biological sequences. However, these ML-based methods are hindered by data imbalance, which is common in biological sequence datasets. Various strategies exist to address this issue, such as the SMOTE algorithm, which creates synthetic data; however, they focus on local information rather than the overall class distribution. In this work, we explore a novel approach to handling data imbalance based on generative adversarial networks (GANs), which model the overall data distribution. GANs are used to generate synthetic data that closely resembles real data, so the generated data can improve ML models' performance by balancing the classes in biological sequence analysis. We perform four distinct classification tasks using four different sequence datasets (Influenza A Virus, PALMdb, VDjDB, Host), and our results show that GANs can improve overall classification performance.
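The abstract contrasts SMOTE, which interpolates between nearby minority samples, with GAN-based oversampling, which learns the overall distribution of a minority class. The following is a toy sketch of the second idea only, not the authors' model: it fits a tiny one-dimensional GAN (a linear generator and a discriminator with a quadratic logit) to each minority class and draws synthetic values until every class matches the largest one. The function names `train_toy_gan` and `gan_oversample` are hypothetical, and the architecture, learning rate, batch size, and gradient clipping are assumptions chosen to keep the example small and numerically stable. Real sequence data would first need to be encoded as multi-dimensional feature vectors.

```python
import math
import random
from collections import Counter


def sigmoid(t):
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def clip(v, limit=1.0):
    """Clip a gradient to [-limit, limit] (assumed here to keep toy training bounded)."""
    return max(-limit, min(limit, v))


def train_toy_gan(real, steps=2000, batch=16, lr=0.02, seed=0):
    """Fit a toy 1-D GAN to the values in `real`; return generator params (a, b).

    Generator:     G(z) = a*z + b, with z ~ N(0, 1)
    Discriminator: D(x) = sigmoid(w2*x**2 + w1*x + c)
    """
    rng = random.Random(seed)
    a, b = 1.0, 0.0            # generator parameters
    w2, w1, c = 0.0, 0.0, 0.0  # discriminator parameters
    for _ in range(steps):
        zs = [rng.gauss(0.0, 1.0) for _ in range(batch)]
        fakes = [a * z + b for z in zs]
        reals = [rng.choice(real) for _ in range(batch)]
        # Discriminator: gradient ascent on log D(x_real) + log(1 - D(x_fake)).
        gw2 = gw1 = gc = 0.0
        for xr, xf in zip(reals, fakes):
            g_r = 1.0 - sigmoid(w2 * xr * xr + w1 * xr + c)  # d log D / d logit
            g_f = -sigmoid(w2 * xf * xf + w1 * xf + c)       # d log(1 - D) / d logit
            gw2 += g_r * xr * xr + g_f * xf * xf
            gw1 += g_r * xr + g_f * xf
            gc += g_r + g_f
        w2 += lr * clip(gw2 / batch)
        w1 += lr * clip(gw1 / batch)
        c += lr * clip(gc / batch)
        # Generator: gradient ascent on the non-saturating objective log D(G(z)).
        ga = gb = 0.0
        for z, xf in zip(zs, fakes):
            d = sigmoid(w2 * xf * xf + w1 * xf + c)
            g = (1.0 - d) * (2.0 * w2 * xf + w1)  # d log D(x) / dx at x = G(z)
            ga += g * z                           # chain rule: dx/da = z
            gb += g                               # chain rule: dx/db = 1
        a += lr * clip(ga / batch)
        b += lr * clip(gb / batch)
    return a, b


def gan_oversample(features, labels, seed=0):
    """Append generated samples to each minority class until all classes match the largest."""
    counts = Counter(labels)
    target = max(counts.values())
    rng = random.Random(seed + 1)
    out_x, out_y = list(features), list(labels)
    for cls, n in counts.items():
        if n == target:
            continue  # the majority class is left unchanged
        real = [x for x, y in zip(features, labels) if y == cls]
        a, b = train_toy_gan(real, seed=seed)
        for _ in range(target - n):
            out_x.append(a * rng.gauss(0.0, 1.0) + b)
            out_y.append(cls)
    return out_x, out_y


if __name__ == "__main__":
    rng = random.Random(42)
    majority = [rng.gauss(0.0, 1.0) for _ in range(200)]
    minority = [rng.gauss(3.0, 0.5) for _ in range(20)]
    X, y = gan_oversample(majority + minority, ["common"] * 200 + ["rare"] * 20)
    print(Counter(y)["rare"])  # 200
```

Unlike SMOTE, which only places new points on line segments between existing neighbors, the generator here is trained against a discriminator that sees the whole minority sample, which is the "overall distribution" property the abstract emphasizes. The balanced output would then be passed to a downstream classifier.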



