SpecGMM: Integrating Spectral analysis and Gaussian Mixture Models for taxonomic classification and identification of discriminative DNA regions.

Author information

Jaiswal Saish, Murthy Hema A, Narayanan Manikandan

Affiliations

Department of Computer Science and Engineering, Indian Institute of Technology (IIT) Madras, Chennai 600036, India.

Department of Computer Science and Engineering, Shiv Nadar University, Chennai 603110, India.

Publication information

Bioinform Adv. 2024 Nov 5;4(1):vbae171. doi: 10.1093/bioadv/vbae171. eCollection 2024.

DOI:10.1093/bioadv/vbae171
PMID:39659586
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11631429/

Abstract

MOTIVATION

Genomic signal processing (GSP), which transforms biomolecular sequences into discrete signals for spectral analysis, has provided valuable insights into DNA sequence, structure, and evolution. However, challenges persist both in building spectral representations of variable-length sequences for tasks such as species classification and in interpreting these spectra to identify discriminative DNA regions.
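
To make the GSP idea concrete: a DNA string is first mapped to numbers, and its spectrum is then taken with a discrete Fourier transform (DFT). The sketch below assumes a purine/pyrimidine mapping (A, G to +1; C, T to -1), which is one of several mappings used in GSP and not necessarily the one SpecGMM uses; the helper names are hypothetical. It also illustrates the variable-length problem named above: the DFT of an N-base sequence has N bins, so raw spectra of sequences with different lengths do not share a common feature dimension.

```python
import cmath

# Hypothetical purine/pyrimidine mapping (an assumption): A, G -> +1; C, T -> -1.
PP_MAP = {"A": 1.0, "G": 1.0, "C": -1.0, "T": -1.0}


def numeric_signal(seq: str) -> list[float]:
    """Map a DNA string to a discrete numeric signal (unknown bases map to 0)."""
    return [PP_MAP.get(base, 0.0) for base in seq.upper()]


def magnitude_spectrum(signal: list[float]) -> list[float]:
    """Naive O(N^2) DFT magnitude |X[k]| for k = 0..N-1."""
    n = len(signal)
    return [
        abs(sum(x * cmath.exp(-2j * cmath.pi * k * t / n) for t, x in enumerate(signal)))
        for k in range(n)
    ]


short = magnitude_spectrum(numeric_signal("ACGTAC"))
long_ = magnitude_spectrum(numeric_signal("ACGTACGTAA"))
# The spectrum length tracks the sequence length, so raw spectra of
# variable-length sequences cannot be stacked into one feature matrix.
print(len(short), len(long_))  # 6 10
```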

RESULTS

We introduce SpecGMM, a novel framework that integrates sliding-window-based Spectral analysis with a Gaussian Mixture Model to transform variable-length DNA sequences into fixed-dimensional spectral representations for taxonomic classification. SpecGMM's hyperparameters were selected using a dataset of plant sequences and applied unchanged across diverse datasets, including mitochondrial DNA, viral and bacterial genome, and 16S rRNA sequences. Across these datasets, SpecGMM outperformed a baseline method, with 9.45% average and 35.55% maximum improvement in test accuracies for a Linear Discriminant classifier. Regarding interpretability, SpecGMM revealed discriminative hypervariable regions in 16S rRNA sequences (particularly V3/V4 for discriminating higher taxa and V2/V3 for lower taxa), corroborating their known classification relevance. SpecGMM's spectrogram video analysis helped visualize species-specific DNA signatures. SpecGMM thus provides a robust and interpretable method for spectral DNA analysis, opening new avenues in GSP research.
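
The abstract states what SpecGMM produces (a fixed-dimensional spectral representation of a variable-length sequence) but not the exact recipe. The sketch below illustrates the general idea under loudly stated assumptions and is not the authors' implementation (the repository listed under Availability has that): a purine/pyrimidine numeric mapping, overlapping windows of 16 bases with a hop of 8, one magnitude spectrum per window via NumPy's `rfft`, a 4-component diagonal-covariance GMM fitted by plain EM on windows pooled across sequences, and, as the pooling rule, the mean posterior probability of each component over a sequence's windows. All function names and parameter values are hypothetical.

```python
import numpy as np

# Hypothetical purine/pyrimidine mapping; SpecGMM's own mapping may differ.
PP_MAP = {"A": 1.0, "G": 1.0, "C": -1.0, "T": -1.0}


def spectral_frames(seq, win=16, hop=8):
    """Sliding-window magnitude spectra, shape (n_windows, win // 2 + 1).

    win and hop are illustrative values, not SpecGMM's tuned hyperparameters.
    """
    x = np.array([PP_MAP.get(b, 0.0) for b in seq.upper()])
    frames = [np.abs(np.fft.rfft(x[s:s + win])) for s in range(0, len(x) - win + 1, hop)]
    return np.array(frames).reshape(-1, win // 2 + 1)


def posteriors(X, weights, means, variances):
    """Per-frame responsibilities p(component | frame) for a diagonal GMM."""
    log_p = -0.5 * (
        np.sum(np.log(2.0 * np.pi * variances), axis=1)[None, :]
        + np.sum((X[:, None, :] - means[None, :, :]) ** 2 / variances[None, :, :], axis=2)
    ) + np.log(weights)[None, :]
    log_p -= log_p.max(axis=1, keepdims=True)  # log-sum-exp stabilization
    resp = np.exp(log_p)
    return resp / resp.sum(axis=1, keepdims=True)


def fit_diag_gmm(X, k=4, iters=50, reg=1e-3, seed=0):
    """Fit a diagonal-covariance GMM to the rows of X with plain EM."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    means = X[rng.choice(n, size=k, replace=False)].astype(float)
    variances = np.tile(X.var(axis=0) + reg, (k, 1))
    weights = np.full(k, 1.0 / k)
    for _ in range(iters):
        resp = posteriors(X, weights, means, variances)  # E-step
        nk = resp.sum(axis=0) + 1e-12                     # M-step from here on
        weights = nk / nk.sum()
        means = (resp.T @ X) / nk[:, None]
        variances = (resp.T @ X**2) / nk[:, None] - means**2 + reg
    return weights, means, variances


def embed(seq, gmm, win=16, hop=8):
    """Fixed-length vector: average responsibility of each component over the
    sequence's windows (assumed pooling rule; requires len(seq) >= win)."""
    return posteriors(spectral_frames(seq, win, hop), *gmm).mean(axis=0)


# Toy usage on random sequences of different lengths (synthetic data, not from the paper).
rng = np.random.default_rng(1)
train = ["".join(rng.choice(list("ACGT"), size=n)) for n in (120, 200, 310)]
gmm = fit_diag_gmm(np.vstack([spectral_frames(s) for s in train]), k=4)
vecs = [embed(s, gmm) for s in train]
print([v.shape for v in vecs])  # [(4,), (4,), (4,)]
```

Every sequence, whatever its length, ends up as a vector with one entry per GMM component, so the vectors can be stacked and passed to a standard classifier; the abstract reports its results with a Linear Discriminant classifier.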

AVAILABILITY AND IMPLEMENTATION

SpecGMM's source code is available at https://github.com/BIRDSgroup/SpecGMM.
