A composite universal DNA signature for the tree of life.

Authors

de Medeiros Bruno A S, Cai Liming, Flynn Peter J, Yan Yujing, Duan Xiaoshan, Marinho Lucas C, Anderson Christiane, Davis Charles C

Affiliations

Field Museum of Natural History, Chicago, IL, USA.

Department of Organismic and Evolutionary Biology, Museum of Comparative Zoology, Harvard University, Cambridge, MA, USA.

Publication information

Nat Ecol Evol. 2025 Jun 25. doi: 10.1038/s41559-025-02752-1.

DOI: 10.1038/s41559-025-02752-1
PMID: 40562837

Abstract

Species identification using DNA barcodes has revolutionized biodiversity sciences. However, conventional barcoding methods may lack power and universal applicability across the tree of life. Alternative methods based on whole genome sequencing are hard to scale due to large data requirements. Here we develop a novel DNA-based identification method, varKoding, using exceptionally low-coverage genome skim data to create two-dimensional images representing the genomic signature of a species. Using these representations, we train neural networks for taxonomic identification. Applying a taxonomically verified novel genomic dataset of Malpighiales plant accessions, we optimize training hyperparameters and find the highest performance by combining a transformer architecture with a new modified chaos game representation. Greater than 91% precision is achieved despite minimal input data, exceeding alternative methods tested. We illustrate the broad utility of varKoding across several focal clades of eukaryotes and prokaryotes. We also train a model capable of identifying all species in the Sequence Read Archive of the National Center for Biotechnology Information using less than 10 Mbp sequencing data with 96% precision and 95% recall and robust to sequencing platforms. The varKoding approach offers enhanced computational efficiency and scalability, minimal data inputs robust to sequencing details and modularity for further development in biodiversity science.
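
The abstract describes turning low-coverage reads into a two-dimensional image of a genomic signature. As a rough illustration only, the sketch below builds a standard frequency chaos game representation (FCGR) from k-mer counts. It is not the modified chaos game representation developed in the paper, whose details the abstract does not give. The function names (`kmer_to_cell`, `fcgr`, `normalize`), the corner assignment in `CORNERS`, and the toy reads are all assumptions made for this example.

```python
# Corner assignment for each nucleotide as (x_bit, y_bit).
# This is one common convention, assumed here for illustration.
CORNERS = {"A": (0, 0), "C": (0, 1), "G": (1, 1), "T": (1, 0)}


def kmer_to_cell(kmer):
    """Map a k-mer to (row, col) in a 2^k x 2^k grid.

    As in the chaos game, the last base selects the quadrant, the base
    before it selects the sub-quadrant, and so on.
    """
    x = y = 0
    for base in reversed(kmer):
        bx, by = CORNERS[base]
        x = (x << 1) | bx
        y = (y << 1) | by
    return y, x


def fcgr(reads, k):
    """Count k-mers across all reads into a 2^k x 2^k grid of integers."""
    size = 1 << k
    grid = [[0] * size for _ in range(size)]
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            # Skip k-mers containing ambiguous bases such as N.
            if all(base in CORNERS for base in kmer):
                row, col = kmer_to_cell(kmer)
                grid[row][col] += 1
    return grid


def normalize(grid):
    """Scale counts to frequencies so images from different read depths are comparable."""
    total = sum(map(sum, grid))
    return [[(c / total if total else 0.0) for c in row] for row in grid]


# Toy usage with made-up reads; real input would be genome skim reads.
image = normalize(fcgr(["ACGTAC", "GGNTA"], k=2))
```

Each cell of `image` holds the relative frequency of one k-mer, so the grid can be passed to an image classifier as a single-channel picture. Normalizing by the total count is one simple way to reduce sensitivity to sequencing depth; the paper may handle this differently.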
