MicroPheno: predicting environments and host phenotypes from 16S rRNA gene sequencing using a k-mer based representation of shallow sub-samples.

Affiliations

Molecular Cell Biomechanics Laboratory, Departments of Bioengineering and Mechanical Engineering, University of California, Berkeley, CA, USA.

Molecular Biophysics and Integrated Bioimaging, Lawrence Berkeley National Lab, Berkeley, CA, USA.

Publication information

Bioinformatics. 2018 Jul 1;34(13):i32-i42. doi: 10.1093/bioinformatics/bty296.

DOI:10.1093/bioinformatics/bty296
PMID:29950008
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6022683/
Abstract

MOTIVATION

Microbial communities play important roles in the function and maintenance of various biosystems, ranging from the human body to the environment. A major challenge in microbiome research is the classification of microbial communities of different environments or host phenotypes. The most common and cost-effective approach for such studies to date is 16S rRNA gene sequencing. Recent falls in sequencing costs have increased the demand for simple, efficient and accurate methods for rapid detection or diagnosis with proved applications in medicine, agriculture and forensic science. We describe a reference- and alignment-free approach for predicting environments and host phenotypes from 16S rRNA gene sequencing based on k-mer representations that benefits from a bootstrapping framework for investigating the sufficiency of shallow sub-samples. Deep learning methods as well as classical approaches were explored for predicting environments and host phenotypes.
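As a rough illustration of the representation described above, the following sketch draws a shallow random sub-sample of reads and turns it into a normalized k-mer frequency vector, then repeats the draw with different seeds to see how stable the vector is. It is a minimal sketch under stated assumptions, not the MicroPheno implementation: the function names `kmer_profile` and `bootstrap_profiles`, the parameter defaults, and the choice to skip k-mers containing ambiguous bases are all hypothetical.

```python
import random
from collections import Counter
from itertools import product


def kmer_profile(reads, k=3, sample_size=2, seed=0):
    """Normalized k-mer frequency vector from a random sub-sample of reads.

    Hypothetical helper for illustration; not taken from the MicroPheno code.
    """
    rng = random.Random(seed)
    # Shallow sub-sample: draw at most `sample_size` reads without replacement.
    subsample = rng.sample(reads, min(sample_size, len(reads)))
    counts = Counter()
    for read in subsample:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            # Assumption: k-mers with ambiguous bases (e.g. N) are ignored.
            if set(kmer) <= set("ACGT"):
                counts[kmer] += 1
    total = sum(counts.values())
    # Fixed vocabulary of all 4**k k-mers in lexicographic order.
    vocab = ["".join(p) for p in product("ACGT", repeat=k)]
    return [counts[km] / total if total else 0.0 for km in vocab]


def bootstrap_profiles(reads, k=3, sample_size=2, n_resamples=5):
    """Repeat the sub-sampling with different seeds to gauge stability."""
    return [kmer_profile(reads, k, sample_size, seed=s) for s in range(n_resamples)]


reads = ["ACGTACGT", "ACGTACGT"]
profile = kmer_profile(reads, k=2, sample_size=1)
```

No reference database or alignment is involved: the feature vector depends only on the sampled reads, and it could be passed to any standard classifier.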

RESULTS

A k-mer distribution of shallow sub-samples outperformed Operational Taxonomic Unit (OTU) features in the tasks of body-site identification and Crohn's disease prediction. Aside from being more accurate, using k-mer features in shallow sub-samples (i) allows skipping the computationally costly sequence alignments required in OTU-picking and (ii) provides a proof of concept for the sufficiency of shallow and short-length 16S rRNA sequencing for phenotype prediction. In addition, k-mer features predicted representative 16S rRNA gene sequences of 18 ecological environments and 5 organismal environments with high macro-F1 scores of 0.88 and 0.87, respectively. For large datasets, deep learning outperformed classical methods such as Random Forest and Support Vector Machine.
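The macro-F1 scores reported above average the per-class F1 score with equal weight for every class, so rare environments count as much as common ones. The sketch below computes it from scratch using the standard definition; the function name `macro_f1` and the body-site labels in the example are made up for illustration and do not come from the paper's data.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores.

    Classes are taken as the union of true and predicted labels
    (an assumption of this sketch); a class with no true or
    predicted instances contributes an F1 of 0.
    """
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        # F1 = 2*TP / (2*TP + FP + FN), equivalent to the harmonic
        # mean of precision and recall.
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy labels, invented for this example.
y_true = ["gut", "gut", "skin", "oral"]
y_pred = ["gut", "skin", "skin", "oral"]
score = macro_f1(y_true, y_pred)  # per-class F1: gut 2/3, oral 1, skin 2/3
```

Here the macro-F1 is (2/3 + 1 + 2/3) / 3 = 7/9, lower than the plain accuracy of 3/4 because the single misclassified gut sample lowers the F1 of two classes at once.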

AVAILABILITY AND IMPLEMENTATION

The software and datasets are available at https://llp.berkeley.edu/micropheno.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8307/6022683/7078de7318e5/bty296f6.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8307/6022683/aaf9baa74551/bty296f1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8307/6022683/5fcc99448c5f/bty296f2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8307/6022683/f0695b408e1e/bty296f3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8307/6022683/a802dd10d8f6/bty296f4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8307/6022683/e43850b33626/bty296f5.jpg