
Locality-Sensitive Hashing-Based k-Mer Clustering for Identification of Differential Microbial Markers Related to Host Phenotype.

Affiliation

Computer Science Department, Luddy School of Informatics, Computing and Engineering, Indiana University, Bloomington, Indiana, USA.

Publication information

J Comput Biol. 2022 Jul;29(7):738-751. doi: 10.1089/cmb.2021.0640. Epub 2022 May 17.

DOI:10.1089/cmb.2021.0640
PMID:35584271
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9464365/
Abstract

Microbial organisms play important roles in many aspects of human health and diseases. Encouraged by the numerous studies that show the association between microbiomes and human diseases, computational and machine learning methods have been recently developed to generate and utilize microbiome features for prediction of host phenotypes such as disease versus healthy or cancer immunotherapy responder versus nonresponder. We have previously developed a subtractive assembly approach, which focuses on extraction and assembly of differential reads from metagenomic data sets that are likely sampled from differential genomes or genes between two groups of microbiome data sets (e.g., healthy vs. disease). In this article, we further improved our subtractive assembly approach by utilizing groups of k-mers with similar abundance profiles across multiple samples. We implemented a locality-sensitive hashing (LSH)-enabled approach (called kmerLSHSA) to group billions of k-mers into k-mer co-abundance groups (kCAGs), which were subsequently used to retrieve differential kCAGs for subtractive assembly. Testing of the kmerLSHSA approach on simulated data sets and real microbiome data sets showed that, compared with the conventional approach that utilizes genes, our approach can quickly identify differential genes that can be used for building promising predictive models for microbiome-based host phenotype prediction. We also discuss other potential applications of LSH-enabled clustering of k-mers according to their abundance profiles across multiple microbiome samples.
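The abstract describes grouping k-mers by their abundance profiles across samples with LSH, then retrieving differential groups for subtractive assembly. The Python sketch below illustrates that idea under stated assumptions. It uses random-hyperplane (SimHash-style) LSH on mean-centered relative-abundance profiles, a single hash table, and a simple log2 fold-change filter. All function names, the hash family, the normalization and the thresholds (`n_bits`, `min_log2fc`, `pseudo`) are hypothetical choices for illustration and are not taken from the kmerLSHSA implementation described in the paper.

```python
# Hypothetical sketch, NOT the kmerLSHSA code: hash family, normalization,
# thresholds and function names are illustrative assumptions.
import math
import random
from collections import Counter, defaultdict


def count_kmers(reads, k):
    """Count exact k-mers on the forward strand only (no canonical k-mers)."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def abundance_profiles(samples, k):
    """Map each k-mer to its relative abundance in every sample (one value per sample)."""
    per_sample = [count_kmers(reads, k) for reads in samples]
    totals = [sum(c.values()) or 1 for c in per_sample]  # avoid division by zero
    all_kmers = set().union(*per_sample)
    return {km: [c[km] / t for c, t in zip(per_sample, totals)] for km in all_kmers}


def lsh_signature(profile, planes):
    """Random-hyperplane LSH: one bit per plane, the sign of <centered profile, plane>.

    Mean-centering makes the cosine between two profiles equal their Pearson
    correlation, so co-abundant k-mers tend to share a signature.
    """
    mean = sum(profile) / len(profile)
    centered = [x - mean for x in profile]
    sig = 0
    for plane in planes:
        dot = sum(x * p for x, p in zip(centered, plane))
        sig = (sig << 1) | (dot >= 0)
    return sig


def lsh_groups(profiles, n_bits=16, seed=0):
    """Bucket k-mers whose profiles hash to the same n_bits signature (one table)."""
    dim = len(next(iter(profiles.values())))
    rng = random.Random(seed)
    planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]
    groups = defaultdict(list)
    for kmer, profile in profiles.items():
        groups[lsh_signature(profile, planes)].append(kmer)
    return dict(groups)


def differential_groups(groups, profiles, labels, min_log2fc=1.0, pseudo=1e-6):
    """Return ids of groups whose summed abundance differs between label-1 and label-0 samples."""
    hits = []
    for gid, kmers in groups.items():
        total = [sum(profiles[km][j] for km in kmers) for j in range(len(labels))]
        case = [v for v, y in zip(total, labels) if y == 1]
        ctrl = [v for v, y in zip(total, labels) if y == 0]
        lfc = math.log2((sum(case) / len(case) + pseudo) / (sum(ctrl) / len(ctrl) + pseudo))
        if abs(lfc) >= min_log2fc:
            hits.append(gid)
    return hits


def retrieve_reads(reads, kmers, k):
    """Keep reads containing at least one k-mer from the selected groups."""
    wanted = set(kmers)
    return [r for r in reads if any(r[i:i + k] in wanted for i in range(len(r) - k + 1))]
```

For example, k-mers whose profiles are positive multiples of one another (such as `[10, 10, 0, 0]` and `[20, 20, 0, 0]`) always receive the same signature, whereas a profile pointing the opposite way after centering is expected to land in a different bucket. A single table of `n_bits` hyperplanes keeps the sketch short; more bits make buckets stricter, and practical LSH clustering often combines several tables to trade recall against over-merging. The reads returned by `retrieve_reads` would then be handed to an external assembler.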


Figures

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5a76/9464365/868fa74cc141/cmb.2021.0640_figure1.jpg
Figure 2: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5a76/9464365/f41eb49064b1/cmb.2021.0640_figure2.jpg
Figure 3: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5a76/9464365/3a554e706240/cmb.2021.0640_figure3.jpg
Figure 4: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5a76/9464365/0afafd2bd03e/cmb.2021.0640_figure4.jpg
Figure 5: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5a76/9464365/079da7cc6f13/cmb.2021.0640_figure5.jpg


