Information Theoretic Metagenome Assembly Allows the Discovery of Disease Biomarkers in Human Microbiome.

Author information

Nalbantoglu O Ufuk

Affiliations

Department of Computer Engineering, Erciyes University, 38039 Kayseri, Turkey.

Genome and Stem Cell Center, Erciyes University, 38039 Kayseri, Turkey.

Publication information

Entropy (Basel). 2021 Feb 2;23(2):187. doi: 10.3390/e23020187.

Abstract

Quantitative metagenomics is an important field that has delivered successful microbiome biomarkers associated with host phenotypes. The current convention depends mainly on unsupervised assembly of metagenomic contigs, which can leave interesting genetic material unassembled. Additionally, biomarkers are commonly defined by the differential relative abundance of compositional or functional units. Accumulating evidence indicates that microbial genetic variations are as important as differential abundance, implying the need for novel methods that account for genetic variations in metagenomics studies. We propose an information theoretic metagenome assembly algorithm that discovers genomic fragments with maximal self-information, defined by the empirical distributions of nucleotides across the phenotypes and quantified with the help of statistical tests. Our algorithm infers fragments that gather the most informative genetic variants in a single contig, named supervariant fragments. Experiments on simulated metagenomes, as well as on a colorectal cancer dataset and an atherosclerotic cardiovascular disease dataset, consistently discovered sequences strongly associated with the disease phenotypes. Moreover, the discriminatory power of these putative biomarkers was attributed mainly to genetic variations rather than relative abundance. Our results suggest that metagenomics methods that consider microbiome population genetics might be useful in discovering disease biomarkers with great potential for translation to molecular diagnostics and biotherapeutics applications.
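
The abstract does not spell out how self-information is computed, so the following is only a rough sketch of the general idea, not the paper's algorithm. It assumes reads from the two phenotype groups are already aligned to the same columns, and it swaps in a G-test (a likelihood-ratio test on nucleotide counts) as the statistical test, scoring each column by -log2(p). The names `chi2_sf`, `g_statistic`, and `position_information` are hypothetical.

```python
import math
from collections import Counter

NUCLEOTIDES = "ACGT"


def chi2_sf(x, df):
    """Chi-square survival function, closed forms for df in {1, 2, 3}."""
    if x <= 0:
        return 1.0
    if df == 1:
        return math.erfc(math.sqrt(x / 2))
    if df == 2:
        return math.exp(-x / 2)
    if df == 3:
        tail = math.sqrt(2 * x / math.pi) * math.exp(-x / 2)
        return math.erfc(math.sqrt(x / 2)) + tail
    raise ValueError("df must be 1, 2 or 3")


def g_statistic(case_counts, control_counts):
    """G statistic and degrees of freedom for a 2 x k nucleotide table."""
    # Only nucleotides seen in at least one group form a column of the table.
    observed = [b for b in NUCLEOTIDES
                if case_counts[b] + control_counts[b] > 0]
    n_case = sum(case_counts[b] for b in observed)
    n_ctrl = sum(control_counts[b] for b in observed)
    total = n_case + n_ctrl
    if len(observed) < 2 or n_case == 0 or n_ctrl == 0:
        return 0.0, 0  # no variation, or one group is empty: nothing to test
    g = 0.0
    for b in observed:
        col = case_counts[b] + control_counts[b]
        for obs, row in ((case_counts[b], n_case),
                         (control_counts[b], n_ctrl)):
            if obs > 0:
                # 2 * observed * ln(observed / expected)
                g += 2 * obs * math.log(obs * total / (row * col))
    return g, len(observed) - 1


def position_information(case_reads, control_reads):
    """Self-information in bits, -log2(p), for every aligned column."""
    scores = []
    for i in range(len(case_reads[0])):
        case = Counter(read[i] for read in case_reads)
        ctrl = Counter(read[i] for read in control_reads)
        g, df = g_statistic(case, ctrl)
        p = chi2_sf(g, df) if df > 0 else 1.0
        # Clamp p away from zero so the logarithm stays finite.
        scores.append(0.0 if p >= 1.0 else -math.log2(max(p, 1e-300)))
    return scores


# Toy data (invented for illustration): the groups differ only at column 2.
case_reads = ["ACGT"] * 10
control_reads = ["ACGT"] * 5 + ["ACTT"] * 5
scores = position_information(case_reads, control_reads)
```

In this toy input only the third column varies between groups, so it is the only column with a nonzero score. A fragment-level score could then be formed by summing column scores over a window, though how the paper actually combines variants into supervariant fragments is not stated in the abstract.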

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8b1b/7913240/3752b9111200/entropy-23-00187-g001.jpg
