TaxoNN: ensemble of neural networks on stratified microbiome data for disease prediction.

Author affiliations

Division of Biostatistics, Dalla Lana School of Public Health, University of Toronto, Toronto, ON, Canada M5T 3M7.

Genetics and Genome Biology Program, The Hospital for Sick Children, Toronto, ON, Canada, M5G 1X8.

Publication information

Bioinformatics. 2020 Nov 1;36(17):4544-4550. doi: 10.1093/bioinformatics/btaa542.

Abstract

MOTIVATION

Research supports the potential use of the microbiome as a predictor of some diseases. Microbiome data are complex in nature, and the hierarchical taxonomy of microbial Operational Taxonomic Units (OTUs) induces an inherent correlation among them. Motivated by these findings, we propose a novel machine learning method that uses a stratified approach to group OTUs into phylum clusters. A Convolutional Neural Network (CNN) was trained within each cluster individually. Then, through an ensemble learning approach, the features obtained from each cluster were concatenated to improve prediction accuracy. This two-step approach, stratification followed by combining multiple CNNs, helped capture the relationships between OTUs sharing a phylum more efficiently than a single CNN that ignores OTU correlations.
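The data flow described above (stratify OTUs by phylum, extract features within each cluster, concatenate) can be sketched in a few lines. This is only a structural illustration under stated assumptions, not the authors' implementation: the OTU names, phylum assignments, abundances and kernel size are hypothetical, and each cluster gets a single random, untrained 1D filter as a stand-in for a trained CNN.

```python
from collections import defaultdict
import random


def stratify_by_phylum(otu_ids, phylum_of):
    """Group OTU column indices by their phylum label."""
    clusters = defaultdict(list)
    for col, otu in enumerate(otu_ids):
        clusters[phylum_of[otu]].append(col)
    return dict(clusters)


def conv1d_relu(values, kernel):
    """Valid 1D convolution (cross-correlation) followed by ReLU."""
    k = len(kernel)
    if len(values) < k:
        # Zero-pad clusters that are shorter than the kernel.
        values = values + [0.0] * (k - len(values))
    return [
        max(0.0, sum(values[i + j] * kernel[j] for j in range(k)))
        for i in range(len(values) - k + 1)
    ]


def cluster_features(sample, clusters, kernels):
    """Apply each cluster's own filter, then concatenate the outputs."""
    features = []
    for phylum in sorted(clusters):
        values = [sample[col] for col in clusters[phylum]]
        features.extend(conv1d_relu(values, kernels[phylum]))
    return features


# Hypothetical toy input: 7 OTUs from 2 phyla, one sample of abundances.
otu_ids = ["otu1", "otu2", "otu3", "otu4", "otu5", "otu6", "otu7"]
phylum_of = {
    "otu1": "Firmicutes", "otu2": "Bacteroidetes", "otu3": "Firmicutes",
    "otu4": "Firmicutes", "otu5": "Bacteroidetes", "otu6": "Bacteroidetes",
    "otu7": "Firmicutes",
}
sample = [0.10, 0.25, 0.05, 0.20, 0.15, 0.05, 0.20]

clusters = stratify_by_phylum(otu_ids, phylum_of)
rng = random.Random(0)
# Assumption: one untrained filter of width 2 per phylum cluster.
kernels = {p: [rng.uniform(-1.0, 1.0) for _ in range(2)] for p in clusters}
features = cluster_features(sample, clusters, kernels)
print(clusters)
print(len(features))
```

In a real model the per-cluster filters would be learned, and the concatenated feature vector would feed a downstream classifier that produces the disease prediction.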

RESULTS

We used simulated datasets containing 168 OTUs in 200 cases and 200 controls for model testing. Thirty-two OTUs potentially associated with risk of disease were randomly selected, and interactions between three OTUs were used to introduce non-linearity. To demonstrate the model's effectiveness, we also applied the method to two human microbiome studies: (i) cirrhosis, with 118 cases and 114 controls; and (ii) type 2 diabetes (T2D), with 170 cases and 174 controls. Extensive experimentation and comparison against conventional machine learning techniques yielded encouraging results. We obtained mean AUC values of 0.88, 0.92 and 0.75 in the simulations, cirrhosis and T2D data, respectively, a consistent increase (5%, 3% and 7%, respectively) over the next best performing method, Random Forest.
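To make the simulation design and the AUC metric concrete, here is a minimal sketch. Only the sizes (168 OTUs, 32 associated OTUs, 200 cases, 200 controls, a three-OTU interaction) come from the abstract. Everything else is an assumption chosen for illustration: standard-normal "abundances", the effect-size range, the interaction weight, and the logistic link are not taken from the paper. The AUC helper uses the standard pairwise (Mann-Whitney) definition.

```python
import math
import random

N_OTUS, N_ASSOCIATED, N_PER_GROUP = 168, 32, 200  # sizes from the abstract


def simulate(seed=0):
    """Draw samples until 200 cases and 200 controls are collected."""
    rng = random.Random(seed)
    associated = rng.sample(range(N_OTUS), N_ASSOCIATED)
    trio = associated[:3]  # three OTUs that interact (non-linearity)
    # Assumption: positive effect sizes drawn uniformly from [0.5, 1.0].
    effect = {j: rng.uniform(0.5, 1.0) for j in associated}
    cases, controls = [], []
    while len(cases) < N_PER_GROUP or len(controls) < N_PER_GROUP:
        # Assumption: standardized Gaussian values, not real count data.
        x = [rng.gauss(0.0, 1.0) for _ in range(N_OTUS)]
        risk = sum(effect[j] * x[j] for j in associated)
        risk += 2.0 * x[trio[0]] * x[trio[1]] * x[trio[2]]
        # Logistic link, written with tanh for numerical stability.
        p_case = 0.5 * (1.0 + math.tanh(risk / 2.0))
        group = cases if rng.random() < p_case else controls
        if len(group) < N_PER_GROUP:
            group.append((x, risk))
    return associated, cases, controls


def auc(pos_scores, neg_scores):
    """Probability that a random case outscores a random control (ties = 0.5)."""
    wins = sum(
        (p > n) + 0.5 * (p == n) for p in pos_scores for n in neg_scores
    )
    return wins / (len(pos_scores) * len(neg_scores))


associated, cases, controls = simulate(seed=0)
# AUC of the true risk score itself: an upper reference for any classifier.
oracle_auc = auc([r for _, r in cases], [r for _, r in controls])
print(len(cases), len(controls), round(oracle_auc, 3))
```

Scoring the samples with a fitted model instead of the true risk, and averaging `auc` over repeated train/test splits, would give a mean AUC comparable in kind to the values reported above.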

AVAILABILITY AND IMPLEMENTATION

https://github.com/divya031090/TaxoNN_OTU.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.

Figure: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a976/7750934/aff390b2f954/btaa542f1.jpg
