Sharma Divya, Xu Wei
Biostatistics Department, Princess Margaret Cancer Center, University Health Network, Toronto, ON M5G 2C1, Canada.
Division of Biostatistics, Dalla Lana School of Public Health, University of Toronto, Toronto, ON M5T 3M7, Canada.
Bioinformatics. 2021 Nov 5;37(21):3707-3714. doi: 10.1093/bioinformatics/btab482.
Research shows that the human microbiome is highly dynamic on longitudinal timescales, changing with diet or in response to medical interventions. In this article, we propose 'phyLoSTM', a novel deep learning framework that combines Convolutional Neural Networks (CNNs) and Long Short-Term Memory networks (LSTMs) for disease prediction. The CNNs extract features from longitudinal microbiome sequencing data, which are analyzed together with the host's environmental factors, and the LSTMs model the temporal dependencies across timepoints. We also propose two further contributions: using LSTMs to handle a variable number of timepoints per subject, and weight balancing to correct for imbalance between cases and controls.
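The abstract does not say how phyLoSTM pads sequences or weights classes. The pure-Python sketch below shows one common way to implement each idea, under stated assumptions. Subjects with fewer timepoints are zero-padded and masked, so the recurrent state is carried unchanged past padded steps. Cases and controls receive inverse-frequency ("balanced") loss weights. All function names are hypothetical. A plain tanh RNN cell stands in for the LSTM cell, and the CNN feature extractor is omitted: each timepoint is assumed to already be a feature vector.

```python
import math


def balanced_class_weights(labels):
    """Hypothetical helper: inverse-frequency weights w_c = n / (k * n_c).

    This is the heuristic scikit-learn uses for class_weight="balanced";
    whether phyLoSTM uses exactly this scheme is an assumption.
    """
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    n, k = len(labels), len(counts)
    return {c: n / (k * m) for c, m in counts.items()}


def weighted_bce(probs, labels, weights):
    """Class-weighted binary cross-entropy, averaged over subjects."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, 1e-12), 1 - 1e-12)  # clip to avoid log(0)
        loss = -(y * math.log(p) + (1 - y) * math.log(1 - p))
        total += weights[y] * loss
    return total / len(labels)


def pad_sequences(seqs, dim):
    """Zero-pad per-subject sequences to a common length and build masks.

    Each sequence is a list of timepoints; each timepoint is a feature
    vector of length `dim` (assumed to come from an upstream CNN).
    """
    max_len = max(len(s) for s in seqs)
    padded, masks = [], []
    for s in seqs:
        n_pad = max_len - len(s)
        padded.append(list(s) + [[0.0] * dim for _ in range(n_pad)])
        masks.append([1] * len(s) + [0] * n_pad)
    return padded, masks


def masked_rnn_last_state(seq, mask, w_in, w_h):
    """Run a toy tanh RNN (stand-in for an LSTM) and return its final state.

    At padded steps (mask == 0) the hidden state is carried over unchanged,
    so padding does not affect the summary used for prediction.
    """
    h = [0.0] * len(w_h)
    for x, m in zip(seq, mask):
        if not m:
            continue  # padded timepoint: keep the previous hidden state
        h = [
            math.tanh(
                sum(wi * xi for wi, xi in zip(w_in[j], x))
                + sum(wh * hk for wh, hk in zip(w_h[j], h))
            )
            for j in range(len(h))
        ]
    return h
```

As a usage example, suppose there are three controls and one case. Then `balanced_class_weights([0, 0, 0, 1])` gives controls a weight of 4/(2·3) ≈ 0.667 and the case a weight of 2.0, so each class contributes equally to the total loss. In a deep learning framework, the masking step is usually handled by built-in utilities such as PyTorch's `torch.nn.utils.rnn.pack_padded_sequence` or a Keras `Masking` layer, rather than by hand.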
We simulated 100 datasets spanning multiple timepoints for model testing. To demonstrate the model's effectiveness, we also applied the method to two real longitudinal human microbiome studies: (i) the DIABIMMUNE three-country cohort with food allergy outcomes (milk, egg, peanut and overall) and (ii) the DiGiulio study with preterm delivery as the outcome. In extensive analyses, our approach performed encouragingly compared with the next best performing method, Random Forest. It achieved an AUC of 0.897 (an increase of 5%) on the simulated studies, and AUCs of 0.762 (an increase of 19%) and 0.713 (an increase of 8%) on the two real longitudinal microbiome studies, respectively. The proposed methodology improves predictive accuracy on longitudinal human microbiome studies containing spatially correlated data, and it evaluates how changes in microbiome composition contribute to outcome prediction.
https://github.com/divya031090/phyLoSTM.
Supplementary data are available at Bioinformatics online.