Department of Computer Science, Columbia University, 1214 Amsterdam Ave., Mailcode 0401, New York, 10027, USA.
Department of Pediatrics, University of Pittsburgh School of Medicine, Pittsburgh, USA.
BMC Bioinformatics. 2022 Mar 25;23(1):104. doi: 10.1186/s12859-022-04618-w.
Necrotizing enterocolitis (NEC) is a common, potentially catastrophic intestinal disease among very low birthweight premature infants. Affecting up to 15% of neonates born weighing less than 1500 g, NEC causes sudden-onset, progressive intestinal inflammation and necrosis, which can lead to significant bowel loss, multi-organ injury, or death. No unifying cause of NEC has been identified, nor is there any reliable biomarker that indicates an individual patient's risk of the disease. Without a way to predict NEC in advance, the current medical strategy relies on close clinical monitoring in an effort to treat babies with NEC as quickly as possible, before irrecoverable intestinal damage occurs. In this report, we describe a novel machine learning application that generates dynamic, individualized NEC risk scores from intestinal microbiota data, which can be obtained by sequencing bacterial DNA from otherwise discarded infant stool. A central insight that differentiates our work from past efforts is the recognition that disease prediction from stool microbiota represents a specific subtype of machine learning problem known as multiple instance learning (MIL).
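To make the MIL framing concrete, the Python sketch below groups each infant's stool samples into a single "bag" whose label (NEC or no NEC) is known only at the bag level, and states the standard MIL assumption that a bag is positive if at least one of its instances is positive. The record layout (infant ID, day of life, feature vector) and the names `build_bags` and `bag_label_standard_mil` are hypothetical illustrations, not the authors' code; in practice instance-level labels are unobserved, which is exactly why the problem is MIL.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical sample record: (infant_id, day_of_life, microbiota feature vector)
Sample = Tuple[str, int, List[float]]


def build_bags(samples: List[Sample]) -> Dict[str, List[List[float]]]:
    """Group stool samples into one bag per infant, ordered by day of life."""
    grouped: Dict[str, List[Tuple[int, List[float]]]] = defaultdict(list)
    for infant_id, day, features in samples:
        grouped[infant_id].append((day, features))
    return {
        infant_id: [features for _, features in sorted(items, key=lambda x: x[0])]
        for infant_id, items in grouped.items()
    }


def bag_label_standard_mil(instance_labels: List[int]) -> int:
    """Standard MIL assumption: a bag is positive if any instance is positive."""
    return int(any(instance_labels))


samples = [("A", 5, [0.1, 0.9]), ("A", 2, [0.3, 0.7]), ("B", 3, [0.5, 0.5])]
bags = build_bags(samples)
print(bags["A"])  # [[0.3, 0.7], [0.1, 0.9]]
```

Only the bag (infant) carries a label during training; the model must learn which instances (samples) are informative.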
We used a neural network-based MIL architecture, which we tested on independent datasets from two cohorts encompassing 3595 stool samples from 261 at-risk infants. Our report also introduces a new concept called the "growing bag" analysis, which applies MIL over time, allowing past data to be incorporated into each new risk calculation. This approach allowed early, accurate NEC prediction, with a mean sensitivity of 86% and specificity of 90%. True-positive NEC predictions occurred an average of 8 days before disease onset. We also demonstrate that an attention-gated mechanism incorporated into our MIL algorithm permits interpretation of NEC risk, identifying several bacterial taxa that past work has associated with NEC and potentially pointing the way toward new hypotheses about NEC pathogenesis. Our system is flexible, accepting microbiota data generated from targeted 16S or "shotgun" whole-genome DNA sequencing. It performs well in the setting of common, potentially confounding preterm neonatal clinical events such as perinatal cardiopulmonary depression, antibiotic administration, feeding disruptions, or transitions between breastfeeding and formula feeding.
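The growing-bag analysis and the attention gating can be sketched as follows, under stated assumptions. The abstract does not give the exact architecture, so this sketch assumes the widely used gated-attention MIL pooling of Ilse et al. (2018), in which each instance embedding h_k receives a weight a_k proportional to exp(w · (tanh(V h_k) ⊙ sigmoid(U h_k))) and the bag embedding is the weighted sum of the h_k. The growing bag is modeled as re-scoring, at each new sample, the bag of all samples collected so far. The parameters `V`, `U`, `w`, `clf_w`, and `clf_b` are hypothetical toy values (a real model learns them), and `growing_bag_risks` is an illustrative name, not the authors' code.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Sequence[float]) -> Vector:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def gated_attention_pool(
    bag: List[Vector], V: Matrix, U: Matrix, w: Vector
) -> Tuple[Vector, Vector]:
    """Gated attention pooling (assumed, after Ilse et al., 2018).

    Returns (attention weights, bag embedding), where the weights are
    softmax_k( w . (tanh(V h_k) * sigmoid(U h_k)) ).
    """
    scores = []
    for h in bag:
        tanh_part = [math.tanh(x) for x in matvec(V, h)]
        gate = [sigmoid(x) for x in matvec(U, h)]  # the "gate" branch
        scores.append(sum(wi * t * g for wi, t, g in zip(w, tanh_part, gate)))
    top = max(scores)  # subtract the max for a numerically stable softmax
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Bag embedding: attention-weighted sum of the instance vectors.
    embedding = [sum(a * h[d] for a, h in zip(weights, bag)) for d in range(len(bag[0]))]
    return weights, embedding


def growing_bag_risks(
    samples: List[Vector], V: Matrix, U: Matrix, w: Vector,
    clf_w: Vector, clf_b: float,
) -> List[float]:
    """Hypothetical growing-bag loop: after each new sample, score all samples so far."""
    risks = []
    for t in range(1, len(samples) + 1):
        _, z = gated_attention_pool(samples[:t], V, U, w)
        logit = sum(c * zi for c, zi in zip(clf_w, z)) + clf_b
        risks.append(sigmoid(logit))  # risk score in (0, 1)
    return risks


# Toy, hand-picked parameters; a trained model would learn these.
V = [[1.0, 0.0], [0.0, 1.0]]
U = [[0.5, 0.5], [0.5, -0.5]]
w = [1.0, -1.0]
samples = [[0.2, 0.1], [0.9, 0.3], [0.4, 0.8]]  # one feature vector per stool sample, in time order
risks = growing_bag_risks(samples, V, U, w, clf_w=[2.0, -1.0], clf_b=0.0)
print([round(r, 3) for r in risks])
```

Because each sample's attention weight is returned alongside the bag embedding, one can inspect which samples drove a given risk score. This only illustrates why attention aids interpretation; it is not the authors' taxon-level analysis.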
We have developed and validated a robust MIL-based system for NEC prediction from noninvasively collected premature infant stool. Although this system was developed for NEC prediction, our MIL approach may also be applicable to other diseases characterized by changes in the human microbiota.