Golob Jonathan L, Oskotsky Tomiko T, Tang Alice S, Roldan Alennie, Chung Verena, Ha Connie W Y, Wong Ronald J, Flynn Kaitlin J, Parraga-Leo Antonio, Wibrand Camilla, Minot Samuel S, Andreoletti Gaia, Kosti Idit, Bletz Julie, Nelson Amber, Gao Jifan, Wei Zhoujingpeng, Chen Guanhua, Tang Zheng-Zheng, Novielli Pierfrancesco, Romano Donato, Pantaleo Ester, Amoroso Nicola, Monaco Alfonso, Vacca Mirco, De Angelis Maria, Bellotti Roberto, Tangaro Sabina, Kuntzleman Abigail, Bigcraft Isaac, Techtmann Stephen, Bae Daehun, Kim Eunyoung, Jeon Jongbum, Joe Soobok, Theis Kevin R, Ng Sherrianne, Lee Li Yun S, Diaz-Gimeno Patricia, Bennett Phillip R, MacIntyre David A, Stolovitzky Gustavo, Lynch Susan V, Albrecht Jake, Gomez-Lopez Nardhy, Romero Roberto, Stevenson David K, Aghaeepour Nima, Tarca Adi L, Costello James C, Sirota Marina
Division of Infectious Disease, Department of Internal Medicine, University of Michigan, Ann Arbor, MI, USA.
March of Dimes Prematurity Research Center at the University of California San Francisco, San Francisco, CA, USA.
medRxiv. 2023 Apr 11:2023.03.07.23286920. doi: 10.1101/2023.03.07.23286920.
Globally, about 11% of infants each year are born preterm (before 37 weeks of gestation), with significant and lasting health consequences. Multiple studies have related the vaginal microbiome to preterm birth. We present a crowdsourcing approach to predicting (a) preterm birth or (b) early preterm birth from 9 publicly available vaginal microbiome studies representing 3,578 samples from 1,268 pregnant individuals, aggregated from raw sequences via an open-source tool, MaLiAmPi. We validated the crowdsourced models on novel datasets representing 331 samples from 148 pregnant individuals. From 318 DREAM Challenge participants, we received 148 and 121 submissions for the two prediction sub-challenges, with top-ranking submissions achieving bootstrapped AUROC scores of 0.69 and 0.87, respectively. Alpha diversity, VALENCIA community state types, and composition (via phylotype relative abundance) were important features in the top-performing models, most of which were tree-based methods. This work serves as the foundation for subsequent efforts to translate predictive tests into clinical practice and to better understand and prevent preterm birth.
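For readers unfamiliar with the "bootstrapped AUROC" metric reported above, the following is a minimal sketch of one common way to compute it. The abstract does not describe the challenge's actual scoring procedure, so everything here is an assumption for illustration: the function names `auroc` and `bootstrapped_auroc`, the number of resamples, row-level resampling (rather than resampling by individual), and the toy labels and scores are all hypothetical.

```python
import random
from statistics import mean


def auroc(labels, scores):
    """AUROC via the Mann-Whitney statistic: the fraction of
    (positive, negative) pairs in which the positive sample has the
    higher score, with ties counted as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrapped_auroc(labels, scores, n_boot=1000, seed=0):
    """Mean AUROC over bootstrap resamples (rows drawn with replacement).
    Assumption: resamples containing a single class are redrawn, because
    AUROC is undefined for them."""
    if len(set(labels)) < 2:
        raise ValueError("labels must contain both classes")
    rng = random.Random(seed)
    n = len(labels)
    estimates = []
    while len(estimates) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # single-class resample: draw again
        estimates.append(auroc(ys, [scores[i] for i in idx]))
    return mean(estimates)


# Hypothetical toy data: 1 = preterm, 0 = term; scores are model outputs.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
point_estimate = auroc(toy_labels, toy_scores)
boot_estimate = bootstrapped_auroc(toy_labels, toy_scores, n_boot=200)
```

Because the source datasets contain several samples per pregnant individual, a real evaluation would likely need to resample at the level of individuals to avoid treating correlated samples as independent; this sketch does not do that.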