Fei Teng, Funnell Tyler, Waters Nicholas R, Raj Sandeep S, Sadeghi Keimya, Dai Anqi, Miltiadous Oriana, Shouval Roni, Lv Meng, Peled Jonathan U, Ponce Doris M, Perales Miguel-Angel, Gönen Mithat, van den Brink Marcel R M
Department of Epidemiology and Biostatistics, Memorial Sloan Kettering Cancer Center.
Department of Immunology, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center.
bioRxiv. 2023 Dec 18:2023.05.02.538599. doi: 10.1101/2023.05.02.538599.
Identifying predictive biomarkers of patient outcomes from high-throughput microbiome data is of high interest, but existing computational methods do not satisfactorily account for complex survival endpoints, longitudinal samples, and taxa-specific sequencing biases. We present FLORAL (https://vdblab.github.io/FLORAL/), an open-source computational tool that performs scalable log-ratio lasso regression and microbial feature selection for continuous, binary, time-to-event, and competing-risk outcomes, and that accommodates longitudinal microbiome data as time-dependent covariates. The method adapts the augmented Lagrangian algorithm to the zero-sum-constrained optimization problem and adds a two-stage screening process for additional false-positive control. In extensive simulations and real-data analyses, FLORAL consistently achieved better false-positive control than other lasso-based approaches, and better sensitivity than popular differential abundance testing methods on datasets with smaller sample sizes. In a survival analysis in allogeneic hematopoietic-cell transplantation, we further demonstrated that using longitudinal microbiome data, rather than baseline microbiome data alone, considerably improved FLORAL's microbial feature selection.
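To make the zero-sum constrained log-ratio lasso mentioned in the abstract more concrete, the following is a minimal sketch for a continuous outcome only. It is not FLORAL's implementation (FLORAL is distributed as an R package); the function names `zero_sum_lasso` and `simulate_log_compositions`, the penalty parameter `mu`, the iteration limits, and the simulated data are all assumptions made for illustration. The sketch minimizes (1/2n)||y - Zb||^2 + lam*||b||_1 subject to sum(b) = 0, where Z holds log relative abundances, using coordinate descent inside an augmented Lagrangian loop. It omits the two-stage screening, cross-validation, and the survival and competing-risk models.

```python
import math
import random


def soft_threshold(value, threshold):
    """Soft-thresholding operator used in lasso coordinate updates."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def zero_sum_lasso(Z, y, lam, mu=1.0, outer_iters=100, inner_iters=50):
    """Illustrative solver: lasso on log abundances with sum(beta) == 0.

    Z: list of n rows, each with p log relative abundances.
    y: list of n continuous outcomes.
    mu: augmented Lagrangian penalty weight (an assumed default).
    """
    n, p = len(Z), len(Z[0])
    # Center columns and outcome so that no explicit intercept is needed.
    cols = []
    for j in range(p):
        col = [Z[i][j] for i in range(n)]
        mean_j = sum(col) / n
        cols.append([v - mean_j for v in col])
    y_mean = sum(y) / n
    resid = [v - y_mean for v in y]  # equals y - Z @ beta, since beta starts at 0
    col_var = [sum(v * v for v in col) / n for col in cols]
    beta = [0.0] * p
    alpha = 0.0  # scaled Lagrange multiplier for the zero-sum constraint
    for _ in range(outer_iters):
        for _ in range(inner_iters):
            max_change = 0.0
            for j in range(p):
                old = beta[j]
                sum_others = sum(beta) - old
                # (1/n) * z_j^T (residual with feature j's contribution added back)
                rho = sum(c * (r + c * old) for c, r in zip(cols[j], resid)) / n
                new = soft_threshold(rho - mu * (sum_others + alpha), lam)
                new /= col_var[j] + mu
                delta = new - old
                if delta != 0.0:
                    resid = [r - c * delta for c, r in zip(cols[j], resid)]
                    beta[j] = new
                    max_change = max(max_change, abs(delta))
            if max_change < 1e-10:
                break
        total = sum(beta)
        alpha += total  # multiplier update pushes sum(beta) toward zero
        if abs(total) < 1e-9:
            break
    return beta


def simulate_log_compositions(n=40, p=8, seed=0):
    """Hypothetical toy data: the outcome depends on log(x_1 / x_2) plus noise."""
    rng = random.Random(seed)
    Z, y = [], []
    for _ in range(n):
        g = [rng.gauss(0.0, 1.0) for _ in range(p)]
        log_total = math.log(sum(math.exp(v) for v in g))
        z = [v - log_total for v in g]  # log relative abundances
        Z.append(z)
        y.append(z[0] - z[1] + rng.gauss(0.0, 0.1))
    return Z, y


if __name__ == "__main__":
    Z, y = simulate_log_compositions()
    beta = zero_sum_lasso(Z, y, lam=0.05)
    print([round(b, 3) for b in beta], "sum =", sum(beta))
```

Because the coefficients sum to zero, the fitted linear predictor is unchanged when every abundance in a sample is multiplied by the same factor, so it can be read as a combination of log-ratios between taxa. In the toy data, the first two coefficients should come out with opposite signs, reflecting the simulated dependence on log(x_1 / x_2).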