Am J Epidemiol. 2024 Jul 8;193(7):1010-1018. doi: 10.1093/aje/kwae006.
The statistical analysis of omics data poses a major computational challenge because the data are ultra-high-dimensional and their features are frequently correlated. In this work, we extended the iterative sure independence screening (ISIS) algorithm by pairing it with elastic-net (Enet) and 2 versions of adaptive elastic-net, namely adaptive elastic-net (AEnet) and multistep adaptive elastic-net (MSAEnet), to efficiently improve feature selection and effect estimation in omics research. We then used genome-wide human blood DNA methylation data from American Indian participants in the Strong Heart Study (n = 2235 participants; measured in 1989-1991) to compare the performance (predictive accuracy, coefficient estimation, and computational efficiency) of the ISIS-paired regularization methods with that of a Bayesian shrinkage method and traditional linear regression in identifying an epigenomic multimarker of body mass index (BMI). ISIS-AEnet outperformed the other methods in prediction. In biological pathway enrichment analysis of genes annotated to BMI-related differentially methylated positions, ISIS-AEnet captured most of the enriched pathways that were shared by at least 2 of the evaluated methods. ISIS-AEnet can favor biological discovery because it identifies the most robust biological pathways while achieving an optimal balance between bias and efficient feature selection. In the extended SIS R package, we also implemented ISIS paired with Cox and logistic regression for time-to-event and binary endpoints, respectively, and a bootstrap approach for the estimation of regression coefficients.
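The screening-then-regularization idea summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation and does not use the SIS R package. It performs a single, non-iterative screening step by absolute marginal correlation, followed by a plain (non-adaptive) elastic-net fit by coordinate descent on the retained features. The function names (`sis_screen`, `elastic_net_cd`), the screening size `n / log(n)`, the penalty values, and the simulated data are all assumptions made for illustration.

```python
import numpy as np

def sis_screen(X, y, d):
    """Keep the d features with the largest absolute marginal correlation with y."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    scale = np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc)
    scale[scale == 0] = np.inf  # constant features get correlation 0
    corr = np.abs(Xc.T @ yc) / scale
    return np.sort(np.argsort(corr)[::-1][:d])

def soft_threshold(z, t):
    """Soft-thresholding operator used by the lasso part of the penalty."""
    return np.sign(z) * max(abs(z) - t, 0.0)

def elastic_net_cd(X, y, lam, alpha=0.5, n_iter=200):
    """Coordinate descent for
    (1/2n)||y - Xb||^2 + lam * (alpha*||b||_1 + (1-alpha)/2 * ||b||_2^2).
    Columns are standardized internally (assumes no constant column), so the
    returned coefficients are on the standardized scale."""
    n, p = X.shape
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    r = y - y.mean()          # residual for b = 0
    b = np.zeros(p)
    for _ in range(n_iter):
        for j in range(p):
            r += Xs[:, j] * b[j]                 # remove feature j's contribution
            z = Xs[:, j] @ r / n
            b[j] = soft_threshold(z, lam * alpha) / (1.0 + lam * (1.0 - alpha))
            r -= Xs[:, j] * b[j]                 # add the updated contribution back
    return b

# Simulated data (hypothetical): 200 samples, 2000 features, 3 true signals.
rng = np.random.default_rng(0)
n, p = 200, 2000
X = rng.standard_normal((n, p))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 2.0 * X[:, 2] + rng.standard_normal(n)

d = int(n / np.log(n))                 # a commonly used screening size (assumed here)
keep = sis_screen(X, y, d)             # step 1: marginal screening
b = elastic_net_cd(X[:, keep], y, lam=0.1, alpha=0.5)  # step 2: elastic-net
selected = keep[b != 0]                # original indices of the selected features
print(len(keep), "features screened;", len(selected), "selected")
```

The iterative variant (ISIS) would repeat the screening on the residuals of the current fit and re-run the penalized regression on the union of retained features. The adaptive variants additionally reweight the L1 penalty per feature using initial coefficient estimates. Both steps are omitted here for brevity.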