Chen Zhibo, Lu Zi-Tong, Song Xue-Ting, Gao Yu-Fan, Xiao Jian
School of Statistics and Mathematics, Zhongnan University of Economics and Law, Wuhan, Hubei, People's Republic of China.
PLoS One. 2025 Aug 22;20(8):e0300490. doi: 10.1371/journal.pone.0300490. eCollection 2025.
Omics-wide association analysis is an important tool in medical and human health research. However, modern omics data sets are often high-dimensional, with a response and features whose distributions are unknown and with unknown, complex associations between the response and its explanatory features. Reliable association results depend on modeling such data accurately. Most existing association analysis methods rely on specific model assumptions and lack effective false discovery rate (FDR) control. To address these limitations, this paper first applies a single index model to omics data. The model is robust in that it allows the response variable to be linked to a linear combination of the covariates through any unknown monotonic link function, and both the random error and the covariates may follow any unknown distribution. Based on this model, the paper then combines a rank-based approach with a symmetrized data aggregation approach to develop a novel and robust feature selection method that fine-maps risk features while controlling the false positive rate of the selection. Theoretical results support the proposed method, and analyses of simulated data show that it performs effectively and robustly across all scenarios considered. The new method is also applied to two real data sets, where it identifies some risk features not reported in existing findings.
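The combination of a rank-based approach with symmetrized data aggregation described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `rank_sda_select`, the simulated data, and the use of ordinary least squares on both data halves (in place of any penalized estimator the paper may use) are all assumptions made for illustration. The idea shown is that ranks of the response are unchanged by any monotone link, and that multiplying statistics from two independent halves gives scores that are roughly symmetric around zero for null features, which allows a data-driven threshold for FDR control.

```python
import numpy as np

def rank_sda_select(X, y, alpha=0.1, seed=0):
    """Hypothetical sketch: rank-transform y, split the sample,
    aggregate per-feature statistics from the two halves, and pick
    a threshold using the sign symmetry of null scores."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    # Ranks of y are invariant to any strictly monotone link function.
    ranks = np.argsort(np.argsort(y)).astype(float)
    r = ranks / (n - 1) - 0.5
    perm = rng.permutation(n)
    halves = (perm[: n // 2], perm[n // 2:])
    stats = []
    for idx in halves:
        Xh = X[idx] - X[idx].mean(axis=0)
        rh = r[idx] - r[idx].mean()
        # Assumption: plain least squares, which requires p < n / 2.
        coef, *_ = np.linalg.lstsq(Xh, rh, rcond=None)
        resid = rh - Xh @ coef
        sigma2 = resid @ resid / (len(idx) - p - 1)
        cov = sigma2 * np.linalg.inv(Xh.T @ Xh)
        stats.append(coef / np.sqrt(np.diag(cov)))
    # Aggregated score: large and positive when both halves agree.
    W = stats[0] * stats[1]
    # Smallest threshold whose estimated false discovery proportion
    # (negative scores stand in for false positives) is at most alpha.
    for t in np.sort(np.abs(W)):
        fdp_hat = (1 + np.sum(W <= -t)) / max(np.sum(W >= t), 1)
        if fdp_hat <= alpha:
            return np.flatnonzero(W >= t), W
    return np.array([], dtype=int), W

# Simulated single index data (all settings are illustrative assumptions):
# heavy-tailed covariates and noise, with a cubic monotone link.
rng = np.random.default_rng(1)
n, p, k = 600, 60, 15
X = rng.standard_t(df=5, size=(n, p))
beta = np.zeros(p)
beta[:k] = 1.0
y = (X @ beta + rng.standard_t(df=2, size=n)) ** 3

selected, W = rank_sda_select(X, y, alpha=0.1)
print("selected features:", selected)
```

In this toy setting the first 15 features carry signal, so the selected indices can be compared against `range(15)`. The least-squares step limits the sketch to moderate dimensions; a genuinely high-dimensional version would need a sparse estimator on at least one half.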