Li Jingmao, Zhang Qingzhao, Ma Shuangge, Fang Kuangnan, Xu Yaqing
Department of Statistics and Data Science, School of Economics, Xiamen University, Fujian, China.
The Wang Yanan Institute for Studies in Economics, Xiamen University, Xiamen, China.
Stat Med. 2025 Feb 10;44(3-4):e10330. doi: 10.1002/sim.10330.
In biomedical studies, gene-environment (G-E) interactions have been shown to have important implications for disease outcomes beyond the main G and main E effects. Many approaches have been developed for G-E interaction analysis, yielding important findings. However, hierarchical multi-label classification, which provides insightful information on disease outcomes, remains unexplored in the G-E analysis literature. Moreover, unlabeled data are common in practice but are ignored by many existing hierarchical multi-label classification methods. In this study, we consider a semi-supervised scenario and develop a novel approach for a two-layer hierarchical response with G-E interactions. A two-step penalized estimation procedure is proposed and carried out with an efficient expectation-maximization (EM) algorithm. Simulations show that the approach has superior performance in classification and feature selection. An analysis of The Cancer Genome Atlas (TCGA) data on lung cancer demonstrates the practical utility of the proposed method. Overall, this study helps fill an important knowledge gap in G-E interaction analysis by providing a widely applicable framework for hierarchical multi-label classification of complex disease outcomes.