Binghamton University, Department of Mathematics and Statistics, Binghamton, NY, 13902, USA.
Sci Rep. 2024 Aug 3;14(1):17996. doi: 10.1038/s41598-023-45813-w.
Detection of important genes affecting lung adenocarcinoma (LUAD) is critical to finding effective therapeutic targets for this highly lethal cancer. However, many existing approaches have focused on single outcomes or phenotypic associations, which may be less thorough than investigating molecular transcript levels within cells. In this article, we apply a novel multivariate rank-distance correlation-based gene selection procedure (MrDcGene) to LUAD multi-omics data downloaded from The Cancer Genome Atlas (TCGA). MrDcGene offers additional opportunities to detect novel susceptibility genes because it leverages information from multiple platforms while efficiently handling challenges such as high dimensionality, low signal-to-noise ratio, unknown distributions, and non-linear structures. Notably, MrDcGene can detect two different scenarios: strong association with a few gene expressions, and weak association with several gene expressions. After thoroughly exploring the associations between gene expression (GE) and multiple other platforms, including reverse phase protein array (RPPA), miRNA, copy number variation (CNV), and DNA methylation (ME), we detect several novel genes that may play an important role in LUAD (ZNF133, CCDC159, YWHAZ, HNRNPR, ITPR2, PTHLH, and WIPI2). In addition, we quantitatively validate several other susceptibility genes that have been reported in the literature by studies using different methods. The accuracy of the MrDcGene approach is theoretically guaranteed and empirically demonstrated through simulation studies.
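As an illustration only, the kind of statistic behind rank-distance correlation screening can be sketched in Python. This is a minimal sketch, not the MrDcGene implementation: it assumes a simplified rank-distance correlation in which every variable is replaced by its coordinate-wise (marginal) ranks before computing the sample distance correlation of Székely, Rizzo and Bakirov (2007); the multivariate ranks used by MrDcGene may be defined differently. The helper names (`rank_transform`, `screen_genes`, etc.) and the toy data are hypothetical. Each gene's expression profile is scored against the features of one other platform (e.g., RPPA), and the top-scoring genes are kept.

```python
import math
import random
from typing import Dict, List, Sequence

Matrix = List[List[float]]  # n samples x p features


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def rank_transform(rows: Matrix) -> Matrix:
    """Replace each column by its marginal ranks (assumption: not OT-based multivariate ranks)."""
    cols = [average_ranks([r[c] for r in rows]) for c in range(len(rows[0]))]
    return [[col[i] for col in cols] for i in range(len(rows))]


def double_centered_distances(rows: Matrix) -> Matrix:
    """Euclidean distance matrix with row, column and grand means removed."""
    n = len(rows)
    d = [[math.dist(rows[i], rows[j]) for j in range(n)] for i in range(n)]
    means = [sum(r) / n for r in d]  # row means equal column means because d is symmetric
    grand = sum(means) / n
    return [[d[i][j] - means[i] - means[j] + grand for j in range(n)] for i in range(n)]


def distance_correlation(x: Matrix, y: Matrix) -> float:
    """Sample (V-statistic) distance correlation between two n-row matrices."""
    a, b = double_centered_distances(x), double_centered_distances(y)

    def mean_product(p: Matrix, q: Matrix) -> float:
        return sum(pi * qi for pr, qr in zip(p, q) for pi, qi in zip(pr, qr)) / len(p) ** 2

    dcov2_xy, dcov2_xx, dcov2_yy = mean_product(a, b), mean_product(a, a), mean_product(b, b)
    if dcov2_xx <= 0 or dcov2_yy <= 0:
        return 0.0  # a constant variable carries no information
    return math.sqrt(max(dcov2_xy, 0.0) / math.sqrt(dcov2_xx * dcov2_yy))


def rank_distance_correlation(x: Matrix, y: Matrix) -> float:
    """Distance correlation computed on marginal ranks of both inputs."""
    return distance_correlation(rank_transform(x), rank_transform(y))


def screen_genes(expression: Dict[str, List[float]], platform: Matrix, top_k: int) -> List[str]:
    """Hypothetical helper: keep the top_k genes by rank-distance correlation with a platform."""
    ranked_platform = rank_transform(platform)
    scores = {
        gene: distance_correlation(rank_transform([[v] for v in values]), ranked_platform)
        for gene, values in expression.items()
    }
    return sorted(scores, key=scores.get, reverse=True)[:top_k]


if __name__ == "__main__":
    rng = random.Random(0)
    n = 100
    z = [rng.gauss(0, 1) for _ in range(n)]
    # Toy "RPPA-like" platform: two correlated protein features driven by z.
    platform = [[zi, zi + 0.3 * rng.gauss(0, 1)] for zi in z]
    expression = {
        "monotone": [math.exp(zi) for zi in z],  # strong, monotone signal
        "v_shape": [zi * zi for zi in z],  # non-linear, non-monotone signal
        "noise": [rng.gauss(0, 1) for _ in range(n)],  # unrelated gene
    }
    print(screen_genes(expression, platform, top_k=3))
```

In the toy example, the `v_shape` gene has zero Pearson correlation with z in the population, yet it should still score above pure noise, because population distance correlation is zero only under independence. Working on ranks makes each score invariant to monotone transformations of the features, which is one way to cope with unknown distributions.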