Wang Jie-Huei, Chen Yi-Hau
Department of Statistics, Feng Chia University, Seatwen, Taichung 40724, Taiwan.
Institute of Statistical Science, Academia Sinica, Nankang, Taipei 11529, Taiwan.
Bioinformatics. 2021 Aug 9;37(15):2150-2156. doi: 10.1093/bioinformatics/btab064.
In high-dimensional genetic/genomic data, identifying genes related to clinical survival traits is a challenging and important problem. In particular, right-censored survival outcomes and contaminated biomarker data make the relevant feature screening difficult. Several independence screening methods have been developed, but they fail to account for gene-gene dependency information and may be sensitive to outlying feature data.
We improve the inverse probability-of-censoring weighted (IPCW) Kendall's tau statistic by using Google's PageRank Markov matrix to incorporate feature dependency network information. To handle outlying feature data, the nonparanormal approach, which transforms the feature data to multivariate normal variates, is used within the graphical lasso procedure to estimate the network structure of the features. Simulation studies under various scenarios show that the proposed network-adjusted weighted Kendall's tau approach leads to more accurate feature selection and survival prediction than methods that do not account for feature dependency network information and outlying feature data. Applications to clinical survival outcome data from diffuse large B-cell lymphoma patients and from The Cancer Genome Atlas lung adenocarcinoma patients clearly demonstrate the advantages of the new proposal over the alternative methods.
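To make the network-adjustment idea concrete, the following is a minimal sketch of how a PageRank-style Markov (Google) matrix could spread marginal screening scores across a feature network. It is an illustration under stated assumptions, not the authors' procedure: the function names `google_matrix` and `network_adjusted_scores`, the damping value 0.85, and the one-step adjustment rule (Google matrix times the absolute marginal scores) are all hypothetical choices. The adjacency matrix is assumed to come from an already estimated network (for example, the graphical lasso step described above), and the marginal scores stand in for per-feature IPCW Kendall's tau values computed elsewhere.

```python
import numpy as np


def google_matrix(adjacency, damping=0.85):
    """Build a column-stochastic PageRank (Google) matrix from a feature graph.

    `damping` is the usual PageRank damping factor (0.85 is an assumed default).
    """
    A = np.asarray(adjacency, dtype=float)
    p = A.shape[0]
    col_sums = A.sum(axis=0)
    has_edges = col_sums > 0
    # Normalize each column to sum to one; isolated features (empty columns)
    # are replaced by a uniform column so the matrix stays stochastic.
    safe_sums = np.where(has_edges, col_sums, 1.0)
    M = np.where(has_edges, A / safe_sums, 1.0 / p)
    # Mix with uniform teleportation, as in standard PageRank.
    return damping * M + (1.0 - damping) / p


def network_adjusted_scores(marginal_scores, adjacency, damping=0.85):
    """Hypothetical adjustment: one multiplication by the Google matrix.

    Each feature passes its absolute marginal score to its network neighbors,
    so a feature connected to strongly associated features is ranked higher.
    """
    G = google_matrix(adjacency, damping)
    return G @ np.abs(np.asarray(marginal_scores, dtype=float))


# Toy example: features 0 and 1 are connected, feature 2 is isolated.
adjacency = np.array([[0, 1, 0],
                      [1, 0, 0],
                      [0, 0, 0]])
marginal = np.array([0.5, 0.0, 0.0])  # placeholder per-feature statistics
adjusted = network_adjusted_scores(marginal, adjacency)
ranking = np.argsort(-adjusted)       # features ordered by adjusted score
```

In this toy setting, feature 1 has no marginal signal of its own but inherits score from its neighbor, feature 0, whereas the isolated feature 2 receives only the small teleportation share. Because the matrix is column-stochastic, the total score is preserved by the adjustment.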
Supplementary data are available at Bioinformatics online.