Network-adjusted Kendall's Tau Measure for Feature Screening with Application to High-dimensional Survival Genomic Data.

Author information

Wang Jie-Huei, Chen Yi-Hau

Affiliations

Department of Statistics, Feng Chia University, Seatwen, Taichung 40724, Taiwan.

Institute of Statistical Science, Academia Sinica, Nankang, Taipei 11529, Taiwan.

Publication information

Bioinformatics. 2021 Aug 9;37(15):2150-2156. doi: 10.1093/bioinformatics/btab064.

Abstract

MOTIVATION

In high-dimensional genetic/genomic data, identifying genes related to clinical survival traits is a challenging and important problem. In particular, right-censored survival outcomes and contaminated biomarker data make the relevant feature screening difficult. Several independence screening methods have been developed, but they do not account for gene-gene dependency information and may be sensitive to outlying feature data.

RESULTS

We improve the inverse probability-of-censoring weighted (IPCW) Kendall's tau statistic by using Google's PageRank Markov matrix to incorporate feature dependency network information. In addition, to handle outlying feature data, the nonparanormal approach, which transforms the feature data into multivariate normal variates, is used within the graphical lasso procedure to estimate the network structure of the features. Simulation studies under various scenarios show that the proposed network-adjusted weighted Kendall's tau approach yields more accurate feature selection and survival prediction than methods that do not account for the feature dependency network or for outlying feature data. Applications to clinical survival outcome data from patients with diffuse large B-cell lymphoma and from The Cancer Genome Atlas lung adenocarcinoma cohort clearly demonstrate the advantages of the new proposal over alternative methods.
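
The RESULTS paragraph combines three steps: a nonparanormal transform of the features, a network estimated from the transformed data, and an IPCW Kendall's tau whose marginal scores are adjusted through a PageRank-type Markov matrix. The pure-Python sketch below illustrates how these pieces could fit together; it is not the authors' implementation, and every function name in it is hypothetical. Assumptions: the nonparanormal step is reduced to rank-based normal scores, Phi^{-1}(rank/(n+1)), without Winsorization or tie handling; a thresholded correlation of those scores is swapped in for the graphical lasso; the IPCW Kendall's tau uses one common pairwise form, keeping a pair only when its shorter time is an observed event and weighting it by 1/G(t_min-)^2, with G the Kaplan-Meier survival function of the censoring time; and the network adjustment is a GeneRank-style personalized PageRank, r = (1 - d)s + dWD^{-1}r, with the normalized |tau| values as s. The paper's exact weighting and adjustment formulas may differ.

```python
import math
from statistics import NormalDist


def normal_scores(x):
    """Simplified nonparanormal transform: x -> Phi^{-1}(rank / (n + 1)).
    Assumes no ties; the original estimator's Winsorization is skipped."""
    n = len(x)
    z = [0.0] * n
    for rank, i in enumerate(sorted(range(n), key=lambda k: x[k]), start=1):
        z[i] = NormalDist().inv_cdf(rank / (n + 1))
    return z


def correlation_network(features, threshold):
    """Stand-in for graphical lasso: link two features whose normal scores
    have absolute correlation above `threshold` (scores are centred at 0)."""
    z = [normal_scores(f) for f in features]
    p = len(z)
    adj = [[0.0] * p for _ in range(p)]
    for a in range(p):
        for b in range(a + 1, p):
            r = sum(u * v for u, v in zip(z[a], z[b]))
            r /= math.sqrt(sum(u * u for u in z[a]) * sum(v * v for v in z[b]))
            if abs(r) > threshold:
                adj[a][b] = adj[b][a] = 1.0
    return adj


def censoring_survival_left(times, events, t):
    """Kaplan-Meier estimate of G(t-) = P(C >= t), treating censorings as events."""
    g = 1.0
    for s in sorted(set(times)):
        if s >= t:
            break
        at_risk = sum(1 for u in times if u >= s)
        censored = sum(1 for u, d in zip(times, events) if u == s and d == 0)
        g *= 1.0 - censored / at_risk
    return g


def ipcw_kendall_tau(x, times, events):
    """IPCW Kendall's tau between feature x and a right-censored time
    (events[i] = 1 if the event was observed). A pair is used only when its
    shorter time is an event, with weight 1 / G(t_min-)^2."""
    num = den = 0.0
    for i in range(len(x)):
        for j in range(i + 1, len(x)):
            if times[i] == times[j]:
                continue  # tied times skipped for simplicity
            short = i if times[i] < times[j] else j
            if events[short] == 0:
                continue  # ordering unobservable: shorter time is censored
            w = 1.0 / censoring_survival_left(times, events, times[short]) ** 2
            sx = (x[i] > x[j]) - (x[i] < x[j])
            st = (times[i] > times[j]) - (times[i] < times[j])
            num += w * sx * st
            den += w
    return num / den if den > 0 else 0.0


def network_adjusted_scores(scores, adj, damping=0.5, iters=200):
    """GeneRank-style personalized PageRank r = (1 - d) s + d W D^{-1} r, with
    s the normalized marginal scores (e.g. |tau|). Isolated nodes keep (1 - d) s."""
    total = sum(scores)
    if total <= 0:
        raise ValueError("scores must have a positive sum")
    s = [v / total for v in scores]
    deg = [sum(row) for row in adj]
    r = s[:]
    for _ in range(iters):
        r = [(1 - damping) * s[a]
             + damping * sum(adj[a][b] * r[b] / deg[b] for b in range(len(s)) if deg[b] > 0)
             for a in range(len(s))]
    return r


# Toy usage (illustrative values only): 3 features measured on 6 subjects.
X = [[0.1, 0.4, 0.3, 0.9, 0.7, 1.2],
     [0.2, 0.5, 0.1, 1.0, 0.8, 1.1],
     [1.0, 0.2, 0.7, 0.3, 0.9, 0.4]]
times = [2.0, 3.5, 1.0, 6.0, 4.0, 8.0]
events = [1, 0, 1, 1, 1, 0]
tau = [abs(ipcw_kendall_tau(f, times, events)) for f in X]
adjusted = network_adjusted_scores(tau, correlation_network(X, threshold=0.7), damping=0.5)
print([round(v, 3) for v in adjusted])
```

Because Kendall's tau depends only on the signs of pairwise differences, a monotone transform such as the nonparanormal step leaves the marginal tau values unchanged; in this sketch the transform matters only for estimating the network. The pairwise loop recomputes G for every pair, which is acceptable for illustration but would normally be precomputed once per distinct time.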

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.