

-KIDS: A novel feature evaluation in the ultrahigh-dimensional right-censored setting, with application to Head and Neck Cancer.

Authors

Urmi Atika Farzana, Ke Chenlu, Bandyopadhyay Dipankar

Affiliations

Department of Biostatistics, Virginia Commonwealth University, VA, USA.

Department of Statistical Sciences and Operations Research, Virginia Commonwealth University, VA, USA.

Publication

medRxiv. 2024 Aug 14:2024.08.13.24311946. doi: 10.1101/2024.08.13.24311946.

Abstract

Recent advances in sequencing technologies have enabled the collection of massive genome-wide information that substantially enhances the diagnosis and prognosis of head and neck cancer. Identifying predictive markers for survival time is crucial for devising prognostic systems and for learning the underlying molecular driver of the cancer course. In this paper, we introduce -KIDS, a model-free feature screening procedure with false discovery rate (FDR) control for ultrahigh-dimensional right-censored data, which is robust against unknown censoring mechanisms. Specifically, our two-stage procedure first selects a set of important features with a dual screening mechanism using nonparametric reproducing-kernel-based ANOVA statistics, and then identifies a refined set of features under directional FDR control through a unified knockoff procedure. The finite-sample properties of our method, and its novelty in light of existing alternatives, are evaluated via simulation studies. Furthermore, we illustrate our methodology via application to a motivating right-censored head and neck (HN) cancer survival dataset derived from The Cancer Genome Atlas, with further validation on a similar HN cancer dataset from the Gene Expression Omnibus database. The methodology can be implemented via the R package DSFDRC, available on GitHub.
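To make the second stage more concrete, the following is a minimal sketch of how a generic knockoff filter turns per-feature statistics into an FDR-controlled selection. It uses the standard knockoff+ threshold, not the directional, kernel-based procedure the abstract describes, and it does not use the DSFDRC package or its API. The function names and the input scores `w` are hypothetical: `w[j]` is assumed to be a signed statistic that is large and positive when feature `j` looks more important than its knockoff copy.

```python
import math


def knockoff_plus_threshold(w, q):
    """Return the knockoff+ threshold for signed statistics w at target FDR q.

    The threshold is the smallest t among the nonzero |w_j| such that
    (1 + #{j: w_j <= -t}) / max(1, #{j: w_j >= t}) <= q.
    Returns math.inf when no candidate satisfies the bound.
    """
    candidates = sorted({abs(x) for x in w if x != 0})
    for t in candidates:
        n_neg = sum(1 for x in w if x <= -t)  # proxy count of false discoveries
        n_pos = sum(1 for x in w if x >= t)   # features that would be selected
        if (1 + n_neg) / max(1, n_pos) <= q:
            return t
    return math.inf


def knockoff_select(w, q):
    """Return indices of features whose statistic reaches the threshold."""
    t = knockoff_plus_threshold(w, q)
    return [j for j, x in enumerate(w) if x >= t]


if __name__ == "__main__":
    # Hypothetical statistics for ten features retained by a screening stage.
    w = [5, 4, 3, 2.5, 2, -1, 0.5, 1.5, 6, 7]
    print(knockoff_plus_threshold(w, 0.2))
    print(knockoff_select(w, 0.2))
```

In this sketch the negative statistics serve as a stand-in for false discoveries, so the ratio being bounded is a conservative estimate of the false discovery proportion at each candidate threshold. When no threshold meets the target level, nothing is selected.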


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9418/11343256/d0d0325b9a46/nihpp-2024.08.13.24311946v1-f0001.jpg
