Academy of Statistics and Interdisciplinary Sciences, East China Normal University, China.
School of Physical and Mathematical Sciences, Nanyang Technological University, Singapore.
Stat Methods Med Res. 2021 Nov;30(11):2428-2446. doi: 10.1177/09622802211037071. Epub 2021 Sep 14.
Ultrahigh-dimensional gene features are often collected in modern cancer studies, where the number of gene features far exceeds the sample size. While gene expression patterns have been shown to be related to patients' survival in microarray-based gene expression studies, one has to deal with the challenges that ultrahigh-dimensional genetic predictors pose for survival prediction and for genetic understanding of the disease in precision medicine. The problem becomes more complicated when two types of survival endpoints, distant metastasis-free survival and overall survival, are of interest in the study and the outcome data are subject to semi-competing risks, because distant metastasis-free survival may be censored by overall survival but not vice versa. Our focus in this paper is to extract important features, those with a strong joint impact on both distant metastasis-free survival and overall survival, from massive gene expression data in the semi-competing risks setting. We propose a model-free screening method based on ranking the correlation between gene features and the joint survival function of the two endpoints. The method accounts for the relationship between the two endpoints through a simply defined utility measure that is easy to understand and calculate. We establish its favorable theoretical properties, namely the sure screening property and ranking consistency, and evaluate its finite-sample performance through extensive simulation studies. Finally, an application to classifying breast cancer data demonstrates the utility of the proposed method in practice.
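The abstract does not give the form of the utility measure, so the following is only a rough sketch of the general idea of ranking features by their marginal association with a joint survival indicator. It is not the paper's estimator. It assumes fully observed (uncensored) event times, which sidesteps the semi-competing risks handling that is central to the actual method, and the function name `joint_survival_screen`, the squared-covariance utility, and the choice of evaluation points are all illustrative assumptions.

```python
import numpy as np


def joint_survival_screen(X, t1, t2, top_k):
    """Rank features by association with the joint survival indicator.

    Illustrative assumption: both event times are fully observed. The
    utility below is a stand-in, not the measure defined in the paper.

    X  : (n, p) array of gene features (no constant columns)
    t1 : (n,) times to the first endpoint (e.g. distant metastasis)
    t2 : (n,) times to the second endpoint (e.g. death)
    """
    X = np.asarray(X, dtype=float)
    t1 = np.asarray(t1, dtype=float)
    t2 = np.asarray(t2, dtype=float)
    n = X.shape[0]

    # Standardize columns so utilities are comparable across features.
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)

    # ind[i, k] = 1 if subject i exceeds subject k's times on BOTH
    # endpoints, i.e. the joint survival indicator evaluated at the
    # observed time pairs (t1[k], t2[k]).
    ind = ((t1[:, None] > t1[None, :]) & (t2[:, None] > t2[None, :])).astype(float)
    ind_centered = ind - ind.mean(axis=0)

    # Empirical covariance of each feature with each indicator column.
    cov = Xs.T @ ind_centered / n  # shape (p, n)

    # Utility: squared covariance averaged over the evaluation points.
    utility = (cov ** 2).mean(axis=1)

    # Keep the top_k features with the largest utility.
    selected = np.argsort(-utility)[:top_k]
    return selected, utility
```

Calling `joint_survival_screen(X, t1, t2, top_k=20)` returns the indices of the 20 highest-ranked features together with the utility of every feature. Because the ranking uses only marginal associations with an indicator of joint survival, no regression model for either endpoint has to be specified, which is the sense in which such a screen is model-free.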