CondiS: A conditional survival distribution-based method for censored data imputation overcoming the hurdle in machine learning-based survival analysis.

Author information

Department of Biostatistics, The University of Texas MD Anderson Cancer Center, Houston, TX, USA.

Department of Lymphoma/Myeloma, The University of Texas MD Anderson Cancer Center, Houston, TX, USA.

Publication information

J Biomed Inform. 2022 Jul;131:104117. doi: 10.1016/j.jbi.2022.104117. Epub 2022 Jun 9.

Abstract

Data analyses by machine learning (ML) algorithms are gaining popularity in biomedical research. When time-to-event data are of interest, censoring is common and needs to be properly addressed. Most ML methods cannot conveniently and appropriately take the censoring information into consideration, potentially leading to inaccurate or biased results. We aim to develop a general-purpose method for imputing censored survival data, facilitating downstream ML analysis. In this study, we propose a novel method of imputing the survival times for censored observations. The proposal is based on their conditional survival distributions (CondiS) derived from Kaplan-Meier estimators. CondiS can replace censored observations with their best approximations from the statistical model, allowing for direct application of ML methods. When covariates are available, we extend CondiS by incorporating the covariate information through ML modeling (CondiS-X), which further improves the accuracy of the imputed survival time. Compared with existing methods with similar purposes, the proposed methods achieved smaller prediction errors and higher concordance with the underlying true survival times in extensive simulation studies. We also demonstrated the usage and advantages of the proposed methods through two real-world cancer datasets. The major advantage of CondiS is that it allows for the direct application of standard ML techniques for analysis once the censored survival times are imputed. We present a user-friendly R package to implement our method, which is a useful tool for ML-based biomedical research in this era of big data.
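
The abstract does not spell out the imputation formula, so the following Python sketch shows only one plausible reading of the idea: replace each censored time c with the conditional expected survival time given survival beyond c, computed from the Kaplan-Meier curve and restricted to the longest follow-up time tau. The function names (`kaplan_meier`, `surv_at`, `impute_censored`) and the restriction to tau are assumptions made for illustration. This is not the authors' R package, and it does not cover the covariate-based CondiS-X extension.

```python
from typing import List, Sequence, Tuple


def kaplan_meier(times: Sequence[float], events: Sequence[int]) -> Tuple[List[float], List[float]]:
    """Return the distinct event times and the KM estimate S(t) at each of them."""
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    km_times: List[float] = []
    km_surv: List[float] = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all subjects sharing the same observed time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            surv *= 1.0 - deaths / n_at_risk
            km_times.append(t)
            km_surv.append(surv)
        n_at_risk -= removed
    return km_times, km_surv


def surv_at(t: float, km_times: Sequence[float], km_surv: Sequence[float]) -> float:
    """Evaluate the right-continuous KM step function at time t."""
    s = 1.0
    for kt, ks in zip(km_times, km_surv):
        if kt > t:
            break
        s = ks
    return s


def impute_censored(times: Sequence[float], events: Sequence[int]) -> List[float]:
    """Impute censored times with c + (1 / S(c)) * integral of S(u) from c to tau.

    Assumption: tau is the largest observed time, so imputed values never exceed it.
    Observed event times (event == 1) are returned unchanged.
    """
    km_times, km_surv = kaplan_meier(times, events)
    tau = max(times)
    imputed: List[float] = []
    for t, e in zip(times, events):
        s_c = surv_at(t, km_times, km_surv)
        if e == 1 or t >= tau or s_c <= 0.0:
            imputed.append(float(t))
            continue
        # Integrate the step function piecewise between successive event times.
        grid = [t] + [kt for kt in km_times if t < kt < tau] + [tau]
        area = sum(surv_at(a, km_times, km_surv) * (b - a) for a, b in zip(grid, grid[1:]))
        imputed.append(t + area / s_c)
    return imputed


if __name__ == "__main__":
    # Toy data: 1 = event observed, 0 = censored.
    toy_times = [2, 3, 4, 5, 6]
    toy_events = [1, 0, 1, 0, 1]
    print(impute_censored(toy_times, toy_events))
```

Under these assumptions, every imputed value is at least as large as the censoring time it replaces, and the completed vector can be passed to a standard regression learner as an ordinary numeric target.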
