Medical Big Data and Bioinformatics Research Centre, First Affiliated Hospital of Gannan Medical University, Ganzhou, China.
School of Public Health and Health Management, Gannan Medical University, Ganzhou, China.
J Transl Med. 2024 Feb 20;22(1):185. doi: 10.1186/s12967-024-05005-0.
Clinical data mining for predictive modeling offers significant advantages for re-evaluating and leveraging large volumes of complex real-world clinical data and experimental comparison data in tasks such as risk stratification, diagnosis, classification, and survival prediction. However, its translational application remains limited. One challenge is that clinical requirements and data mining efforts are not synchronized. In addition, predictions derived from external data are difficult to apply directly in local medical institutions. A critical review of the translational application of clinical data mining is therefore needed, together with an analytical workflow for developing and validating prediction models that keeps the analysis scientifically valid with respect to the clinical question. This review systematically revisits the purpose, process, and principles of clinical data mining and discusses the key causes of its detachment from practice and of the misuse of model verification when developing predictive models for research. On this basis, we propose a targeted framework of four principles, Clinical Contextual, Subgroup-Oriented, Confounder-Controlled, and False Positive-Controlled (CSCF), to guide clinical data mining before model development in clinical settings. We hope this review can help guide future research and the development of personalized predictive models, so that subgroups with differing treatment benefits or risks can be identified and precision medicine can deliver its full potential.