SurvdigitizeR: an algorithm for automated survival curve digitization.

Affiliations

Child Health Evaluative Sciences, Peter Gilgan Centre for Research and Learning, The Hospital for Sick Children, Toronto, ON, Canada.

Biostatistics Division, Dalla Lana School of Public Health, University of Toronto, Toronto, ON, Canada.

Publication information

BMC Med Res Methodol. 2024 Jul 13;24(1):147. doi: 10.1186/s12874-024-02273-8.

DOI: 10.1186/s12874-024-02273-8
PMID: 39003440
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11245803/
Abstract

BACKGROUND

Decision analytic models and meta-analyses often rely on survival probabilities that are digitized from published Kaplan-Meier (KM) curves. However, manually extracting these probabilities from KM curves is time-consuming, expensive, and error-prone. We developed an efficient and accurate algorithm that automates extraction of survival probabilities from KM curves.

METHODS

The automated digitization algorithm processes images in JPG or PNG format, converts them to the hue, saturation, and lightness (HSL) scale, and uses optical character recognition to detect axis locations and labels. It also uses a k-medoids clustering algorithm to separate multiple overlapping curves in the same figure. To validate performance, we generated survival plots from random time-to-event data with sample sizes of 25, 50, 150, 250, and 1000 individuals split into 1, 2, or 3 treatment arms. We assumed an exponential distribution and applied random censoring. We compared automated digitization with manual digitization performed by well-trained researchers. We calculated the root mean squared error (RMSE) at 100 time points for both methods. The algorithm's performance was also evaluated by Bland-Altman analysis of the agreement between automated and manual digitization on a real-world set of published KM curves.
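The color-separation step described above can be illustrated with a minimal sketch. This is not the SurvdigitizeR implementation (the package is written in R); it only shows the general idea of converting pixels to HSL and grouping them with k-medoids. The function names, the circular-hue distance, and the farthest-point initialization are all assumptions made for this illustration.

```python
import colorsys
import math


def rgb_to_hsl(pixel):
    """Convert an (R, G, B) pixel with 0-255 channels to (hue, saturation, lightness) in [0, 1]."""
    r, g, b = (channel / 255.0 for channel in pixel)
    h, l, s = colorsys.rgb_to_hls(r, g, b)  # colorsys returns H, L, S in that order
    return (h, s, l)


def hsl_distance(a, b):
    """Euclidean distance in HSL space, treating hue as circular (assumed metric)."""
    dh = abs(a[0] - b[0])
    dh = min(dh, 1.0 - dh)  # hue 0.99 and hue 0.01 are close neighbours
    return math.sqrt(dh ** 2 + (a[1] - b[1]) ** 2 + (a[2] - b[2]) ** 2)


def assign(points, medoids):
    """Label each point with the position of its nearest medoid."""
    return [
        min(range(len(medoids)), key=lambda c: hsl_distance(p, points[medoids[c]]))
        for p in points
    ]


def k_medoids(points, k, max_iter=50):
    """Alternating k-medoids with deterministic farthest-point initialization."""
    medoids = [0]
    while len(medoids) < k:
        # Next medoid: the point farthest from all medoids chosen so far.
        farthest = max(
            range(len(points)),
            key=lambda i: min(hsl_distance(points[i], points[m]) for m in medoids),
        )
        medoids.append(farthest)
    for _ in range(max_iter):
        labels = assign(points, medoids)
        updated = []
        for c in range(k):
            members = [i for i, lab in enumerate(labels) if lab == c]
            if not members:  # keep the old medoid if a cluster emptied out
                updated.append(medoids[c])
                continue
            # New medoid: the member with the smallest total distance to its cluster.
            updated.append(
                min(members, key=lambda i: sum(hsl_distance(points[i], points[j]) for j in members))
            )
        if updated == medoids:
            break
        medoids = updated
    return assign(points, medoids), medoids


# Hypothetical pixels sampled from two curves drawn in red and blue.
pixels = [(220, 30, 30), (200, 40, 40), (235, 20, 25),
          (30, 30, 220), (40, 50, 200), (20, 25, 235)]
labels, medoids = k_medoids([rgb_to_hsl(p) for p in pixels], k=2)
```

Unlike k-means, k-medoids restricts each cluster center to an observed point, so every center is always a color that actually appears in the image.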

RESULTS

The automated digitizer accurately identified survival probabilities over time in the simulated KM curves. The average RMSE for automated digitization was 0.012, while manual digitization had an average RMSE of 0.014. Its performance was negatively correlated with the number of curves in a figure and the presence of censoring markers. In real-world scenarios, automated digitization and manual digitization showed very close agreement.
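The validation metrics reported here can be sketched in a few lines. The example below is an illustration, not the authors' code. It simulates one arm of exponential time-to-event data, computes the Kaplan-Meier estimate, evaluates RMSE on a grid of 100 time points, and computes Bland-Altman bias and 95% limits of agreement. The abstract does not specify the censoring mechanism, so independent exponential censoring is an assumption, as are the function names, the hazard values, and the use of rounding as a stand-in for digitization error.

```python
import math
import random


def simulate_arm(n, hazard, censor_rate, rng):
    """Exponential event times with independent exponential censoring (assumed model)."""
    data = []
    for _ in range(n):
        t_event = rng.expovariate(hazard)
        t_cens = rng.expovariate(censor_rate)
        data.append((min(t_event, t_cens), t_event <= t_cens))
    return data


def kaplan_meier(data):
    """Return the KM step function as a list of (event_time, survival) pairs."""
    at_risk = len(data)
    surv = 1.0
    steps = []
    for t in sorted({time for time, _ in data}):
        deaths = sum(1 for time, event in data if time == t and event)
        leaving = sum(1 for time, _ in data if time == t)
        if deaths:
            surv *= 1 - deaths / at_risk
            steps.append((t, surv))
        at_risk -= leaving  # events and censored subjects both leave the risk set
    return steps


def survival_at(steps, t):
    """Evaluate the right-continuous KM step function at time t."""
    current = 1.0
    for time, surv in steps:
        if time > t:
            break
        current = surv
    return current


def rmse(reference, digitized, t_max, n_points=100):
    """Root mean squared error between two step functions on an even time grid."""
    grid = [t_max * (i + 1) / n_points for i in range(n_points)]
    squared = [(survival_at(reference, t) - survival_at(digitized, t)) ** 2 for t in grid]
    return math.sqrt(sum(squared) / n_points)


def bland_altman(a, b):
    """Return (bias, lower limit, upper limit) of agreement for paired measurements."""
    diffs = [x - y for x, y in zip(a, b)]
    bias = sum(diffs) / len(diffs)
    sd = math.sqrt(sum((d - bias) ** 2 for d in diffs) / (len(diffs) - 1))
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


rng = random.Random(1)
arm = simulate_arm(150, hazard=0.1, censor_rate=0.05, rng=rng)
truth = kaplan_meier(arm)
# Stand-in for a digitized curve: the true curve rounded to two decimals.
digitized = [(t, round(s, 2)) for t, s in truth]
t_max = max(time for time, _ in arm)
error = rmse(truth, digitized, t_max)
```

In a real comparison, `digitized` would hold the points read off the plot by the algorithm or by a human reader, and `bland_altman` would be applied to paired survival estimates from the two methods at matching time points.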

CONCLUSIONS

The algorithm streamlines the digitization process and requires minimal user input. It effectively digitized KM curves in simulated and real-world scenarios, demonstrating accuracy comparable to conventional manual digitization. The algorithm has been developed as an open-source R package and as a Shiny application and is available on GitHub: https://github.com/Pechli-Lab/SurvdigitizeR and https://pechlilab.shinyapps.io/SurvdigitizeR/ .

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fd05/11245803/516e6aad92d9/12874_2024_2273_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fd05/11245803/42f783216102/12874_2024_2273_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fd05/11245803/d8f79b84175f/12874_2024_2273_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fd05/11245803/6131709a449b/12874_2024_2273_Fig4_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fd05/11245803/c55be8fdbe46/12874_2024_2273_Fig5_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fd05/11245803/e5b6b45a1369/12874_2024_2273_Fig6_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fd05/11245803/860632997891/12874_2024_2273_Fig7_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fd05/11245803/aef8cf0dcea6/12874_2024_2273_Fig8_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fd05/11245803/0237f3b36f4b/12874_2024_2273_Fig9_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fd05/11245803/3fdba69c05e9/12874_2024_2273_Fig10_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fd05/11245803/79137773d412/12874_2024_2273_Fig11_HTML.jpg
