Schut Martijn C, Luik Torec T, Vagliano Iacopo, Rios Miguel, Helsper Charles W, van Asselt Kristel M, de Wit Niek, Abu-Hanna Ameen, van Weert Henk CPM
Department of Laboratory Medicine, Amsterdam University Medical Center (UMC) Vrije Universiteit Amsterdam, Amsterdam; Amsterdam Public Health, Amsterdam UMC, Amsterdam, the Netherlands.
Department of Medical Informatics, Amsterdam UMC Academic Medical Center (AMC), Amsterdam; Amsterdam Public Health, Amsterdam UMC, Amsterdam; Department of Medical Biology, Amsterdam UMC AMC, Amsterdam, the Netherlands.
Br J Gen Pract. 2025 May 2;75(754):e316-e322. doi: 10.3399/BJGP.2023.0489. Print 2025 May.
The journey of >80% of patients diagnosed with lung cancer starts in general practice. About 75% of patients are diagnosed at an advanced stage (3 or 4), which currently results in >80% mortality within 1 year. The long-term data in GP records might contain hidden information that could be used for earlier case finding of patients with cancer.
To develop new prediction tools that improve the risk assessment for lung cancer.
Text analysis of electronic patient data using natural language processing and machine learning in the general practice files of four networks in the Netherlands.
Files of 525 526 patients were analysed, of whom 2386 were diagnosed with lung cancer. Diagnoses were validated by using the Dutch cancer registry, and both structured and free-text data were used to predict the diagnosis of lung cancer 5 months before diagnosis (4 months before referral).
The algorithm could facilitate earlier detection of lung cancer using routine general practice data. Discrimination, calibration, sensitivity, and specificity were established at various cut-off points for the prediction made 5 months before diagnosis. Internal validation of the best model demonstrated an area under the curve of 0.88 (95% confidence interval [CI] = 0.86 to 0.89), which decreased to 0.79 (95% CI = 0.78 to 0.80) during external validation. The desired sensitivity determines the number of patients who need to be referred to detect one patient with lung cancer.
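The link between the chosen sensitivity and the number of patients referred per detected case can be illustrated with a minimal sketch. It assumes that the number needed to refer equals 1 / positive predictive value and that the cohort-wide prevalence (2386 cases among 525 526 patients) is a reasonable stand-in for the prevalence at the prediction time point. The function name `number_needed_to_refer` and the sensitivity/specificity pairs are hypothetical and chosen for illustration only; they are not results reported in the study.

```python
def number_needed_to_refer(sensitivity, specificity, prevalence):
    """Return 1 / PPV: patients flagged per true lung cancer case found."""
    true_pos = sensitivity * prevalence                    # flagged and diseased
    false_pos = (1.0 - specificity) * (1.0 - prevalence)   # flagged, not diseased
    ppv = true_pos / (true_pos + false_pos)
    return 1.0 / ppv


# Cohort prevalence taken from the abstract: 2386 cases among 525 526 patients.
PREVALENCE = 2386 / 525526

# HYPOTHETICAL operating points (sensitivity, specificity); the abstract does
# not report these values, they only show the direction of the trade-off.
OPERATING_POINTS = [(0.50, 0.95), (0.70, 0.85), (0.90, 0.60)]

for sens, spec in OPERATING_POINTS:
    nnr = number_needed_to_refer(sens, spec, PREVALENCE)
    print(f"sensitivity={sens:.2f} specificity={spec:.2f} "
          f"-> about {nnr:.0f} referrals per detected case")
```

Under these assumptions, moving the cut-off to gain sensitivity lowers specificity, and at a low prevalence the extra false positives dominate, so the number of referrals per detected case rises.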
Artificial intelligence-based support enables earlier detection of lung cancer in general practice using readily available text in the patient files of GPs, but needs additional prospective clinical evaluation.