Machine learning in medicine: a practical introduction.

Affiliations

Department of Engineering, University of Cambridge, Trumpington Street, Cambridge, CB2 1PZ, UK.

Department of Surgery, Harvard Medical School, 25 Shattuck Street, Boston, 01225, Massachusetts, USA.

Publication information

BMC Med Res Methodol. 2019 Mar 19;19(1):64. doi: 10.1186/s12874-019-0681-4.

DOI:10.1186/s12874-019-0681-4

PMID:30890124

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC6425557/

Abstract

BACKGROUND

Following visible successes on a wide range of predictive tasks, machine learning techniques are attracting substantial interest from medical researchers and clinicians. We address the need for capacity development in this area by providing a conceptual introduction to machine learning alongside a practical guide to developing and evaluating predictive algorithms using freely available open-source software and public-domain data.

METHODS

We demonstrate the use of machine learning techniques by developing three predictive models for cancer diagnosis using descriptions of nuclei sampled from breast masses. These algorithms include regularized Generalized Linear Model regression (GLMs), Support Vector Machines (SVMs) with a radial basis function kernel, and single-layer Artificial Neural Networks. The publicly available dataset describing the breast mass samples (N=683) was randomly split into evaluation (n=456) and validation (n=227) samples. We trained the algorithms on data from the evaluation sample and then used them to predict the diagnostic outcome in the validation sample. We compared the predictions made on the validation sample with the real-world diagnostic decisions to calculate the accuracy, sensitivity, and specificity of the three models. We explored the use of averaging and voting ensembles to improve predictive performance. We provide a step-by-step guide to developing algorithms using the open-source R statistical programming environment.
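The split-and-evaluate procedure described above can be illustrated with a minimal sketch. It is written in Python using only the standard library, whereas the paper's own guide uses R, so it should not be read as the authors' code. The function names `split_indices` and `classification_metrics`, the fixed seed, the toy labels, and the choice of 1 as the positive class are all assumptions made for illustration.

```python
import random


def split_indices(n_total, n_first, seed=0):
    """Shuffle row indices and split them into two disjoint samples.

    The seed is a hypothetical choice that makes the split reproducible.
    """
    indices = list(range(n_total))
    random.Random(seed).shuffle(indices)
    return indices[:n_first], indices[n_first:]


def classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, sensitivity, and specificity for binary labels.

    Assumption: `positive` marks the class treated as the positive diagnosis.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    return {
        "accuracy": (tp + tn) / len(pairs),
        # Sensitivity: share of true positives that the model detects.
        "sensitivity": tp / (tp + fn) if (tp + fn) else float("nan"),
        # Specificity: share of true negatives that the model detects.
        "specificity": tn / (tn + fp) if (tn + fp) else float("nan"),
    }


# Sample sizes taken from the abstract: 683 rows, 456 used for training.
evaluation_idx, validation_idx = split_indices(683, 456)
print(len(evaluation_idx), len(validation_idx))

# Toy labels (not data from the paper) to show the metric calculation.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
print(classification_metrics(y_true, y_pred))
```

In practice the predicted labels would come from a model fitted on the evaluation sample only, so that the validation sample remains unseen until the metrics are computed.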

RESULTS

The trained algorithms were able to classify cell nuclei with high accuracy (0.94-0.96), sensitivity (0.97-0.99), and specificity (0.85-0.94). Maximum accuracy (0.96) and area under the curve (0.97) were achieved using the SVM algorithm. Prediction performance increased marginally (accuracy = 0.97, sensitivity = 0.99, specificity = 0.95) when the algorithms were arranged into a voting ensemble.
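A voting ensemble and an averaging ensemble can each be sketched in a few lines. The Python example below is an illustration under stated assumptions rather than the authors' R implementation: the function names, the 0.5 threshold, the 0/1 label coding, and the toy model outputs are all hypothetical and do not come from the paper.

```python
def majority_vote(predictions_per_model):
    """Combine hard 0/1 predictions, given as one list per model.

    A case is labelled 1 when more than half of the models predict 1.
    """
    n_models = len(predictions_per_model)
    return [
        1 if 2 * sum(votes) > n_models else 0
        for votes in zip(*predictions_per_model)
    ]


def average_ensemble(probabilities_per_model, threshold=0.5):
    """Average predicted probabilities across models, then threshold.

    Assumption: each model outputs the probability of class 1.
    """
    return [
        1 if sum(probs) / len(probs) >= threshold else 0
        for probs in zip(*probabilities_per_model)
    ]


# Toy outputs for three models on four cases (not results from the paper).
glm_labels = [1, 0, 1, 0]
svm_labels = [1, 1, 0, 0]
ann_labels = [0, 1, 1, 0]
print(majority_vote([glm_labels, svm_labels, ann_labels]))

glm_probs = [0.9, 0.4, 0.6, 0.1]
svm_probs = [0.8, 0.7, 0.3, 0.2]
ann_probs = [0.4, 0.6, 0.7, 0.3]
print(average_ensemble([glm_probs, svm_probs, ann_probs]))
```

With an odd number of models, as here, a majority vote cannot tie. Averaging retains more information from each model but assumes that their probability outputs are comparably calibrated.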

CONCLUSIONS

We use a straightforward example to demonstrate the theory and practice of machine learning for clinicians and medical researchers. The principles that we demonstrate here can be readily applied to other complex tasks, including natural language processing and image recognition.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/61e1/6425557/2c3ee5902ac8/12874_2019_681_Fig1_HTML.jpg
