Machine learning in medicine: a practical introduction.

Affiliations

Department of Engineering, University of Cambridge, Trumpington Street, Cambridge, CB2 1PZ, UK.

Department of Surgery, Harvard Medical School, 25 Shattuck Street, Boston, 01225, Massachusetts, USA.

Publication information

BMC Med Res Methodol. 2019 Mar 19;19(1):64. doi: 10.1186/s12874-019-0681-4.

Abstract

BACKGROUND

Following visible successes on a wide range of predictive tasks, machine learning techniques are attracting substantial interest from medical researchers and clinicians. We address the need for capacity development in this area by providing a conceptual introduction to machine learning alongside a practical guide to developing and evaluating predictive algorithms using freely-available open source software and public domain data.

METHODS

We demonstrate the use of machine learning techniques by developing three predictive models for cancer diagnosis using descriptions of nuclei sampled from breast masses. These algorithms include regularized Generalized Linear Model regression (GLMs), Support Vector Machines (SVMs) with a radial basis function kernel, and single-layer Artificial Neural Networks. The publicly available dataset describing the breast mass samples (N=683) was randomly split into evaluation (n=456) and validation (n=227) samples. We trained the algorithms on data from the evaluation sample and then used them to predict the diagnostic outcome in the validation sample. We compared the predictions made on the validation sample with the real-world diagnostic decisions to calculate the accuracy, sensitivity, and specificity of the three models. We explored the use of averaging and voting ensembles to improve predictive performance. We provide a step-by-step guide to developing algorithms using the open-source R statistical programming environment.
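As an illustration of the split and the three metrics named above, the following is a minimal sketch in plain Python rather than the authors' R code. The function names `split_indices` and `classification_metrics`, the seed, and the toy labels are all hypothetical; only the sample sizes (683, 456, 227) come from the abstract.

```python
import random


def split_indices(n_total, n_first, seed=0):
    """Randomly partition range(n_total) into two disjoint index lists."""
    indices = list(range(n_total))
    random.Random(seed).shuffle(indices)  # seeded for reproducibility (seed is an assumption)
    return indices[:n_first], indices[n_first:]


def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, sensitivity, and specificity for binary labels.

    Assumes both classes occur in y_true; otherwise a denominator is zero.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
    }


# Sample sizes reported in the abstract: 683 cases split into 456 and 227.
first_idx, second_idx = split_indices(683, 456)

# Toy labels for illustration only (1 = malignant, 0 = benign); not real data.
toy_true = [1, 1, 1, 0, 0, 0, 0, 1]
toy_pred = [1, 1, 0, 0, 0, 1, 0, 1]
toy_metrics = classification_metrics(toy_true, toy_pred)
```

In practice the predicted labels would come from a fitted model applied to the held-out sample; the toy vectors stand in for that step.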

RESULTS

The trained algorithms classified cell nuclei with high accuracy (0.94-0.96), sensitivity (0.97-0.99), and specificity (0.85-0.94). Maximum accuracy (0.96) and area under the curve (0.97) were achieved using the SVM algorithm. Prediction performance increased marginally (accuracy = 0.97, sensitivity = 0.99, specificity = 0.95) when the algorithms were arranged into a voting ensemble.
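The two ensemble strategies mentioned in the abstract can be sketched as follows. This is a hedged illustration in plain Python, not the authors' implementation: the function names, the 0.5 threshold, and the toy predictions are assumptions, and the abstract does not say how the original ensembles were configured.

```python
def majority_vote(label_lists):
    """Combine per-model 0/1 predictions by strict majority.

    Assumes an odd number of models so that ties cannot occur.
    """
    n_models = len(label_lists)
    return [1 if 2 * sum(votes) > n_models else 0 for votes in zip(*label_lists)]


def average_probabilities(prob_lists, threshold=0.5):
    """Average per-model probabilities of the positive class, then threshold.

    The 0.5 cut-off is an assumption, not a value taken from the source.
    """
    n_models = len(prob_lists)
    return [1 if sum(probs) / n_models >= threshold else 0 for probs in zip(*prob_lists)]


# Toy outputs for three hypothetical models (stand-ins for GLM, SVM, and ANN).
glm_labels, svm_labels, ann_labels = [1, 0, 1, 0], [1, 1, 0, 0], [0, 1, 1, 0]
voted = majority_vote([glm_labels, svm_labels, ann_labels])

glm_probs, svm_probs, ann_probs = [0.9, 0.4, 0.2], [0.8, 0.7, 0.1], [0.3, 0.6, 0.4]
averaged = average_probabilities([glm_probs, svm_probs, ann_probs])
```

Voting uses only each model's final label, whereas averaging retains how confident each model was, so the two can disagree on borderline cases.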

CONCLUSIONS

We use a straightforward example to demonstrate the theory and practice of machine learning for clinicians and medical researchers. The principles that we demonstrate here can be readily applied to other complex tasks, including natural language processing and image recognition.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/61e1/6425557/2c3ee5902ac8/12874_2019_681_Fig1_HTML.jpg
