Cruz Joseph A, Wishart David S
Departments of Biological Science and Computing Science, University of Alberta, Edmonton, AB, Canada T6G 2E8.
Cancer Inform. 2007 Feb 11;2:59-77.
Machine learning is a branch of artificial intelligence that employs a variety of statistical, probabilistic and optimization techniques that allow computers to "learn" from past examples and to detect hard-to-discern patterns in large, noisy or complex data sets. This capability is particularly well suited to medical applications, especially those that depend on complex proteomic and genomic measurements. As a result, machine learning is frequently used in cancer diagnosis and detection. More recently, machine learning has been applied to cancer prognosis and prediction. This latter approach is particularly interesting as it is part of a growing trend towards personalized, predictive medicine. In assembling this review we conducted a broad survey of the different types of machine learning methods being used, the types of data being integrated and the performance of these methods in cancer prediction and prognosis. A number of trends are noted, including a growing dependence on protein biomarkers and microarray data, a strong bias towards applications in prostate and breast cancer, and a heavy reliance on "older" technologies such as artificial neural networks (ANNs) instead of more recently developed or more easily interpretable machine learning methods. A number of published studies also appear to lack an appropriate level of validation or testing. Among the better designed and validated studies it is clear that machine learning methods can be used to substantially (15-25%) improve the accuracy of predicting cancer susceptibility, recurrence and mortality. At a more fundamental level, it is also evident that machine learning is helping to improve our basic understanding of cancer development and progression.