Jovel Juan, Greiner Russell
The Metabolomics Innovation Centre, University of Alberta, Edmonton, AB, Canada.
Faculty of Science-Computing Science, University of Alberta, Edmonton, AB, Canada.
Front Med (Lausanne). 2021 Dec 16;8:771607. doi: 10.3389/fmed.2021.771607. eCollection 2021.
Machine learning (ML) approaches are a collection of algorithms that attempt to extract patterns from data and to associate such patterns with discrete classes of samples in the data. For example, given a series of features describing persons, an ML model predicts whether a person is diseased or healthy; given features of animals, it predicts whether an animal is treated or control; or it predicts whether molecules have the potential to interact. ML approaches can also find such patterns in an agnostic manner, i.e., without having information about the classes. These two families of methods are referred to as supervised and unsupervised ML, respectively. A third type of ML is reinforcement learning, which attempts to find a sequence of actions that contribute to achieving a specific goal. All of these methods are becoming increasingly popular in biomedical research in quite diverse areas, including drug design, stratification of patients, medical image analysis, molecular interactions, prediction of therapy outcomes, and many more. We describe several supervised and unsupervised ML techniques and illustrate a series of prototypical examples using state-of-the-art computational approaches. Given the complexity of reinforcement learning, it is not discussed in detail here; instead, interested readers are referred to excellent reviews on that topic. We focus on concepts rather than procedures, as our goal is to attract the attention of researchers in biomedicine toward the plethora of powerful ML methods and their potential to leverage basic and applied research programs.