Machine learning approaches for biomolecular, biophysical, and biomaterials research.

Author information

Rickert Carolin A, Lieleg Oliver

Publication information

Biophys Rev (Melville). 2022 Jun 3;3(2):021306. doi: 10.1063/5.0082179. eCollection 2022 Jun.

DOI:10.1063/5.0082179

PMID:38505413

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC10914139/

Abstract

A fluent conversation with a virtual assistant, person-tailored news feeds, and deep-fake images created within seconds: all of these things were unthinkable for a long time and are now part of our everyday lives. What these examples have in common is that they are realized by different means of machine learning (ML), a technology that has fundamentally changed many aspects of the modern world. The possibility of processing enormous amounts of data in multi-hierarchical, digital constructs has paved the way not only for creating intelligent systems but also for obtaining surprising new insights into many scientific problems. However, in the different areas of the biosciences, which typically rely heavily on the time-consuming collection of experimental data, applying ML methods is more challenging: here, difficulties can arise from small datasets and from the inherent, broad variability and complexity associated with studying biological objects and phenomena. In this Review, we give an overview of commonly used ML algorithms (which are often referred to as "machines") and learning strategies as well as their applications in different bio-disciplines such as molecular biology, drug development, biophysics, and biomaterials science. We highlight how selected research questions from those fields were successfully translated into machine-readable formats, discuss typical problems that can arise in this context, and provide an overview of how to resolve those difficulties.
