Ferguson, Andrew L.
Department of Materials Science and Engineering, University of Illinois at Urbana-Champaign, 1304 West Green Street, Urbana, IL 61801, United States of America. Department of Chemical and Biomolecular Engineering, University of Illinois at Urbana-Champaign, 600 South Mathews Avenue, Urbana, IL 61801, United States of America. Department of Physics, University of Illinois at Urbana-Champaign, 1110 West Green Street, Urbana, IL 61801, United States of America. Frederick Seitz Materials Research Laboratory, University of Illinois at Urbana-Champaign, Urbana, IL 61801, United States of America. Beckman Institute for Advanced Science and Technology, University of Illinois at Urbana-Champaign, Urbana, IL 61801, United States of America.
J Phys Condens Matter. 2018 Jan 31;30(4):043002. doi: 10.1088/1361-648X/aa98bd.
In many branches of materials science it is now routine to generate data sets of such large size and dimensionality that conventional methods of analysis fail. Paradigms and tools from data science and machine learning can provide scalable approaches to identify and extract trends and patterns within voluminous data sets, perform guided traversals of high-dimensional phase spaces, and furnish data-driven strategies for inverse materials design. This topical review provides an accessible introduction to machine learning tools in the context of soft and biological materials by 'de-jargonizing' data science terminology, presenting a taxonomy of machine learning techniques, and surveying the mathematical underpinnings and software implementations of popular tools, including principal component analysis, independent component analysis, diffusion maps, support vector machines, and relative entropy. We present illustrative examples of machine learning applications in soft matter, including inverse design of self-assembling materials, nonlinear learning of protein folding landscapes, high-throughput antimicrobial peptide design, and data-driven materials design engines. We close with an outlook on the challenges and opportunities for the field.