Department of Physics and Optical Science, University of North Carolina at Charlotte, Charlotte, NC, 28262, USA.
Department of Bioinformatics and Genomics, University of North Carolina at Charlotte, Charlotte, NC, 28262, USA.
Sci Rep. 2021 Feb 19;11(1):4247. doi: 10.1038/s41598-021-83269-y.
Identifying mechanisms that control molecular function is a significant challenge in pharmaceutical science and molecular engineering. Here, we present a novel projection pursuit recurrent neural network to identify functional mechanisms in the context of iterative supervised machine learning for discovery-based design optimization. Molecular function recognition is achieved by pairing experiments that categorize systems with digital twin molecular dynamics simulations to generate working hypotheses. Feature extraction decomposes emergent properties of a system into a complete set of basis vectors. Feature selection requires the signal-to-noise ratio, statistical significance, and clustering quality to concurrently surpass acceptance levels. Formulated as a multivariate description of differences and similarities between systems, the data-driven working hypothesis is refined by analyzing new systems prioritized by a discovery-likelihood. Utility and generality are demonstrated on several benchmarks, including the elucidation of antibiotic resistance in TEM-52 beta-lactamase. The software is freely available, enabling turnkey analysis of massive data streams found in computational biology and materials science.
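The selection rule described above, in which three criteria must pass at the same time, can be illustrated with a minimal sketch. This is not the published software: the function names `feature_scores` and `accept_feature`, the threshold defaults, and the three scoring formulas (a pooled-variance signal-to-noise ratio, a Welch-style t statistic as a stand-in for statistical significance, and a nearest-class-mean agreement fraction as a stand-in for clustering quality) are all assumptions chosen for illustration. The inputs are assumed to be the one-dimensional projections of two labeled classes of systems onto a single candidate basis vector.

```python
from math import inf, sqrt
from statistics import mean, variance


def _ratio(numerator, denominator):
    """Divide safely: a zero spread gives inf for a nonzero gap, else 0."""
    if denominator == 0.0:
        return inf if numerator > 0.0 else 0.0
    return numerator / denominator


def feature_scores(proj_a, proj_b):
    """Score one candidate direction from the projections of two classes.

    All three measures are illustrative stand-ins, not the paper's definitions.
    """
    mu_a, mu_b = mean(proj_a), mean(proj_b)
    var_a, var_b = variance(proj_a), variance(proj_b)
    gap = abs(mu_a - mu_b)

    # Signal-to-noise: separation of class means relative to combined spread.
    snr = _ratio(gap, sqrt(var_a + var_b))

    # Significance proxy: Welch-style t statistic (accounts for sample sizes).
    t_stat = _ratio(gap, sqrt(var_a / len(proj_a) + var_b / len(proj_b)))

    # Clustering-quality proxy: fraction of points nearer their own class mean.
    correct = sum(abs(x - mu_a) < abs(x - mu_b) for x in proj_a)
    correct += sum(abs(x - mu_b) < abs(x - mu_a) for x in proj_b)
    quality = correct / (len(proj_a) + len(proj_b))

    return snr, t_stat, quality


def accept_feature(proj_a, proj_b, min_snr=1.0, min_t=2.0, min_quality=0.9):
    """Accept the direction only if all three scores exceed their thresholds."""
    snr, t_stat, quality = feature_scores(proj_a, proj_b)
    return snr > min_snr and t_stat > min_t and quality > min_quality


if __name__ == "__main__":
    functional = [0.0, 0.1, -0.1, 0.05]
    nonfunctional = [5.0, 5.1, 4.9, 5.05]
    print(accept_feature(functional, nonfunctional))
```

Requiring all three conditions jointly, rather than ranking by a single score, means a direction with a large mean separation is still rejected if the sample is too small to support it or if individual systems do not cluster cleanly with their own class.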