A comprehensive comparison of molecular feature representations for use in predictive modeling.

Affiliations

Department of Knowledge Technologies, Jožef Stefan Institute, Ljubljana, Slovenia; Jožef Stefan International Postgraduate School, Ljubljana, Slovenia.

The University of Auckland, School of Computer Science, Auckland, New Zealand.

Publication information

Comput Biol Med. 2021 Mar;130:104197. doi: 10.1016/j.compbiomed.2020.104197. Epub 2021 Jan 9.

Abstract

Machine learning methods are commonly used to predict molecular properties and thereby accelerate materials and drug design. An important part of this process is deciding how to represent the molecules. Most machine learning methods expect each example to be a vector of values, and many methods for computing such molecular feature representations have been proposed. In this paper, we perform a comprehensive comparison of molecular features. We cover traditional representations, such as fingerprints and molecular descriptors, as well as recently proposed learnable representations based on neural networks. The representations are evaluated on 11 benchmark datasets covering properties and measures such as mutagenicity, melting point, activity, solubility, and IC50. Our experiments show that several molecular features perform similarly well across all benchmark datasets. The representation that stands out most is Spectrophores, which performs significantly worse than the other features on most datasets. Molecular descriptors from the PaDEL library appear very well suited to predicting physical properties of molecules. Despite their simplicity, MACCS fingerprints performed very well overall. The results show that learnable representations perform competitively with expert-based representations. However, task-specific representations (graph convolutions and Weave) rarely offer any benefit, even though they are more computationally demanding. Finally, combining different molecular feature representations typically gives no noticeable improvement over individual representations.
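The sketch below illustrates what "representing a molecule as a vector of values" means in practice. It also shows one way two representations might be combined. It computes MACCS keys and a few physicochemical descriptors for a SMILES string and concatenates them into a single feature vector. It assumes RDKit is installed.

The following are illustrative assumptions, not the authors' pipeline:
- The helper name `featurize` is hypothetical.
- The three RDKit descriptors stand in for a full descriptor set such as PaDEL, which is a separate tool.
- Combining representations by plain concatenation is an assumption about how combination could be done.

```python
from rdkit import Chem
from rdkit.Chem import Descriptors, MACCSkeys


def featurize(smiles: str) -> list[float]:
    """Hypothetical helper: MACCS keys concatenated with three RDKit descriptors."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"could not parse SMILES: {smiles!r}")
    # MACCS structural keys: RDKit returns a 167-bit vector (bit 0 is unused).
    maccs = [float(b) for b in MACCSkeys.GenMACCSKeys(mol).ToList()]
    # A few physicochemical descriptors, used here only as stand-ins for a
    # full descriptor library such as PaDEL (an assumption for illustration).
    descriptors = [
        Descriptors.MolWt(mol),
        Descriptors.MolLogP(mol),
        Descriptors.TPSA(mol),
    ]
    # Combine the two representations by simple concatenation (assumed scheme).
    return maccs + descriptors


if __name__ == "__main__":
    vec = featurize("CCO")  # ethanol
    print(len(vec))  # prints 170 (167 MACCS bits + 3 descriptors)
```

Applied to a whole dataset, each molecule becomes one fixed-length row of a feature matrix that a standard learner can consume.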
