Yunnan Minzu University, Kunming, China.
School of Informatics, Xiamen University, Xiamen, China.
Comput Intell Neurosci. 2022 Jan 28;2022:8464452. doi: 10.1155/2022/8464452. eCollection 2022.
Deep learning has driven rapid progress in molecular representation for tasks such as molecular property prediction. Predicting molecular properties is a crucial task in drug discovery, where the goal is to find drugs with good pharmacological activity and pharmacokinetic properties. Inspired by natural language processing techniques, deep neural network models commonly take SMILES strings as character-level input. However, these models are hindered by the nonunique nature of SMILES: a single molecule can be written as many different strings. To efficiently learn molecular features along all message paths, in this paper we encode multiple SMILES for every molecule as an automated data augmentation for molecular property prediction, which alleviates the overfitting caused by the small size of molecular property datasets. With this multiple-SMILES augmentation, we obtained better molecular representations and superior performance on molecular property prediction tasks.
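The idea of encoding several SMILES per molecule can be illustrated with a toy sketch. The abstract does not describe the authors' enumeration procedure, so the code below is only an assumption-laden illustration: it handles acyclic molecules with single bonds, and the helper names `write_smiles`, `enumerate_smiles`, and `augment` are hypothetical. In practice a cheminformatics toolkit would likely generate the alternative strings.

```python
from typing import Dict, List, Set, Tuple

Molecule = Tuple[List[str], List[Tuple[int, int]]]  # (atom symbols, bonds)


def write_smiles(atoms: List[str], bonds: List[Tuple[int, int]], start: int) -> str:
    """Write a SMILES string by depth-first traversal from atom `start`.

    Toy assumption: the molecule is acyclic and every bond is a single bond,
    so no ring-closure digits or bond symbols are needed.
    """
    neighbors: Dict[int, List[int]] = {i: [] for i in range(len(atoms))}
    for a, b in bonds:
        neighbors[a].append(b)
        neighbors[b].append(a)

    visited: Set[int] = set()

    def visit(i: int) -> str:
        visited.add(i)
        children = [j for j in neighbors[i] if j not in visited]
        out = atoms[i]
        # Every child except the last is written as a parenthesized branch.
        for j in children[:-1]:
            out += "(" + visit(j) + ")"
        if children:
            out += visit(children[-1])
        return out

    return visit(start)


def enumerate_smiles(atoms: List[str], bonds: List[Tuple[int, int]]) -> List[str]:
    """Collect the distinct SMILES obtained by starting at each atom."""
    return sorted({write_smiles(atoms, bonds, s) for s in range(len(atoms))})


def augment(dataset: List[Tuple[List[str], List[Tuple[int, int]], float]]):
    """Expand each (atoms, bonds, label) record into one row per SMILES variant."""
    rows = []
    for atoms, bonds, label in dataset:
        for smiles in enumerate_smiles(atoms, bonds):
            rows.append((smiles, label))
    return rows


# Ethanol: C-C-O. The label 1.0 is a placeholder, not real data.
ethanol_atoms = ["C", "C", "O"]
ethanol_bonds = [(0, 1), (1, 2)]
variants = enumerate_smiles(ethanol_atoms, ethanol_bonds)
```

Each variant describes the same molecule, so all of them share one label. A small dataset therefore grows by a factor of roughly the number of variants per molecule, which is the sense in which the augmentation may reduce overfitting.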