Department of Chemistry and Applied Biosciences, ETH Zürich, Vladimir-Prelog-Weg 4, 8093, Zurich, Switzerland.
ETH Singapore SEC Ltd, CREATE Way, #06-01 CREATE Tower, Singapore, Singapore.
Mol Inform. 2023 Jun;42(6):e2300059. doi: 10.1002/minf.202300059. Epub 2023 Jun 7.
Several binary molecular fingerprints were compressed using an autoencoder neural network. We analyzed the impact of compression on fingerprint performance in downstream classification and regression tasks. Classifiers trained on compressed fingerprints were negligibly affected. Regression models benefited from compression, especially for long fingerprints (Morgan, RDK). However, their performance dropped rapidly at compression levels exceeding 90 %. Property co-learning improved the predictive power of the compressed fingerprints, with a mean score improvement of up to 20 %, suggesting that autoencoder compression with property co-learning biases the molecular representation toward the predicted target and thereby facilitates downstream training.
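The abstract does not specify the network architecture, so what follows is only a minimal NumPy sketch of the general idea under stated assumptions: a single tanh bottleneck layer, a sigmoid decoder that reconstructs the fingerprint bits, and a linear head that co-learns a real-valued property from the bottleneck, all trained jointly by full-batch gradient descent. The names `train_coautoencoder` and `compression_level`, the loss weight `lam`, the learning rate, and the definition of compression level as the fraction of dimensions removed (1 - latent_dim / n_bits) are hypothetical illustrations, not the authors' method. Under that assumed definition, compressing a 2048-bit fingerprint to 204 latent dimensions already exceeds 90 %.

```python
import numpy as np


def compression_level(n_bits: int, latent_dim: int) -> float:
    """Fraction of dimensions removed (assumed definition, hypothetical helper)."""
    return 1.0 - latent_dim / n_bits


def _sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def train_coautoencoder(X, y, latent_dim, lam=1.0, lr=0.05, epochs=300, seed=0):
    """Hypothetical sketch: autoencoder with an auxiliary property head.

    X: (n, d) binary fingerprint matrix (0/1 floats); y: (n,) real-valued property.
    Loss = BCE reconstruction (summed over bits, averaged over molecules)
           + lam * mean squared error of the co-learned property.
    lam=0 gives a plain reconstruction-only autoencoder.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W_enc = rng.normal(0.0, 0.1, (d, latent_dim))
    b_enc = np.zeros(latent_dim)
    W_dec = rng.normal(0.0, 0.1, (latent_dim, d))
    b_dec = np.zeros(d)
    w_prop = rng.normal(0.0, 0.1, latent_dim)
    b_prop = np.zeros(1)
    history = []
    eps = 1e-9  # avoids log(0)
    for _ in range(epochs):
        # Forward pass: compress, reconstruct bits, predict the property.
        H = np.tanh(X @ W_enc + b_enc)        # (n, k) compressed fingerprint
        P = _sigmoid(H @ W_dec + b_dec)        # (n, d) reconstructed bit probabilities
        y_hat = H @ w_prop + b_prop            # (n,) co-learned property
        bce = -np.mean(np.sum(X * np.log(P + eps) + (1 - X) * np.log(1 - P + eps), axis=1))
        mse = np.mean((y_hat - y) ** 2)
        history.append(bce + lam * mse)

        # Backward pass: gradients of the combined loss.
        dZ_dec = (P - X) / n                   # d(BCE)/d(decoder logits)
        dy = 2.0 * lam * (y_hat - y) / n       # d(lam * MSE)/d(y_hat)
        dH = dZ_dec @ W_dec.T + np.outer(dy, w_prop)  # both losses flow into the bottleneck
        dZ_enc = dH * (1.0 - H ** 2)           # tanh derivative
        grads = [
            (W_dec, H.T @ dZ_dec), (b_dec, dZ_dec.sum(axis=0)),
            (w_prop, H.T @ dy), (b_prop, np.array([dy.sum()])),
            (W_enc, X.T @ dZ_enc), (b_enc, dZ_enc.sum(axis=0)),
        ]
        for param, grad in grads:
            param -= lr * grad                 # in-place gradient descent step

    def encode(X_new):
        """Map fingerprints to the compressed (latent) representation."""
        return np.tanh(X_new @ W_enc + b_enc)

    return encode, history
```

The matrix returned by `encode` is what a downstream classifier or regressor would be trained on. Because the property error is back-propagated into the bottleneck through `dH`, the latent features are pushed toward information relevant to the target, which is one plausible reading of how co-learning could bias the representation; comparing against `lam=0` isolates that effect in this toy setting.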