Exploration of chemical space with partial labeled noisy student self-training and self-supervised graph embedding.

Affiliations

Department of Computer Science, Hunter College, The City University of New York, 695 Park Ave, New York, NY, 10065, USA.

The Graduate Center, The City University of New York, 356 5th Ave, New York, NY, 10016, USA.

Publication information

BMC Bioinformatics. 2022 May 2;23(Suppl 3):158. doi: 10.1186/s12859-022-04681-3.

DOI:10.1186/s12859-022-04681-3
PMID:35501680
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9063120/
Abstract

BACKGROUND

Drug discovery is time-consuming and costly. Machine learning, especially deep learning, shows great potential in quantitative structure-activity relationship (QSAR) modeling to accelerate the drug discovery process and reduce its cost. A major challenge in developing robust and generalizable deep learning models for QSAR is the lack of large datasets with high-quality, balanced labels. To address this challenge, we developed a self-training method, Partially LAbeled Noisy Student (PLANS), and a novel self-supervised graph embedding, Graph-Isomorphism-Network Fingerprint (GINFP), which learns chemical compound representations with substructure information from unlabeled data. The representations can be used to predict chemical properties such as binding affinity and toxicity. PLANS-GINFP allows us to exploit millions of unlabeled chemical compounds, as well as labeled and partially labeled pharmacological data, to improve the generalizability of neural network models.
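One common way to train on partially labeled multi-task data is to compute the loss only over the labels that were actually measured. The sketch below illustrates that idea with NumPy; it is not the authors' implementation. The function name `masked_bce` and the convention of marking missing labels with NaN are assumptions made for this example.

```python
import numpy as np

def masked_bce(probs, labels, eps=1e-7):
    """Mean binary cross-entropy over observed labels only.

    probs  : (n_compounds, n_tasks) predicted probabilities
    labels : (n_compounds, n_tasks) with 0.0 / 1.0 for measured labels
             and np.nan where a compound has no label for that task
             (the NaN convention is an assumption of this sketch)
    """
    # Clip to avoid log(0) for saturated predictions.
    probs = np.clip(np.asarray(probs, dtype=float), eps, 1.0 - eps)
    labels = np.asarray(labels, dtype=float)

    observed = ~np.isnan(labels)
    if not observed.any():
        return 0.0  # nothing to learn from in this batch

    # Replace NaN with a dummy value; those entries are masked out below.
    y = np.where(observed, labels, 0.0)
    per_entry = -(y * np.log(probs) + (1.0 - y) * np.log(1.0 - probs))
    return float(per_entry[observed].mean())


# Two compounds, two assays; each compound was measured in only one assay.
example_probs = [[0.5, 0.9], [0.5, 0.1]]
example_labels = [[1.0, np.nan], [np.nan, 0.0]]
example_loss = masked_bce(example_probs, example_labels)
```

With this masking, a compound measured in only one assay still contributes a training signal for that assay without being treated as a negative in the others.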

RESULTS

We evaluated PLANS-GINFP on predicting Cytochrome P450 (CYP450) binding activity in a CYP450 dataset and chemical toxicity in the Tox21 dataset. Extensive benchmark studies demonstrated that PLANS-GINFP significantly improved performance in both cases. Both PLANS-based self-training and GINFP-based self-supervised learning contributed to the improvement.

CONCLUSION

To better exploit chemical structures as input for machine learning algorithms, we proposed a self-supervised, graph neural network-based embedding method that can encode substructure information. Furthermore, we developed a model-agnostic self-training method, PLANS, that can be applied to any deep learning architecture to improve prediction accuracy. PLANS provides a way to better utilize partially labeled and unlabeled data. Comprehensive benchmark studies demonstrated the potential of both methods in predicting drug metabolism and toxicity profiles from sparse, noisy, and imbalanced data. PLANS-GINFP could serve as a general solution for improving predictive QSAR modeling.
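The abstract does not give the PLANS algorithm in detail, so the following is only a generic teacher-student self-training sketch in the spirit of noisy-student training, under stated assumptions. Missing labels are marked with NaN, the teacher's soft predictions fill those gaps, and the student is trained on inputs perturbed with Gaussian noise, which stands in for whatever noise the real method uses. The names `fill_missing_labels`, `self_train`, and the `fit` callback are hypothetical.

```python
import numpy as np

def fill_missing_labels(labels, teacher_probs):
    """Keep measured labels; replace NaN entries with teacher soft predictions."""
    labels = np.asarray(labels, dtype=float)
    teacher_probs = np.asarray(teacher_probs, dtype=float)
    return np.where(np.isnan(labels), teacher_probs, labels)

def self_train(fit, features, labels, rounds=2, noise_std=0.1, seed=0):
    """Generic teacher-student loop (illustrative only).

    fit(features, targets) is a user-supplied callback (hypothetical
    interface) that trains any model and returns a predict(features)
    function giving per-task probabilities. It must tolerate NaN
    targets, for example by masking them out of the loss.
    """
    rng = np.random.default_rng(seed)
    features = np.asarray(features, dtype=float)
    labels = np.asarray(labels, dtype=float)

    # Round 0: the teacher sees only the measured labels.
    predict = fit(features, labels)

    for _ in range(rounds):
        # The current model acts as teacher and pseudo-labels the gaps.
        targets = fill_missing_labels(labels, predict(features))
        # The student is trained on noised inputs (an assumed noise model).
        noisy = features + rng.normal(0.0, noise_std, size=features.shape)
        predict = fit(noisy, targets)

    return predict
```

Because the loop only calls `fit` and the returned `predict` function, any architecture can be plugged in, which is one way to read the "model-agnostic" claim above.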
