Quantifying the Hardness of Bioactivity Prediction Tasks for Transfer Learning.

Affiliations

Department of Pharmaceutical Sciences, Division of Pharmaceutical Chemistry, Faculty of Life Sciences, University of Vienna, Josef-Holaubek-Platz 2, 1090 Vienna, Austria.

Christian Doppler Laboratory for Molecular Informatics in the Biosciences, Department for Pharmaceutical Sciences, University of Vienna, 1090 Vienna, Austria.

Publication information

J Chem Inf Model. 2024 May 27;64(10):4031-4046. doi: 10.1021/acs.jcim.4c00160. Epub 2024 May 13.

DOI: 10.1021/acs.jcim.4c00160
PMID: 38739465
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11134514/
Abstract

Today, machine learning methods are widely employed in drug discovery. However, the chronic lack of data continues to hamper their further development, validation, and application. Several modern strategies aim to mitigate the challenges associated with data scarcity by learning from data on related tasks. These knowledge-sharing approaches encompass transfer learning, multitask learning, and meta-learning. A key question remaining to be answered for these approaches is about the extent to which their performance can benefit from the relatedness of available source (training) tasks; in other words, how difficult ("hard") a test task is to a model, given the available source tasks. This study introduces a new method for quantifying and predicting the hardness of a bioactivity prediction task based on its relation to the available training tasks. The approach involves the generation of protein and chemical representations and the calculation of distances between the bioactivity prediction task and the available training tasks. In the example of meta-learning on the FS-Mol data set, we demonstrate that the proposed task hardness metric is inversely correlated with performance (Pearson's correlation coefficient = -0.72). The metric will be useful in estimating the task-specific gain in performance that can be achieved through meta-learning.
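The distance-based idea in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact distance function, how protein and chemical distances are combined, or how distances to several training tasks are aggregated, so everything below is an assumption made for illustration: each task is reduced to one generic embedding vector, hardness is taken as the mean Euclidean distance to the `k` nearest training tasks, and the embeddings and performance scores are invented toy values. The names `task_hardness` and `pearson` are hypothetical helpers, not part of the authors' code.

```python
import numpy as np


def task_hardness(test_emb, train_embs, k=3):
    """Hypothetical hardness score: mean Euclidean distance from one
    test-task embedding to its k nearest training-task embeddings.
    (The aggregation rule is an assumption, not the paper's definition.)"""
    dists = np.linalg.norm(train_embs - test_emb, axis=1)
    k = min(k, len(dists))
    return float(np.sort(dists)[:k].mean())


def pearson(x, y):
    """Pearson correlation coefficient between two 1-D arrays."""
    return float(np.corrcoef(x, y)[0, 1])


# Toy task embeddings (invented): four training tasks, three test tasks.
train_embs = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
test_embs = np.array([[0.5, 0.5], [3.0, 3.0], [6.0, 6.0]])

# Invented per-task performance scores for the three test tasks.
performance = np.array([0.30, 0.18, 0.05])

hardness = np.array([task_hardness(t, train_embs, k=2) for t in test_embs])
r = pearson(hardness, performance)

print("hardness per test task:", np.round(hardness, 3))
print("Pearson r (hardness vs. performance):", round(r, 3))
```

In this toy setup, test tasks that lie farther from the training tasks receive a higher hardness score and lower performance, so the correlation is negative. This matches the direction of the relationship reported in the abstract but not its magnitude, which depends on the real representations and data.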


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fe13/11134514/213721b56313/ci4c00160_0018.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fe13/11134514/331a5ea0e3ca/ci4c00160_0007.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fe13/11134514/8dc2c133630b/ci4c00160_0008.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fe13/11134514/7cb8f27cb0db/ci4c00160_0001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fe13/11134514/b05f184d845e/ci4c00160_0009.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fe13/11134514/8aaeaf9048b6/ci4c00160_0002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fe13/11134514/cb1c4ec58cce/ci4c00160_0003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fe13/11134514/9a6edfaf3d0b/ci4c00160_0004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fe13/11134514/d3975be7251a/ci4c00160_0005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fe13/11134514/0f3123d2fbb3/ci4c00160_0006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fe13/11134514/3cda5d186f63/ci4c00160_0010.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fe13/11134514/6b5da1116952/ci4c00160_0011.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fe13/11134514/d0043f6934eb/ci4c00160_0012.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fe13/11134514/e478676e6bab/ci4c00160_0013.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fe13/11134514/4fea7e05c98c/ci4c00160_0014.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fe13/11134514/fcbe292c8b87/ci4c00160_0015.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fe13/11134514/43dcfa7578cc/ci4c00160_0016.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fe13/11134514/37bc18526748/ci4c00160_0017.jpg
