Effect of missing data on multitask prediction methods.

Authors

de la Vega de León Antonio, Chen Beining, Gillet Valerie J

Affiliations

Information School, University of Sheffield, Regent Court, 211 Portobello, Sheffield, S1 4DP, UK.

Department of Chemistry, University of Sheffield, Dainton Building, Brook Hill, Sheffield, S3 7HF, UK.

Publication details

J Cheminform. 2018 May 22;10(1):26. doi: 10.1186/s13321-018-0281-z.

DOI:10.1186/s13321-018-0281-z
PMID:29789977
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5964064/
Abstract

There has been a growing interest in multitask prediction in chemoinformatics, helped by the increasing use of deep neural networks in this field. This technique is applied to multitarget data sets, where compounds have been tested against different targets, with the aim of developing models to predict a profile of biological activities for a given compound. However, multitarget data sets tend to be sparse; i.e., not all compound-target combinations have experimental values. There has been little research on the effect of missing data on the performance of multitask methods. We have used two complete data sets to simulate sparseness by removing data from the training set. Different models to remove the data were compared. These sparse sets were used to train two different multitask methods, deep neural networks and Macau, which is a Bayesian probabilistic matrix factorization technique. Results from both methods were remarkably similar and showed that the performance decrease because of missing data is at first small before accelerating after large amounts of data are removed. This work provides a first approximation to assess how much data is required to produce good performance in multitask prediction exercises.
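The idea of simulating sparseness from a complete data set can be illustrated with a minimal sketch. It assumes a uniform random removal model, which is only one possible choice (the abstract says several removal models were compared but does not describe them), and it uses a toy matrix rather than the authors' data. The function names `simulate_sparseness` and `observed_fraction` are hypothetical and chosen for illustration only.

```python
import random


def simulate_sparseness(matrix, fraction_missing, seed=0):
    """Return a copy of a complete compound x target matrix with a given
    fraction of entries replaced by None (assumed: uniform random removal)."""
    if not 0.0 <= fraction_missing <= 1.0:
        raise ValueError("fraction_missing must be in [0, 1]")
    n_rows = len(matrix)
    n_cols = len(matrix[0]) if n_rows else 0
    # Enumerate every compound-target cell, then pick the ones to hide.
    cells = [(i, j) for i in range(n_rows) for j in range(n_cols)]
    n_remove = round(fraction_missing * len(cells))
    rng = random.Random(seed)
    removed = rng.sample(cells, n_remove)
    # Copy row by row so the complete matrix is left untouched.
    sparse = [list(row) for row in matrix]
    for i, j in removed:
        sparse[i][j] = None
    return sparse


def observed_fraction(matrix):
    """Fraction of compound-target cells that still hold a value."""
    total = sum(len(row) for row in matrix)
    observed = sum(value is not None for row in matrix for value in row)
    return observed / total if total else 0.0


# Toy complete training matrix: 20 compounds x 5 targets (made-up values).
complete = [[float(i + j) for j in range(5)] for i in range(20)]

# Hide 80% of the training labels; a multitask model would then be trained
# on `sparse` and evaluated on an untouched test set.
sparse = simulate_sparseness(complete, 0.8, seed=42)
```

Repeating this for increasing values of `fraction_missing` and retraining each time gives a curve of performance against sparseness, which is the kind of comparison the abstract describes.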


