Prediction of Compound Profiling Matrices, Part II: Relative Performance of Multitask Deep Learning and Random Forest Classification on the Basis of Varying Amounts of Training Data.

Authors

Rodríguez-Pérez Raquel, Bajorath Jürgen

Affiliations

Department of Life Science Informatics, B-IT, LIMES Program Unit Chemical Biology and Medicinal Chemistry, Rheinische Friedrich-Wilhelms-Universität, Endenicher Allee 19c, D-53115 Bonn, Germany.

Department of Medicinal Chemistry, Boehringer Ingelheim Pharma GmbH & Co. KG, Birkendorfer Str. 65, 88397 Biberach/Riß, Germany.

Publication information

ACS Omega. 2018 Sep 30;3(9):12033-12040. doi: 10.1021/acsomega.8b01682. Epub 2018 Sep 27.

DOI: 10.1021/acsomega.8b01682
PMID: 30320286
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6175492/
Abstract

Currently, there is a high level of interest in deep learning and multitask learning in many scientific fields including the life sciences and chemistry. Herein, we investigate the performance of multitask deep neural networks (MT-DNNs) compared to random forest (RF) classification, a standard method in machine learning, in predicting compound profiling experiments. Predictions were carried out on a large profiling matrix extracted from biological screening data. For model building, submatrices with varying data density of 5-100% were generated to investigate the influence of data sparseness on prediction performance. MT-DNN models were directly compared to RF models, and control calculations were also carried out using single-task DNNs (ST-DNNs). On the basis of compound recall, the performance of ST-DNN was consistently lower than that of the other methods. Compared to RF, MT-DNN models only yielded better prediction performance for individual assays in the profiling matrix when training data were very sparse. However, when the matrix density increased to at least 25-45%, per-assay RF models met or partly exceeded the prediction performance of MT-DNN models. When the average performances of RF and MT-DNN over the grid of all targets were compared, MT-DNN was slightly superior to RF, which was a likely consequence of multitask learning. Overall, there was no consistent advantage of MT-DNN over standard RF classification in predicting the results of compound profiling assays under varying conditions. In the presence of very sparse training data, prediction performance was limited. Under these challenging conditions, MT-DNN was the preferred approach. When more training data became available and prediction performance increased, RF performance was not inferior to MT-DNN.
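The two ideas the abstract relies on, training submatrices of reduced data density and compound recall as the performance measure, can be illustrated with a minimal Python sketch. This is not the authors' procedure: the per-cell random masking, the helper names `sparsify` and `recall`, and the toy 5 x 4 matrix are all assumptions made for illustration, and no model is trained here.

```python
import random


def sparsify(matrix, density, seed=0):
    """Keep the label in a `density` fraction of cells; set the rest to None.

    Assumption: cells are masked uniformly at random over the whole matrix.
    The paper's actual submatrix sampling scheme may differ.
    """
    if not 0.0 < density <= 1.0:
        raise ValueError("density must be in (0, 1]")
    rng = random.Random(seed)
    cells = [(i, j) for i, row in enumerate(matrix) for j in range(len(row))]
    keep = set(rng.sample(cells, round(density * len(cells))))
    return [
        [value if (i, j) in keep else None for j, value in enumerate(row)]
        for i, row in enumerate(matrix)
    ]


def recall(y_true, y_pred):
    """Fraction of truly active compounds (label 1) that were predicted active."""
    predictions_for_actives = [p for t, p in zip(y_true, y_pred) if t == 1]
    if not predictions_for_actives:
        return float("nan")  # recall is undefined without any active compound
    hits = sum(1 for p in predictions_for_actives if p == 1)
    return hits / len(predictions_for_actives)


# Hypothetical toy profiling matrix: rows are compounds, columns are assays,
# 1 = active, 0 = inactive.
full_matrix = [
    [1, 0, 0, 1],
    [0, 0, 1, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 1],
    [0, 1, 0, 0],
]

# Count how many labeled cells remain at each training density.
known_cells = {}
for density in (0.05, 0.25, 0.45, 1.0):
    sub = sparsify(full_matrix, density)
    known_cells[density] = sum(v is not None for row in sub for v in row)
```

In a comparison of the kind described above, each assay column of a sparsified matrix would supply the training labels for one per-assay model (or one output of a multitask model), and `recall` would then be computed on held-out compounds for that assay.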

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ed7a/6646255/88d26c1dab4d/ao-2018-016824_0006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ed7a/6646255/60322100fa35/ao-2018-016824_0001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ed7a/6646255/ef0b284cc3ac/ao-2018-016824_0002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ed7a/6646255/486e5e58b800/ao-2018-016824_0003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ed7a/6646255/38d8fda13a84/ao-2018-016824_0005.jpg
