Using Jupyter Notebooks for re-training machine learning models.

Authors

Smajić Aljoša, Grandits Melanie, Ecker Gerhard F

Affiliation

Department of Pharmaceutical Sciences, University of Vienna, Vienna, Austria.

Publication information

J Cheminform. 2022 Aug 13;14(1):54. doi: 10.1186/s13321-022-00635-2.

DOI:10.1186/s13321-022-00635-2

PMID:35964049

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC9375336/

Abstract

Machine learning (ML) models require an extensive, user-driven selection of molecular descriptors in order to learn from chemical structures to predict actives and inactives with a high reliability. In addition, privacy concerns often restrict the access to sufficient data, leading to models with a narrow chemical space. Therefore, we propose a framework of re-trainable models that can be transferred from one local instance to another, and further allow a less extensive descriptor selection. The models are shared via a Jupyter Notebook, allowing the evaluation and implementation of a broader chemical space by keeping most of the tunable parameters pre-defined. This enables the models to be updated in a decentralized, facile, and fast manner. Herein, the method was evaluated with six transporter datasets (BCRP, BSEP, OATP1B1, OATP1B3, MRP3, P-gp), which revealed the general applicability of this approach.
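The re-training idea described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not the authors' implementation: the abstract does not name the learning algorithm, the descriptors, or the file format, so a small logistic-regression classifier written with the Python standard library stands in for the shared model, the class name `RetrainableClassifier` is hypothetical, and the two-descriptor toy data is invented. The sketch only shows the general pattern: hyperparameters are fixed in advance, the model state (not the training data) is passed from one site to another, and the receiving site continues training on its own local data.

```python
import json
import math
import random


class RetrainableClassifier:
    """Hypothetical stand-in model: logistic regression trained with SGD."""

    def __init__(self, n_features, learning_rate=0.1, epochs=200, seed=0):
        # Hyperparameters are set once, mirroring the idea of keeping
        # most tunable parameters pre-defined for the end user.
        self.weights = [0.0] * n_features
        self.bias = 0.0
        self.learning_rate = learning_rate
        self.epochs = epochs
        self.seed = seed

    def predict_proba(self, x):
        z = self.bias + sum(w * xi for w, xi in zip(self.weights, x))
        z = max(-30.0, min(30.0, z))  # clamp to avoid overflow in exp
        return 1.0 / (1.0 + math.exp(-z))

    def predict(self, x):
        return 1 if self.predict_proba(x) >= 0.5 else 0

    def fit(self, X, y):
        """Continue training from the current weights (warm start)."""
        rng = random.Random(self.seed)
        order = list(range(len(X)))
        for _ in range(self.epochs):
            rng.shuffle(order)
            for i in order:
                error = self.predict_proba(X[i]) - y[i]
                for j, value in enumerate(X[i]):
                    self.weights[j] -= self.learning_rate * error * value
                self.bias -= self.learning_rate * error
        return self

    def to_json(self):
        # Only weights and hyperparameters are serialized, never the data.
        return json.dumps(self.__dict__)

    @classmethod
    def from_json(cls, text):
        state = json.loads(text)
        model = cls(len(state["weights"]))
        model.__dict__.update(state)
        return model


# Site A: invented descriptor vectors (two descriptors per compound)
# with labels 1 = active, 0 = inactive.
site_a_X = [[0.1, 0.2], [0.2, 0.1], [0.8, 0.9], [0.9, 0.7]]
site_a_y = [0, 0, 1, 1]
model_a = RetrainableClassifier(n_features=2).fit(site_a_X, site_a_y)
shared = model_a.to_json()  # this string is all that leaves site A

# Site B: load the shared model and re-train it on local data only.
site_b_X = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]]
site_b_y = [1, 1, 0, 0]
model_b = RetrainableClassifier.from_json(shared)
model_b.fit(site_b_X, site_b_y)

print([model_b.predict(x) for x in site_b_X])
```

In a notebook setting, the cells that define and load the model would typically be left untouched, and the user would edit only the cell that supplies the local data before re-running `fit`.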

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c435/9375336/cb2f581cf033/13321_2022_635_Fig1_HTML.jpg
