Democratizing cheminformatics: interpretable chemical grouping using an automated KNIME workflow.

Authors

Moreira-Filho José T, Ranganath Dhruv, Conway Mike, Schmitt Charles, Kleinstreuer Nicole, Mansouri Kamel

Affiliations

National Toxicology Program Interagency Center for the Evaluation of Alternative Toxicological Methods, Division of Translational Toxicology, National Institute of Environmental Health Sciences, Research Triangle Park, North Carolina, USA.

University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA.

Publication information

J Cheminform. 2024 Aug 16;16(1):101. doi: 10.1186/s13321-024-00894-1.

DOI:10.1186/s13321-024-00894-1
PMID:39152469
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11330086/
Abstract

With the increased availability of chemical data in public databases, innovative techniques and algorithms have emerged for the analysis, exploration, visualization, and extraction of information from these data. One such technique is chemical grouping, where chemicals with common characteristics are categorized into distinct groups based on physicochemical properties, use, biological activity, or a combination. However, existing tools for chemical grouping often require specialized programming skills or the use of commercial software packages. To address these challenges, we developed a user-friendly chemical grouping workflow implemented in KNIME, a free, open-source, low/no-code, data analytics platform. The workflow serves as an all-encompassing tool, expertly incorporating a range of processes such as molecular descriptor calculation, feature selection, dimensionality reduction, hyperparameter search, and supervised and unsupervised machine learning methods, enabling effective chemical grouping and visualization of results. Furthermore, we implemented tools for interpretation, identifying key molecular descriptors for the chemical groups, and using natural language summaries to clarify the rationale behind these groupings. The workflow was designed to run seamlessly in both the KNIME local desktop version and KNIME Server WebPortal as a web application. It incorporates interactive interfaces and guides to assist users in a step-by-step manner. We demonstrate the utility of this workflow through a case study using an eye irritation and corrosion dataset.

Scientific contributions

This work presents a novel, comprehensive chemical grouping workflow in KNIME, enhancing accessibility by integrating a user-friendly graphical interface that eliminates the need for extensive programming skills. This workflow uniquely combines several features such as automated molecular descriptor calculation, feature selection, dimensionality reduction, and machine learning algorithms (both supervised and unsupervised), with hyperparameter optimization to refine chemical grouping accuracy. Moreover, we have introduced an innovative interpretative step and natural language summaries to elucidate the underlying reasons for chemical groupings, significantly advancing the usability of the tool and interpretability of the results.
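The unsupervised branch of such a grouping pipeline can be illustrated with a minimal Python sketch. It is not the authors' KNIME workflow: the abstract does not name the specific algorithms, so a zero-variance filter, z-score scaling, plain k-means, and a silhouette-based search over the number of groups are assumptions standing in for the feature selection, clustering, and hyperparameter search steps. The descriptor names, chemical names, and values are invented for illustration, and the one-sentence group summaries are only a rough analogue of the natural language summaries mentioned above.

```python
import math
from statistics import mean, pstdev

# Hypothetical toy descriptor table; all names and values are invented.
DESCRIPTORS = ["logP", "mol_weight", "h_bond_donors", "constant_flag"]
CHEMICALS = {
    "chem_A": [0.5, 60.0, 2, 1],
    "chem_B": [0.7, 75.0, 2, 1],
    "chem_C": [0.6, 90.0, 3, 1],
    "chem_D": [4.1, 310.0, 0, 1],
    "chem_E": [4.5, 330.0, 0, 1],
    "chem_F": [3.9, 295.0, 1, 1],
}

def select_features(rows, names, min_std=1e-9):
    """Feature selection stand-in: drop descriptors with (near) zero spread."""
    cols = list(zip(*rows))
    keep = [i for i, col in enumerate(cols) if pstdev(col) > min_std]
    return [[r[i] for i in keep] for r in rows], [names[i] for i in keep]

def standardize(rows):
    """Z-score every descriptor column so no single scale dominates."""
    stats = [(mean(col), pstdev(col)) for col in zip(*rows)]
    return [[(v - m) / s for v, (m, s) in zip(r, stats)] for r in rows]

def kmeans(rows, k, iters=50):
    """Plain k-means with deterministic farthest-point initialization."""
    centers = [rows[0]]
    while len(centers) < k:
        centers.append(max(rows, key=lambda r: min(math.dist(r, c) for c in centers)))
    labels = []
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: math.dist(r, centers[j])) for r in rows]
        updated = []
        for j in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == j]
            updated.append([mean(c) for c in zip(*members)] if members else centers[j])
        if updated == centers:
            break
        centers = updated
    return labels

def silhouette(rows, labels):
    """Mean silhouette score; singleton groups contribute 0."""
    scores = []
    for i, r in enumerate(rows):
        same = [math.dist(r, q) for j, q in enumerate(rows)
                if labels[j] == labels[i] and j != i]
        if not same:
            scores.append(0.0)
            continue
        a = mean(same)
        b = min(mean([math.dist(r, q) for q, lab in zip(rows, labels) if lab == other])
                for other in set(labels) if other != labels[i])
        scores.append((b - a) / max(a, b))
    return mean(scores)

def summarize(names, rows, labels, descriptor_names):
    """Name the descriptor whose group mean deviates most from the dataset mean."""
    lines = []
    for g in sorted(set(labels)):
        members = [n for n, lab in zip(names, labels) if lab == g]
        centroid = [mean(c) for c in zip(*[r for r, lab in zip(rows, labels) if lab == g])]
        top = max(range(len(centroid)), key=lambda i: abs(centroid[i]))
        side = "above" if centroid[top] > 0 else "below"
        lines.append(f"Group {g} ({', '.join(members)}) is characterized by "
                     f"{descriptor_names[top]} {side} the dataset average.")
    return lines

names = list(CHEMICALS)
selected, kept = select_features(list(CHEMICALS.values()), DESCRIPTORS)
scaled = standardize(selected)
# Hyperparameter search stand-in: pick the group count with the best silhouette.
best_k, best_labels = max(((k, kmeans(scaled, k)) for k in (2, 3)),
                          key=lambda kl: silhouette(scaled, kl[1]))
summaries = summarize(names, scaled, best_labels, kept)
for line in summaries:
    print(line)
```

On this toy table the constant descriptor is filtered out and the search settles on two groups. A real analysis would start from computed molecular descriptors and would typically add dimensionality reduction and a supervised branch, both of which this sketch omits.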
