Moreira-Filho José T, Ranganath Dhruv, Conway Mike, Schmitt Charles, Kleinstreuer Nicole, Mansouri Kamel
National Toxicology Program Interagency Center for the Evaluation of Alternative Toxicological Methods, Division of Translational Toxicology, National Institute of Environmental Health Sciences, Research Triangle Park, North Carolina, USA.
University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA.
J Cheminform. 2024 Aug 16;16(1):101. doi: 10.1186/s13321-024-00894-1.
With the increased availability of chemical data in public databases, innovative techniques and algorithms have emerged for analyzing, exploring, visualizing, and extracting information from these data. One such technique is chemical grouping, in which chemicals with common characteristics are categorized into distinct groups based on physicochemical properties, use, biological activity, or a combination of these. However, existing tools for chemical grouping often require specialized programming skills or commercial software packages. To address these challenges, we developed a user-friendly chemical grouping workflow implemented in KNIME, a free, open-source, low/no-code data analytics platform. The workflow integrates a range of processes, including molecular descriptor calculation, feature selection, dimensionality reduction, hyperparameter search, and supervised and unsupervised machine learning methods, enabling effective chemical grouping and visualization of results. Furthermore, we implemented interpretation tools that identify key molecular descriptors for the chemical groups and use natural language summaries to clarify the rationale behind these groupings. The workflow was designed to run seamlessly both in the KNIME local desktop version and as a web application on the KNIME Server WebPortal. It incorporates interactive interfaces and guides that assist users step by step. We demonstrate the utility of this workflow through a case study using an eye irritation and corrosion dataset.
Scientific contributions
This work presents a novel, comprehensive chemical grouping workflow in KNIME that enhances accessibility by integrating a user-friendly graphical interface, eliminating the need for extensive programming skills.
This workflow uniquely combines several features such as automated molecular descriptor calculation, feature selection, dimensionality reduction, and machine learning algorithms (both supervised and unsupervised), with hyperparameter optimization to refine chemical grouping accuracy. Moreover, we have introduced an innovative interpretative step and natural language summaries to elucidate the underlying reasons for chemical groupings, significantly advancing the usability of the tool and interpretability of the results.
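The workflow itself is assembled from KNIME nodes, so no code from it is reproduced here. As a rough, language-level illustration of the chain of steps named above, the sketch below runs a grouping pipeline on a made-up descriptor table. It is a minimal sketch under stated assumptions, not the authors' implementation: z-scoring with a constant-column filter stands in for feature selection, plain k-means with a silhouette-scored grid search over the number of groups stands in for clustering plus hyperparameter search, and ranking descriptors by the largest absolute group-mean z-score stands in for key-descriptor identification and the natural language summary. Dimensionality reduction and the supervised models are omitted. All function names, descriptor names, and values are hypothetical.

```python
import math
from statistics import mean, pstdev


def standardize(rows):
    """Z-score each descriptor column and drop constant ones (a minimal variance filter)."""
    cols = list(zip(*rows))
    keep = [j for j, col in enumerate(cols) if pstdev(col) > 0]
    stats = {j: (mean(cols[j]), pstdev(cols[j])) for j in keep}
    scaled = [[(row[j] - stats[j][0]) / stats[j][1] for j in keep] for row in rows]
    return scaled, keep


def init_centers(points, k):
    """Deterministic farthest-point initialization (a simplification of k-means++)."""
    centers = [list(points[0])]
    while len(centers) < k:
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centers))
        centers.append(list(far))
    return centers


def kmeans(points, k, max_iter=100):
    """Plain Lloyd's k-means; returns one cluster label per point."""
    centers = init_centers(points, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster empties out
                centers[c] = [mean(col) for col in zip(*members)]
    return labels


def silhouette(points, labels):
    """Mean silhouette coefficient; singleton clusters score 0."""
    clusters = set(labels)
    if len(clusters) < 2:
        return -1.0
    scores = []
    for i, p in enumerate(points):
        same = [math.dist(p, q) for j, q in enumerate(points) if labels[j] == labels[i] and j != i]
        if not same:
            scores.append(0.0)
            continue
        a = mean(same)  # mean distance to the point's own group
        b = min(mean([math.dist(p, q) for q, lab in zip(points, labels) if lab == c])
                for c in clusters if c != labels[i])  # nearest other group
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return mean(scores)


def best_grouping(points, k_values):
    """Grid search over the number of groups, scored by silhouette."""
    runs = {k: kmeans(points, k) for k in k_values}
    best_k = max(runs, key=lambda k: silhouette(points, runs[k]))
    return best_k, runs[best_k]


def summarize(scaled, labels, names, top=2):
    """Name each group's most distinctive descriptors (largest |mean z-score|)."""
    summaries = {}
    for c in sorted(set(labels)):
        members = [row for row, lab in zip(scaled, labels) if lab == c]
        z = [mean(col) for col in zip(*members)]
        ranked = sorted(range(len(names)), key=lambda j: abs(z[j]), reverse=True)[:top]
        parts = [f"{'high' if z[j] > 0 else 'low'} {names[j]} (mean z = {z[j]:+.2f})" for j in ranked]
        summaries[c] = f"Group {c}: " + ", ".join(parts)
    return summaries


# Hypothetical toy descriptor table: six made-up chemicals, four descriptors.
names = ["MolWt", "LogP", "TPSA", "Charge"]
rows = [
    [150, 1.0, 20, 0], [160, 1.2, 22, 0], [155, 0.9, 21, 0],
    [400, 4.5, 90, 0], [410, 4.8, 95, 0], [395, 4.4, 88, 0],
]
scaled, keep = standardize(rows)
kept_names = [names[j] for j in keep]  # "Charge" is constant and gets dropped
best_k, labels = best_grouping(scaled, range(2, 5))
summaries = summarize(scaled, labels, kept_names)
for text in summaries.values():
    print(text)
```

The silhouette score is used here only because it needs no labels, which matches the unsupervised setting; any other internal validity index could be swapped into `best_grouping` in the same way.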