IBM Research Europe, The Hartree Centre - Sci-Tech Daresbury, Keckwick Lane, Daresbury, Warrington WA4 4AD, United Kingdom.
Earlham Institute, Norwich Research Park, Colney Lane, Norwich NR4 7UZ, United Kingdom.
Brief Bioinform. 2024 Nov 22;26(1). doi: 10.1093/bib/bbae593.
Machine learning (ML) methods offer opportunities for gaining insight into the workings of complex biological systems, and they are increasingly prominent in the analysis of omics data, where they support tasks such as the identification of novel biomarkers and the predictive modeling of phenotypes. For scientists and domain experts, user-friendly ML pipelines are valuable because they make it possible to run sophisticated, robust, and interpretable models without in-depth expertise in coding or algorithmic optimization. By streamlining model development and training, such pipelines let researchers devote their time and energy to the critical tasks of biological interpretation and validation, thereby maximizing the scientific impact of ML-driven insights. Here, we present AutoXAI4Omics, a fully automated, open-source explainable AI tool that performs classification and regression tasks on omics and tabular numerical data. AutoXAI4Omics accelerates scientific discovery by automating processes and decisions made by AI experts, e.g. selection of the best feature set, hyperparameter tuning of different ML algorithms, and selection of the best ML model for a specific task and dataset. Prior to ML analysis, AutoXAI4Omics offers feature filtering options tailored to specific omic data types. Moreover, the explainability analysis provided by the tool highlights associations between omic feature values and the targets under investigation, e.g. predicted phenotypes, facilitating the identification of novel actionable insights. AutoXAI4Omics is available at: https://github.com/IBM/AutoXAI4Omics.
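To make the automated decisions named above concrete (feature filtering, hyperparameter tuning, and selection of the best model), the following is a minimal, self-contained sketch in plain Python. It is not the AutoXAI4Omics API and does not reproduce its algorithms: the variance filter, the k-nearest-neighbour classifier, the synthetic data, and every function name (`variance_filter`, `knn_predict`, `cv_accuracy`, `select_best`, `make_toy_data`) are assumptions chosen only for illustration. The explainability step is not covered.

```python
import random
import statistics


def variance_filter(X, threshold):
    """Return indices of feature columns whose population variance exceeds threshold."""
    n_features = len(X[0])
    return [j for j in range(n_features)
            if statistics.pvariance([row[j] for row in X]) > threshold]


def knn_predict(train_X, train_y, x, k):
    """Majority vote among the k training rows closest to x (squared Euclidean distance)."""
    order = sorted(range(len(train_X)),
                   key=lambda i: sum((a - b) ** 2 for a, b in zip(train_X[i], x)))
    votes = [train_y[i] for i in order[:k]]
    return max(set(votes), key=votes.count)


def cv_accuracy(X, y, k, n_folds=5):
    """Cross-validated accuracy of k-NN, using interleaved folds."""
    idx = list(range(len(X)))
    correct = 0
    for f in range(n_folds):
        test = idx[f::n_folds]
        held_out = set(test)
        train = [i for i in idx if i not in held_out]
        train_X = [X[i] for i in train]
        train_y = [y[i] for i in train]
        correct += sum(knn_predict(train_X, train_y, X[i], k) == y[i] for i in test)
    return correct / len(X)


def select_best(X, y, thresholds, ks):
    """Grid search over filter threshold and k; keep the best cross-validated score."""
    best = None
    for t in thresholds:
        cols = variance_filter(X, t)
        if not cols:
            continue  # this threshold removed every feature
        X_filtered = [[row[j] for j in cols] for row in X]
        for k in ks:
            score = cv_accuracy(X_filtered, y, k)
            if best is None or score > best["score"]:
                best = {"threshold": t, "k": k, "features": cols, "score": score}
    return best


def make_toy_data(seed=0, n_samples=60):
    """Synthetic data (illustration only): 1 informative, 3 noise, 2 near-constant features."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n_samples):
        label = i % 2
        row = [4.0 * label + rng.gauss(0, 1)]            # informative feature
        row += [rng.gauss(0, 1) for _ in range(3)]       # uninformative noise
        row += [1.0 + rng.gauss(0, 0.001) for _ in range(2)]  # near-constant
        X.append(row)
        y.append(label)
    return X, y


X, y = make_toy_data()
best = select_best(X, y, thresholds=(0.0, 0.01), ks=(1, 3, 5))
print(best)
```

The grid search stands in for the kind of decision an ML practitioner would otherwise make by hand: each candidate combination of filter setting and hyperparameter is scored by cross-validation, and only the best-scoring configuration is kept. In practice, established libraries would replace the hand-written filter and classifier.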