R.ROSETTA: an interpretable machine learning framework.

Affiliations

Department of Cell and Molecular Biology, Uppsala University, Uppsala, Sweden.

Department of Immunology, Genetics and Pathology, Uppsala University, Uppsala, Sweden.

Publication information

BMC Bioinformatics. 2021 Mar 6;22(1):110. doi: 10.1186/s12859-021-04049-z.

DOI:10.1186/s12859-021-04049-z

PMID:33676405

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC7937228/

Abstract

BACKGROUND

Machine learning involves strategies and algorithms that may assist bioinformatics analyses in terms of data mining and knowledge discovery. In several applications, notably in the life sciences, it is often more important to understand how a prediction was obtained than to know what prediction was made. To this end, so-called interpretable machine learning has recently been advocated. In this study, we implemented an interpretable machine learning package based on rough set theory. An important aim of our work was to provide statistical properties of the models and their components.
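As a rough illustration of the rough set idea mentioned above, the following minimal Python sketch groups objects that are indiscernible on a chosen set of features and computes the lower and upper approximations of a decision class. It is not part of R.ROSETTA; the toy table, the feature names geneA and geneB, and the function names are assumptions made purely for illustration.

```python
from collections import defaultdict

# Hypothetical toy decision table: discretized feature values plus a decision class.
TABLE = [
    {"geneA": "high", "geneB": "low",  "decision": "case"},
    {"geneA": "high", "geneB": "low",  "decision": "case"},
    {"geneA": "low",  "geneB": "low",  "decision": "control"},
    {"geneA": "low",  "geneB": "high", "decision": "case"},
    {"geneA": "low",  "geneB": "high", "decision": "control"},
]

def equivalence_classes(table, features):
    """Group row indices that share identical values on `features`."""
    groups = defaultdict(set)
    for i, row in enumerate(table):
        groups[tuple(row[f] for f in features)].add(i)
    return list(groups.values())

def approximations(table, features, target):
    """Return (lower, upper) approximations of the rows whose decision is `target`."""
    target_rows = {i for i, row in enumerate(table) if row["decision"] == target}
    lower, upper = set(), set()
    for block in equivalence_classes(table, features):
        if block <= target_rows:   # block lies entirely inside the class
            lower |= block
        if block & target_rows:    # block overlaps the class at all
            upper |= block
    return lower, upper

lower, upper = approximations(TABLE, ["geneA", "geneB"], "case")
print("lower:", sorted(lower))
print("upper:", sorted(upper))
```

Rows in the lower approximation can be assigned to the class with certainty given the chosen features; rows that appear only in the upper approximation are indiscernible from at least one row of another class.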

RESULTS

We present the R.ROSETTA package, which is an R wrapper of the ROSETTA framework. The original ROSETTA functions have been improved and adapted to the R programming environment. The package allows for building and analyzing non-linear interpretable machine learning models. R.ROSETTA gathers combinatorial statistics via rule-based modelling for accessible and transparent results, well suited for adoption within the greater scientific community. The package also provides statistics and visualization tools that facilitate minimization of analysis bias and noise. The R.ROSETTA package is freely available at https://github.com/komorowskilab/R.ROSETTA. To illustrate the usage of the package, we applied it to a transcriptome dataset from an autism case-control study. Our tool provided hypotheses for potential co-predictive mechanisms among features that discerned phenotype classes. These co-predictors represented neurodevelopmental and autism-related genes.

CONCLUSIONS

R.ROSETTA provides new insights for interpretable machine learning analyses and knowledge-based systems. We demonstrated that our package facilitated detection of dependencies for autism-related genes. Although the sample application of R.ROSETTA illustrates transcriptome data analysis, the package can be used to analyze any data organized in decision tables.
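A decision table in this sense holds one object per row, one feature per column, and a single decision column. The sketch below, in plain Python and not using the R.ROSETTA API, shows one simple way an IF-THEN rule over two features could be scored against such a table. The table contents, the rule, and the helper name rule_stats are hypothetical, and the definitions of support and accuracy used here are one common convention rather than necessarily the package's own.

```python
# Hypothetical decision table: each row is an object; "decision" is the outcome column.
table = [
    {"geneA": "high", "geneB": "low",  "decision": "case"},
    {"geneA": "high", "geneB": "low",  "decision": "case"},
    {"geneA": "high", "geneB": "low",  "decision": "control"},
    {"geneA": "low",  "geneB": "high", "decision": "control"},
]

def rule_stats(table, conditions, predicted):
    """Score the rule IF conditions THEN decision == predicted.

    support  = number of rows matching every condition
    accuracy = fraction of matching rows that carry the predicted decision
    """
    matched = [r for r in table if all(r[k] == v for k, v in conditions.items())]
    correct = [r for r in matched if r["decision"] == predicted]
    support = len(matched)
    accuracy = len(correct) / support if support else 0.0
    return support, accuracy

# Rule: IF geneA = high AND geneB = low THEN decision = case
support, accuracy = rule_stats(table, {"geneA": "high", "geneB": "low"}, "case")
print(f"support={support}, accuracy={accuracy:.2f}")
```

Because the rule combines two features in its condition, it is a small example of the kind of co-predictive pattern the abstract refers to.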
