School of Computing, Queen's University, Kingston, ON, Canada.
Digital Technologies Research Center, National Research Council Canada, Ottawa, ON, Canada.
BMC Bioinformatics. 2021 May 28;22(1):284. doi: 10.1186/s12859-021-04209-1.
The direct link between metabolism and cell and organism phenotype in health and disease makes metabolomics, the high-throughput study of small-molecule metabolites, an essential methodology for understanding and diagnosing disease development and progression. Machine learning methods have seen increasing adoption in metabolomics thanks to their powerful predictive abilities. However, the "black-box" nature of many machine learning models remains a major barrier to their wide acceptance and utility, because it makes their decision processes difficult to interpret. This challenge is particularly pronounced in biomedical research, where understanding the underlying decision-making mechanism is essential for ensuring safety and gaining new knowledge.
In this article, we propose a novel computational framework, Systems Metabolomics using Interpretable Learning and Evolution (SMILE), for supervised metabolomics data analysis. Our methodology uses an evolutionary algorithm to learn interpretable predictive models and to identify the most influential metabolites and their interactions in association with disease. We have also developed a web application with a graphical user interface for easy analysis, interpretation, and visualization of the results. We demonstrate the performance of the method and the use of the web interface on metabolomics data for Alzheimer's disease (AD).
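The abstract does not say which evolutionary algorithm SMILE uses or how its models are represented. Purely as a hypothetical illustration of the general idea, the sketch below evolves small sparse linear classifiers with a simple elitist (mu + lambda) loop on synthetic data, then scores each feature ("metabolite") by how often it appears in the best model of independent runs, and each feature pair by how often it co-occurs in one model. The model representation, the size penalty, the mutation operators, the synthetic data, and every function name here are assumptions for illustration, not SMILE's actual method.

```python
import random
from collections import Counter
from itertools import combinations

# Hypothetical model representation (not SMILE's): (bias, {feature_index: weight}).
# Predicts class 1 when bias + sum(weight * x[feature]) > 0.
# Keeping only a few terms is what makes each model readable.


def predict(model, x):
    bias, terms = model
    return 1 if bias + sum(w * x[i] for i, w in terms.items()) > 0 else 0


def fitness(model, X, y, penalty=0.02):
    """Training accuracy minus an assumed per-term penalty that favors small models."""
    acc = sum(predict(model, xi) == yi for xi, yi in zip(X, y)) / len(y)
    return acc - penalty * len(model[1])


def mutate(model, n_features, rng):
    """Return a mutated copy: drop a term, add/replace a term, or nudge a weight."""
    bias, terms = model
    terms = dict(terms)
    op = rng.random()
    if op < 0.3 and len(terms) > 1:
        del terms[rng.choice(list(terms))]
    elif op < 0.6:
        terms[rng.randrange(n_features)] = rng.gauss(0, 1)
    else:
        i = rng.choice(list(terms))  # terms is never empty (drop needs >1 term)
        terms[i] += rng.gauss(0, 0.3)
    return (bias + rng.gauss(0, 0.1), terms)


def evolve(X, y, generations=100, pop_size=30, seed=0):
    """Elitist (mu + lambda) loop: parents and children compete for survival."""
    rng = random.Random(seed)
    n = len(X[0])
    scored = []
    for _ in range(pop_size):
        m = (0.0, {rng.randrange(n): rng.gauss(0, 1)})  # start with one feature
        scored.append((fitness(m, X, y), m))
    for _ in range(generations):
        children = [mutate(rng.choice(scored)[1], n, rng) for _ in range(pop_size)]
        scored += [(fitness(c, X, y), c) for c in children]
        scored.sort(key=lambda s: s[0], reverse=True)
        del scored[pop_size:]  # keep only the best pop_size models
    return scored[0][1]


def feature_influence(X, y, runs=3):
    """Count how often each feature, and each feature pair, appears in the best models."""
    counts, pairs = Counter(), Counter()
    for seed in range(runs):
        best = evolve(X, y, seed=seed)
        features = sorted(best[1])
        counts.update(features)
        pairs.update(combinations(features, 2))
    return counts, pairs


def make_data(n_samples=150, n_features=10, seed=1):
    """Synthetic stand-in for a metabolite matrix: only features 2 and 7 matter."""
    rng = random.Random(seed)
    X = [[rng.gauss(0, 1) for _ in range(n_features)] for _ in range(n_samples)]
    y = [1 if x[2] - x[7] > 0 else 0 for x in X]
    return X, y


if __name__ == "__main__":
    X, y = make_data()
    counts, pairs = feature_influence(X, y)
    print("feature counts:", counts.most_common())
    print("pair counts:", pairs.most_common(3))
```

Counting how often a feature survives into the best models across runs is only one simple proxy for influence, and co-occurrence within a single model is an equally crude stand-in for an interaction; the size penalty in `fitness` is what keeps the surviving models short enough to read.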
SMILE identified several metabolites influential in AD and provided interpretable predictive models that can be used to better understand the metabolic background of AD. SMILE addresses the emerging issue of interpretability and explainability in machine learning and contributes to more transparent and powerful applications of machine learning in bioinformatics.