Department of Biochemistry and Molecular Biology II, Institute of Nutrition and Food Technology "José Mataix", Center of Biomedical Research, University of Granada, Granada, Spain.
Instituto de Investigación Biosanitaria ibs.GRANADA, Granada, Spain.
PLoS Comput Biol. 2020 Apr 10;16(4):e1007792. doi: 10.1371/journal.pcbi.1007792. eCollection 2020 Apr.
To date, several machine learning approaches have been proposed for the dynamic modeling of temporal omics data. Although they have yielded impressive results in terms of model accuracy and predictive ability, most of these applications are based on "black-box" algorithms, and the research community has called for more interpretable models. The recent eXplainable Artificial Intelligence (XAI) revolution offers a solution to this issue, and rule-based approaches are particularly well suited to explanatory purposes. Further integrating the data mining process with functional annotation and pathway analyses is an additional way towards more explanatory and biologically sound models. In this paper, we present a novel rule-based XAI strategy (including pre-processing, knowledge extraction and functional validation) for finding biologically relevant sequential patterns in longitudinal human gene expression data (GED). To illustrate the performance of our pipeline, we work on in vivo temporal GED collected during a long-term dietary intervention in 57 subjects with obesity (GSE77962). As validation populations, we use three independent datasets that follow the same experimental design. As a result, we validate the gene patterns extracted from the primary dataset and demonstrate the effectiveness of our strategy for mining biologically relevant gene-gene temporal relations. Our whole pipeline has been made available as open-source software and could easily be extended to other human temporal GED applications.
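As a purely illustrative sketch of what a "gene-gene temporal relation" can look like in rule form (this is not the authors' pipeline, whose specific algorithms are not described here), the Python below assumes each subject's expression values are log2-scaled and discretized into hypothetical "up"/"down"/"stable" states between consecutive time points. It then scores a sequential rule such as "GENE_A up, later GENE_B down" with the standard support and confidence measures. The function names, the 0.5 threshold, the gene names and the toy data are all invented for illustration.

```python
from typing import Dict, List, Set, Tuple

# One subject = an ordered list of time steps; each time step is the set of
# discretized gene states observed at that step, e.g. {"GENE_A:up", "GENE_B:stable"}.
Sequence = List[Set[str]]


def discretize(series: Dict[str, List[float]], threshold: float = 0.5) -> Sequence:
    """Turn one subject's expression trajectories into a sequence of state itemsets.

    Hypothetical pre-processing rule: each gene's change between consecutive time
    points (values assumed to be log2 expression) is labelled "up", "down" or
    "stable" depending on whether it exceeds +/- `threshold`.
    """
    n_points = len(next(iter(series.values())))
    sequence: Sequence = []
    for t in range(1, n_points):
        itemset: Set[str] = set()
        for gene, values in series.items():
            delta = values[t] - values[t - 1]
            if delta > threshold:
                state = "up"
            elif delta < -threshold:
                state = "down"
            else:
                state = "stable"
            itemset.add(f"{gene}:{state}")
        sequence.append(itemset)
    return sequence


def occurs(sequence: Sequence, item: str) -> bool:
    """True if `item` appears at any time step of the sequence."""
    return any(item in itemset for itemset in sequence)


def rule_holds(sequence: Sequence, antecedent: str, consequent: str) -> bool:
    """True if `consequent` appears at a time step strictly after some occurrence of `antecedent`."""
    for i, itemset in enumerate(sequence):
        if antecedent in itemset:
            # The earliest occurrence leaves the longest suffix, so checking it suffices.
            return any(consequent in later for later in sequence[i + 1:])
    return False


def support_confidence(db: Dict[str, Sequence], antecedent: str, consequent: str) -> Tuple[float, float]:
    """Support = share of subjects in which the rule holds;
    confidence = share of subjects showing the antecedent in which the rule holds."""
    with_antecedent = [s for s in db.values() if occurs(s, antecedent)]
    with_rule = [s for s in with_antecedent if rule_holds(s, antecedent, consequent)]
    support = len(with_rule) / len(db) if db else 0.0
    confidence = len(with_rule) / len(with_antecedent) if with_antecedent else 0.0
    return support, confidence


# Toy example with invented genes and states (not data from GSE77962).
db = {
    "S1": [{"GENE_A:up"}, {"GENE_B:down"}, {"GENE_C:stable"}],
    "S2": [{"GENE_A:up", "GENE_C:up"}, {"GENE_C:stable"}, {"GENE_B:down"}],
    "S3": [{"GENE_B:down"}, {"GENE_A:up"}, {"GENE_C:up"}],
    "S4": [{"GENE_C:up"}, {"GENE_B:down"}, {"GENE_A:stable"}],
}
print(support_confidence(db, "GENE_A:up", "GENE_B:down"))  # (0.5, 0.6666666666666666)
```

Counting each subject once (rather than every occurrence of a pattern) keeps support interpretable as "the fraction of individuals showing this temporal relation", which is the usual convention in sequential rule mining; a real analysis would additionally need thresholds on both measures and correction for the very large number of candidate gene pairs.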