Computer Engineering Department, Iran University of Science and Technology, Narmak, Tehran, Iran.
J Biomed Inform. 2019 Jan;89:41-55. doi: 10.1016/j.jbi.2018.10.004. Epub 2018 Oct 16.
One of the most important problems in predictive modeling is determining the main causal factors of a phenomenon and the causal relationships among them. Causal relationships between the parameters of a natural phenomenon can be extracted by examining how those parameters change across consecutive events. In addition, information theory and probability theory help clarify these causal relationships. Probabilistic causal discovery from sequential data on a natural phenomenon can therefore be useful for dimension reduction and for predicting the future trend of a process. In this paper, we introduce a novel method for causal discovery from sequential data based on a probabilistic causal graph. In this method, a Causal Feature Dependency matrix (CFD matrix) is first generated from the features' changes in consecutive events. A probabilistic causal graph is then created from the CFD matrix. In this graph, some uninformative features are eliminated on the basis of the entropy of each conditional density function. Finally, prediction is performed based on the output of the causal graph. Experimental results on the Pooled Resource Open-Access ALS Clinical Trials (PRO-ACT) sequential data set for amyotrophic lateral sclerosis (ALS) show that the proposed algorithm can predict the progression rate of ALS with high precision.
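The abstract does not define the CFD matrix or the pruning rule, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes that a "change" is any nonzero difference between consecutive events, that entry (i, j) of the matrix estimates the probability that feature j changes one step after feature i changes, and that an edge is kept when this probability is high and its binary entropy is low. The function names and thresholds (`dependency_matrix`, `retained_edges`, `min_prob`, `max_entropy`) are hypothetical.

```python
import math


def change_signs(sequence):
    """Sign (-1, 0, +1) of each feature's change between consecutive events."""
    return [
        [(cur > prev) - (cur < prev) for prev, cur in zip(a, b)]
        for a, b in zip(sequence, sequence[1:])
    ]


def dependency_matrix(sequence):
    """Assumed stand-in for a CFD matrix.

    Entry [i][j] is the fraction of changes in feature i that are followed,
    one step later, by a change in feature j.
    """
    signs = change_signs(sequence)
    n = len(sequence[0])
    counts = [[0] * n for _ in range(n)]
    totals = [0] * n
    for now, nxt in zip(signs, signs[1:]):
        for i in range(n):
            if now[i] == 0:
                continue
            totals[i] += 1
            for j in range(n):
                if nxt[j] != 0:
                    counts[i][j] += 1
    return [
        [counts[i][j] / totals[i] if totals[i] else 0.0 for j in range(n)]
        for i in range(n)
    ]


def binary_entropy(p):
    """Entropy in bits of a Bernoulli variable with success probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def retained_edges(matrix, min_prob=0.5, max_entropy=0.9):
    """Keep edges i -> j that are likely and low-entropy (assumed rule)."""
    return [
        (i, j)
        for i, row in enumerate(matrix)
        for j, p in enumerate(row)
        if i != j and p >= min_prob and binary_entropy(p) <= max_entropy
    ]


# Toy sequence: feature 0 changes, feature 1 follows one step later,
# and feature 2 never changes.
events = [
    [0, 0, 5],
    [1, 0, 5],
    [1, 1, 5],
    [1, 1, 5],
    [2, 1, 5],
    [2, 2, 5],
    [2, 2, 5],
]
edges = retained_edges(dependency_matrix(events))
kept_features = sorted({f for edge in edges for f in edge})
```

In this toy sequence the only retained edge runs from feature 0 to feature 1, so feature 2, which never changes, drops out of the graph. That mirrors the dimension-reduction step described in the abstract in spirit only.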