Ma Sisi, Tourani Roshan
Institute for Health Informatics, University of Minnesota, Minneapolis, MN 55455, USA.
Entropy (Basel). 2024 Mar 2;26(3):228. doi: 10.3390/e26030228.
Knowledge of the causal mechanisms underlying a single system may not be sufficient to answer certain questions. Additional insights can be gained by comparing and contrasting the causal mechanisms underlying multiple systems and uncovering consistent and distinct causal relationships. For example, discovering molecular mechanisms shared among different diseases can lead to drug repurposing. Comparing causal mechanisms across multiple systems is non-trivial, because the causal mechanisms are usually unknown and must be estimated from data. If we estimate the causal mechanisms separately from the data generated by each system and compare them directly (the naive method), the result can be sub-optimal. This is especially true when the datasets from the different systems differ substantially in sample size. In that case, the estimated causal mechanisms will differ in quality across systems, which in turn can reduce the accuracy of the similarities and differences estimated by the naive method. To mitigate this problem, we introduced the bootstrap estimation method and the equal sample size resampling estimation method for estimating the difference between causal networks. Both methods use resampling to assess the confidence of the estimation. We compared these methods with the naive method under a set of systematically simulated experimental conditions spanning a variety of network structures and sample sizes, using different performance metrics. We also evaluated these methods on various real-world biomedical datasets covering a wide range of data designs.
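The abstract does not specify the structure-learning algorithm or the exact resampling scheme, so the following is only a minimal sketch of the general idea under stated assumptions. A thresholded-correlation skeleton (the hypothetical `estimate_skeleton`, which is not a causal discovery method) stands in for whatever causal discovery algorithm is actually used. The bootstrap variant resamples each dataset with replacement at its own size. The equal sample size variant is *assumed* here to resample both datasets with replacement at the smaller of the two sample sizes. Both return, for each variable pair, the fraction of replicates in which the edge differs between the two estimated networks, which serves as a resampling-based confidence for that difference. All function names and the simulated data are illustrative only.

```python
import numpy as np


def estimate_skeleton(data, threshold=0.3):
    """Hypothetical stand-in for a causal discovery algorithm.

    Returns an undirected adjacency matrix by thresholding absolute Pearson
    correlations. This is NOT a causal method; it only keeps the sketch short.
    """
    corr = np.corrcoef(data, rowvar=False)
    adj = np.abs(corr) > threshold
    np.fill_diagonal(adj, False)
    return adj


def naive_difference(data_a, data_b, estimator=estimate_skeleton):
    # Naive method: estimate each network once on the full data, compare directly.
    return estimator(data_a) != estimator(data_b)


def bootstrap_difference(data_a, data_b, n_reps=100, estimator=estimate_skeleton, rng=None):
    # Assumed scheme: resample each dataset with replacement at its own size.
    rng = np.random.default_rng(rng)
    p = data_a.shape[1]
    freq = np.zeros((p, p))
    for _ in range(n_reps):
        a = data_a[rng.integers(0, len(data_a), len(data_a))]
        b = data_b[rng.integers(0, len(data_b), len(data_b))]
        freq += estimator(a) != estimator(b)
    return freq / n_reps  # fraction of replicates in which each edge differs


def equal_size_difference(data_a, data_b, n_reps=100, estimator=estimate_skeleton, rng=None):
    # Assumed scheme: resample BOTH datasets with replacement at the smaller
    # sample size, so the two estimates are based on equally many samples.
    rng = np.random.default_rng(rng)
    n = min(len(data_a), len(data_b))
    p = data_a.shape[1]
    freq = np.zeros((p, p))
    for _ in range(n_reps):
        a = data_a[rng.integers(0, len(data_a), n)]
        b = data_b[rng.integers(0, len(data_b), n)]
        freq += estimator(a) != estimator(b)
    return freq / n_reps


def simulate(n, with_edge, rng):
    # Illustrative toy system: x0 -> x1 always; x1 -> x2 only when with_edge.
    x0 = rng.normal(size=n)
    x1 = x0 + rng.normal(size=n)
    x2 = (x1 if with_edge else 0.0) + rng.normal(size=n)
    return np.column_stack([x0, x1, x2])


gen = np.random.default_rng(0)
data_a = simulate(2000, with_edge=False, rng=gen)  # large sample, no x1 -> x2
data_b = simulate(200, with_edge=True, rng=gen)    # small sample, has x1 -> x2

naive = naive_difference(data_a, data_b)
boot = bootstrap_difference(data_a, data_b, n_reps=200, rng=1)
eq = equal_size_difference(data_a, data_b, n_reps=200, rng=2)
print(naive.astype(int))
print(np.round(boot, 2))
print(np.round(eq, 2))
```

Returning a frequency per edge, rather than a single yes/no difference as the naive method does, is what lets the resampling variants attach a confidence to each estimated difference; the equal-size variant additionally keeps the two estimates on the same footing when the sample sizes are unbalanced.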