Department of Computer Science, Courant Institute of Mathematical Sciences, New York University, 251 Mercer St, New York, 10012, USA.
Department of Biology, Center for Genomics and Systems Biology, New York University, 12 Waverly Pl, New York, 10003, USA.
BMC Bioinformatics. 2023 Mar 24;24(1):114. doi: 10.1186/s12859-023-05231-1.
This study evaluates a variety of existing base causal inference methods as well as a variety of ensemble methods. We show that: (i) base network inference methods vary in their performance across datasets, so a method that performs poorly on one dataset may perform well on another; (ii) a non-homogeneous ensemble method in the form of a Naive Bayes classifier leads overall to results as good as or better than those of the best single base method or any other ensemble method; (iii) for the best results, the ensemble method should integrate all methods that satisfy a statistical test of normality on the training data. The resulting ensemble model, EnsInfer, easily integrates all kinds of RNA-seq data as well as new and existing inference methods. The paper categorizes and reviews state-of-the-art underlying methods, describes the EnsInfer ensemble approach in detail, and presents experimental results. The source code and data used will be made available to the community upon publication.
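The ensemble strategy in points (ii) and (iii) can be illustrated with a small, dependency-free Python sketch. It is not the EnsInfer implementation. It assumes that each base method outputs one score per candidate regulator-target edge and that the normality test is applied to each method's scores on the training edges. The abstract names neither the test nor the likelihood model, so this sketch substitutes the Jarque-Bera test (with its asymptotic chi-squared p-value) and a Gaussian Naive Bayes meta-classifier. All function and class names (`jarque_bera_pvalue`, `select_methods`, `GaussianNaiveBayes`, `fit_ensemble`) and the toy data are hypothetical.

```python
import math
from statistics import NormalDist, fmean, pstdev


def jarque_bera_pvalue(xs):
    """Jarque-Bera normality test (stand-in; not necessarily the test used by EnsInfer).

    Under normality JB is asymptotically chi-squared with 2 degrees of freedom,
    whose upper-tail probability is exactly exp(-JB / 2).
    """
    n = len(xs)
    mu = fmean(xs)
    sd = pstdev(xs)  # population (biased) estimate, as in the JB statistic
    if sd == 0:
        return 0.0  # constant scores: treat as non-normal
    skew = sum(((x - mu) / sd) ** 3 for x in xs) / n
    kurt = sum(((x - mu) / sd) ** 4 for x in xs) / n
    jb = n / 6.0 * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)
    return math.exp(-jb / 2.0)


def select_methods(train_scores, alpha=0.05):
    """Keep base methods whose training scores are not rejected as non-normal."""
    return [m for m, xs in train_scores.items() if jarque_bera_pvalue(xs) >= alpha]


class GaussianNaiveBayes:
    """Naive Bayes with one Gaussian per (class, base method) feature (assumed likelihood)."""

    def fit(self, rows, labels):
        self.classes = sorted(set(labels))
        self.priors, self.stats = {}, {}
        for c in self.classes:
            members = [r for r, y in zip(rows, labels) if y == c]
            self.priors[c] = len(members) / len(rows)
            # Per-feature mean and standard deviation, floored to avoid division by zero.
            self.stats[c] = [(fmean(col), max(pstdev(col), 1e-9)) for col in zip(*members)]
        return self

    def log_posterior(self, row, c):
        lp = math.log(self.priors[c])
        for x, (mu, sd) in zip(row, self.stats[c]):
            lp += -0.5 * math.log(2 * math.pi * sd * sd) - (x - mu) ** 2 / (2 * sd * sd)
        return lp

    def predict_proba(self, row):
        logs = {c: self.log_posterior(row, c) for c in self.classes}
        top = max(logs.values())  # subtract the max before exp to avoid underflow
        exps = {c: math.exp(v - top) for c, v in logs.items()}
        z = sum(exps.values())
        return {c: v / z for c, v in exps.items()}


def fit_ensemble(train_scores, train_labels, alpha=0.05):
    """Select methods passing the normality test, then fit Naive Bayes on their scores."""
    kept = select_methods(train_scores, alpha)
    if not kept:
        raise ValueError("no base method passed the normality test")
    rows = list(zip(*(train_scores[m] for m in kept)))  # one row per candidate edge
    return kept, GaussianNaiveBayes().fit(rows, train_labels)


if __name__ == "__main__":
    # Toy synthetic scores (not data from the paper): evenly spaced normal
    # quantiles for method "a" and a strongly right-skewed transform for "b".
    z = [NormalDist().inv_cdf((i + 0.5) / 200) for i in range(200)]
    train = {"a": z, "b": [math.exp(v) for v in z]}
    labels = [0] * 100 + [1] * 100  # hypothetical: high "a" score marks a true edge
    kept, model = fit_ensemble(train, labels)
    print(kept, model.predict_proba((2.0,)))
```

In this toy run, only method "a" passes the normality check, so the Naive Bayes model is trained on its scores alone. An edge scoring 2.0 then receives a high posterior probability of being a true edge. Working in log space and subtracting the largest log-posterior before exponentiating keeps the class probabilities well defined even when one class is overwhelmingly more likely.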