Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA.
Department of Statistics, University of California, Davis, CA, 95616, USA.
BMC Bioinformatics. 2022 Aug 5;23(1):321. doi: 10.1186/s12859-022-04864-y.
Applying directed acyclic graph (DAG) models to proteogenomic data has been shown to be effective for detecting causal biomarkers of complex diseases. However, challenges remain in learning DAGs that jointly model binary clinical outcome variables and continuous biomarker measurements.
In this paper, we propose a new tool, DAGBagM, to learn DAGs with both continuous and binary nodes. By using appropriate models for each node type, DAGBagM allows both continuous and binary nodes to serve as parent or child nodes. It employs a bootstrap aggregating strategy to reduce false positives in edge inference. The aggregation procedure also provides a flexible framework for robustly incorporating prior information on edges.
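The bootstrap aggregating idea can be illustrated with a minimal sketch. This is not DAGBagM's actual procedure or API: `toy_learner` is a hypothetical stand-in for a real DAG learner (it simply draws an edge i -> j, with i < j, when two columns are strongly correlated), and keeping edges by a plain selection-frequency threshold (`min_freq`) is an assumption made for illustration. The sketch only shows why edges that appear in few bootstrap resamples, which are more likely to be false positives, drop out after aggregation.

```python
import random
from collections import Counter


def pearson(x, y):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def toy_learner(rows, cutoff=0.3):
    """Hypothetical structure learner: edge (i, j), i < j, if |corr| > cutoff.

    Orienting every edge from the lower to the higher column index keeps
    the result acyclic. A real learner would search over DAGs instead.
    """
    p = len(rows[0])
    cols = [[r[j] for r in rows] for j in range(p)]
    return {
        (i, j)
        for i in range(p)
        for j in range(i + 1, p)
        if abs(pearson(cols[i], cols[j])) > cutoff
    }


def bagged_edges(rows, learner, n_boot=100, min_freq=0.8, seed=0):
    """Run `learner` on bootstrap resamples and keep frequently selected edges."""
    rng = random.Random(seed)
    n = len(rows)
    counts = Counter()
    for _ in range(n_boot):
        # Resample the rows with replacement, keeping the sample size fixed.
        sample = [rows[rng.randrange(n)] for _ in range(n)]
        counts.update(learner(sample))
    freq = {edge: c / n_boot for edge, c in counts.items()}
    kept = {edge for edge, f in freq.items() if f >= min_freq}
    return kept, freq


def make_toy_data(n=200, seed=1):
    """Simulated data: column 1 depends on column 0; column 2 is independent."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x0 = rng.gauss(0, 1)
        x1 = x0 + 0.1 * rng.gauss(0, 1)
        x2 = rng.gauss(0, 1)
        rows.append((x0, x1, x2))
    return rows


kept, freq = bagged_edges(make_toy_data(), toy_learner)
print(sorted(kept))
```

Because the toy learner fixes the edge orientation in advance, a frequency threshold is enough here to return an acyclic edge set. With a learner that can orient edges either way, thresholded edges could form cycles, so the aggregation step would need to enforce acyclicity explicitly.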
Through extensive simulation experiments, we demonstrate that DAGBagM outperforms alternative strategies for modeling mixed types of nodes. In addition, DAGBagM is computationally more efficient than two competing methods. Applying DAGBagM to proteogenomic datasets from ovarian cancer studies, we identify potential protein biomarkers for platinum refractory/resistant response in ovarian cancer. DAGBagM is available as a GitHub repository at https://github.com/jie108/dagbagM.