Department of Orthopedic Surgery, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul, Korea.
Institute of Biomedical AI, Samsung Advanced Institute for Health Sciences and Technology, Sungkyunkwan University School of Medicine, Seoul, Korea.
BMC Med Inform Decis Mak. 2022 Apr 27;22(1):113. doi: 10.1186/s12911-022-01852-3.
The recent explosion of cancer genomics has provided extensive information about mutations and gene expression changes in cancer. However, most of the identified gene mutations are not used clinically, and it remains uncertain whether the presence of a given genetic alteration affects treatment response. Conventional statistical methods are limited for causal inference and rarely achieve sufficient power in genomic datasets. Here, we developed and evaluated the C-search algorithm, which searches for the causal genes that maximize the effect of treatment.
The algorithm was developed based on the potential outcome framework and Bayesian posterior update. The precision of the algorithm was validated using a simulation dataset. The algorithm was then applied to a cBioPortal dataset, and the genes it discovered were externally validated with CancerSCAN screening data from Samsung Medical Center.
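The abstract does not spell out how the potential outcome framework and the Bayesian posterior update are combined, so the following is only a minimal illustrative sketch and not the published C-search algorithm. It assumes a binary response instead of survival time, treatment assignment that is effectively random within carriers of a gene, and a Beta(1, 1) prior. The function names, the record layout, and the gene labels are hypothetical.

```python
import random


def update_beta(prior, outcomes):
    """Conjugate Beta-Bernoulli update: prior (a, b) plus observed 0/1 outcomes."""
    a, b = prior
    successes = sum(outcomes)
    return a + successes, b + len(outcomes) - successes


def prob_treatment_benefit(treated, untreated, prior=(1.0, 1.0), draws=20000, seed=0):
    """Monte Carlo estimate of P(p_treated > p_untreated | data)."""
    rng = random.Random(seed)
    a_t, b_t = update_beta(prior, treated)
    a_u, b_u = update_beta(prior, untreated)
    wins = sum(
        rng.betavariate(a_t, b_t) > rng.betavariate(a_u, b_u)
        for _ in range(draws)
    )
    return wins / draws


def rank_genes(records, genes):
    """Rank genes by the posterior probability that treatment helps carriers.

    records: list of dicts with keys 'mutations' (set of gene names),
    'treated' (bool) and 'response' (0 or 1). This layout is an assumption.
    """
    scores = {}
    for gene in genes:
        carriers = [r for r in records if gene in r["mutations"]]
        treated = [r["response"] for r in carriers if r["treated"]]
        untreated = [r["response"] for r in carriers if not r["treated"]]
        scores[gene] = prob_treatment_benefit(treated, untreated)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

In this sketch, each gene's score is the posterior probability that treated carriers respond more often than untreated carriers, so genes with few carriers stay close to 0.5 until enough data accumulates. A survival-based version would need a different likelihood and an explicit adjustment for confounding.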
In the simulation data analysis, the C-search algorithm identified nine of the ten causal genes. Its discovery rate increased rapidly up to 1500 data instances, whereas the performance of the log-rank test increased more slowly. The C-search algorithm suggested nine causal genes from the cBioPortal Metabric dataset. Treating patients who carried these causal genes was associated with better survival outcomes in both the cBioPortal dataset and the CancerSCAN dataset used for external validation.
Our C-search algorithm identified the causal effects of genes better than multiple log-rank test analysis, especially when the amount of data was limited. The results suggest that C-search can discover causal genes in various genetic datasets in which the number of samples is small relative to the number of variables.