Cancer Systems Biology, Section for Bioinformatics, Department of Health Technology, Technical University of Denmark, 2800 Lyngby, Denmark.
Cancer Structural Biology, Danish Cancer Institute, 2100 Copenhagen, Denmark.
Brief Bioinform. 2024 Jan 22;25(2). doi: 10.1093/bib/bbad519.
The vast amount of available sequencing data allows the scientific community to explore the genetic alterations that may drive cancer or favor its progression. Software developers have proposed a myriad of predictive tools that allow researchers and clinicians to compare and prioritize driver genes and mutations and their relative pathogenicity. However, there is little consensus on the computational approach, and no gold standard for comparison. Hence, benchmark results for the different tools depend heavily on the input data, indicating that overfitting remains a major problem. One solution is to limit the scope and usage of specific tools. However, such limitations force researchers to walk a tightrope between creating and using high-quality tools for a specific purpose and describing the complex alterations that drive cancer. While knowledge of cancer development increases daily, many bioinformatic pipelines rely on single nucleotide variants or consider alterations in a vacuum, without accounting for cellular compartments, mutational burden or disease progression. Even within bioinformatics and computational cancer biology, research fields work in silos and risk overlooking potential synergies or breakthroughs. Here, we provide an overview of databases and datasets for building or testing predictive cancer driver tools. Furthermore, we introduce predictive tools for driver genes and driver mutations, as well as tools that assess their impact based on structural analysis. Finally, we recommend directions for the field to avoid siloed research and move towards integrative frameworks.