Díaz-Navarro Ander, Bousquets-Muñoz Pablo, Nadeu Ferran, López-Tamargo Sara, Beà Silvia, Campo Elias, Puente Xose S
Departamento de Bioquímica y Biología Molecular, Instituto Universitario de Oncología (IUOPA), Universidad de Oviedo, 33006 Oviedo, Spain.
Institut d'Investigacions Biomèdiques August Pi i Sunyer (IDIBAPS), 08036 Barcelona, Spain.
NAR Genom Bioinform. 2023 May 30;5(2):lqad056. doi: 10.1093/nargab/lqad056. eCollection 2023 Jun.
Falling sequencing costs and the extensive genomic characterization of a wide variety of cancers are extending tumor sequencing to a large number of research groups and into clinical practice. Although dedicated pipelines have been developed to identify somatic mutations, their results usually differ considerably, and a common approach is to combine several callers to obtain a more reliable set of mutations. This procedure is computationally expensive and time-consuming, and it suffers from the same limitations in sensitivity and specificity as other approaches. Expert review of mutation calls is therefore required to verify calls that might be used for clinical diagnosis. This step could take advantage of machine learning techniques, which provide a useful way to incorporate expert-reviewed information into the identification of somatic mutations. Here we present RFcaller, a pipeline based on machine learning algorithms for detecting somatic mutations in tumor-normal paired samples that does not require large computing resources. RFcaller shows high accuracy in detecting substitutions and insertions/deletions from whole-genome or exome data. It detects mutations in driver genes missed by other approaches and has been validated by comparison with deep sequencing and Sanger sequencing.
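As an illustration of the multi-caller strategy the abstract describes as common practice, the following is a minimal sketch of a consensus vote across call sets. It is not RFcaller's method and is not taken from the paper: the `Variant` type, the `consensus_calls` function, the `min_callers` threshold, and the toy call sets are all hypothetical names and data introduced here for illustration only.

```python
from collections import Counter
from typing import Iterable, NamedTuple


class Variant(NamedTuple):
    """Hypothetical minimal representation of one somatic call."""
    chrom: str
    pos: int  # 1-based position
    ref: str
    alt: str


def consensus_calls(call_sets: Iterable[set[Variant]], min_callers: int = 2) -> set[Variant]:
    """Keep variants reported by at least `min_callers` of the call sets.

    Each call set is a set, so a caller can vote for a variant only once.
    """
    counts = Counter(v for calls in call_sets for v in calls)
    return {v for v, n in counts.items() if n >= min_callers}


# Made-up call sets from three hypothetical callers (not real data).
caller_a = {Variant("chr1", 100, "A", "T"), Variant("chr2", 200, "G", "C")}
caller_b = {Variant("chr1", 100, "A", "T"), Variant("chr3", 300, "C", "CT")}
caller_c = {Variant("chr1", 100, "A", "T"), Variant("chr2", 200, "G", "C")}

consensus = consensus_calls([caller_a, caller_b, caller_c], min_callers=2)
for variant in sorted(consensus):
    print(variant)
```

A vote of this kind requires running every caller on every sample, which is the computational cost the abstract points to, and a variant missed by most callers is dropped regardless of how well the reads support it.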