State Key Laboratory of Cancer Biology and Department of Physiology and Pathophysiology, Fourth Military Medical University, Xi'an, China.
State Key Laboratory of Cancer Biology, Department of Pathology, Xijing Hospital and School of Basic Medicine, Fourth Military Medical University, Xi'an, China.
Mol Oncol. 2023 May;17(5):857-871. doi: 10.1002/1878-0261.13335. Epub 2022 Dec 15.
Mitochondrial DNA (mtDNA) somatic mutations play important roles in the initiation and progression of cancer. Although next-generation sequencing (NGS) of paired tumor and control samples has become a common practice to identify tumor-specific mtDNA mutations, the unique nature of mtDNA and NGS-associated sequencing bias could cause false-positive/-negative somatic mutation calling. Additionally, there are clinical scenarios where matched control tissues are unavailable for comparison. Therefore, a novel approach for accurately identifying somatic mtDNA variants is greatly needed, particularly in the absence of matched controls. In this study, the ground truth mtDNA variants orthogonally validated by triple-paired tumor, adjacent nontumor, and blood samples were used to develop mitoSomatic, a random forest-based machine learning tool. We demonstrated that mitoSomatic achieved area under the curve (AUC) values over 0.99 for identifying somatic mtDNA variants without paired control in three tumor types. In addition, mitoSomatic was also applicable in nontumor tissues such as adjacent nontumor and blood samples, suggesting the flexibility of mitoSomatic's classification capability. Furthermore, analysis of triple-paired samples identified a small group of variants with uncertain somatic/germline origin, whereas application of mitoSomatic significantly facilitated the prediction of their possible source. Finally, a control-free evaluation of the public pan-cancer NGS dataset with mitoSomatic revealed a substantial number of variants that were probably misclassified by conventional tumor-control comparison, further emphasizing the usefulness of mitoSomatic in application. Taken together, our study demonstrates that mitoSomatic is valuable for accurately identifying somatic mtDNA variants in mtDNA NGS data without paired controls, applicable for both tumor and nontumor tissues.
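The abstract reports performance as area under the ROC curve (AUC). As a minimal sketch of what that metric measures, and not the authors' evaluation code, the snippet below computes AUC from classifier scores using the pairwise (Mann-Whitney) formulation. The function name `roc_auc`, the label encoding (1 = somatic, 0 = germline), and the toy scores are all assumptions made for illustration; none of them come from mitoSomatic or its data.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC via the pairwise (Mann-Whitney) formulation.

    labels: 1 = somatic, 0 = germline (hypothetical encoding).
    scores: classifier output, higher meaning "more likely somatic".
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")

    # AUC equals the fraction of (positive, negative) pairs in which the
    # positive example receives the higher score; ties count as half.
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Made-up scores for seven variants (illustration only, not real data).
toy_labels = [1, 1, 1, 0, 0, 0, 0]
toy_scores = [0.95, 0.80, 0.40, 0.60, 0.30, 0.20, 0.10]

# 11 of the 12 somatic/germline pairs are ordered correctly.
toy_auc = roc_auc(toy_labels, toy_scores)
```

Read this way, an AUC above 0.99 means that a randomly chosen somatic variant outscores a randomly chosen germline variant in more than 99% of such pairs. The quadratic loop is kept for clarity; a rank-based implementation would be preferable for large variant sets.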