Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, 06520, USA.
Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, 06520, USA.
Genome Biol. 2020 Nov 9;21(1):274. doi: 10.1186/s13059-020-02178-x.
Although genomic structural variants (SVs) play a crucial role in many diseases, approaches for identifying pathogenic SVs are lacking. We present SVFX, a mechanism-agnostic, machine learning-based workflow that assigns pathogenicity scores to somatic and germline SVs. In particular, we build somatic and germline models from SV call sets in diseased and healthy individuals, using genomic, epigenomic, and conservation-based features. We then apply SVFX to SVs in cancer and other diseases, where it achieves high accuracy in identifying pathogenic SVs. Predicted pathogenic SVs in cancer cohorts are enriched among known cancer genes and many cancer-related pathways.
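The general idea of scoring SVs with a model trained on disease versus healthy call sets can be illustrated with a small, purely hypothetical sketch. It is not the SVFX implementation: the abstract does not name the classifier or list the features, so a plain logistic regression stands in for the model, the three feature names are placeholders, and the data are synthetic.

```python
import math
import random

# Placeholder feature names (assumption); the abstract only says the features
# are genomic, epigenomic, and conservation-based.
FEATURES = ["gene_overlap", "epigenomic_signal", "conservation"]


def make_synthetic_svs(n, shift, rng):
    """Draw n toy feature vectors; `shift` sets the mean of every feature."""
    return [[rng.gauss(shift, 1.0) for _ in FEATURES] for _ in range(n)]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(X, y, lr=0.1, epochs=200):
    """Fit logistic regression with batch gradient descent (stand-in model)."""
    w = [0.0] * len(X[0])
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def pathogenicity_score(sv, w, b):
    """Return a score in (0, 1); higher means more disease-like."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, sv)) + b)


rng = random.Random(0)
disease = make_synthetic_svs(200, 1.0, rng)   # toy SVs from diseased individuals
healthy = make_synthetic_svs(200, -1.0, rng)  # toy SVs from healthy individuals
X = disease + healthy
y = [1] * len(disease) + [0] * len(healthy)

w, b = train_logistic(X, y)
high = pathogenicity_score([2.0, 2.0, 2.0], w, b)
low = pathogenicity_score([-2.0, -2.0, -2.0], w, b)
```

In this toy setting, `high` is the score of an SV whose features resemble the disease set and `low` the score of one resembling the healthy set. Following the abstract, somatic and germline SVs would each get their own model trained this way.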