TRON-Translational Oncology at the University Medical Center of Johannes Gutenberg University Mainz gGmbH, Mainz, Germany.
University Medical Center of the Johannes Gutenberg University, Mainz, Germany.
PLoS Comput Biol. 2020 Nov 23;16(11):e1008397. doi: 10.1371/journal.pcbi.1008397. eCollection 2020 Nov.
Genetic diseases are driven by aberrations of the human genome. Identifying such aberrations, including structural variations (SVs), is key to understanding these diseases. Conventional short-read whole-genome sequencing (cWGS) can identify SVs at base-pair resolution, but it uses only short-range information and suffers from a high false discovery rate (FDR). Linked-read sequencing (10XWGS) exploits long-range information by linking short reads that originate from the same large DNA molecule. This can mitigate alignment-based artefacts, especially in repetitive regions, and should enable better prediction of SVs. However, an unbiased evaluation of this technology is not available. In this study, we performed a comprehensive analysis of different types and sizes of SVs predicted by both technologies and validated them with an independent PCR-based approach. SVs identified by both technologies were highly specific, while the validation rate dropped for events detected by only one technology. A particularly high FDR was observed for SVs found only by 10XWGS. To reduce the FDR and improve sensitivity, statistical models were trained for both technologies. Using our approach, we characterized SVs from the MCF7 cell line and a primary breast cancer tumor with high precision. This approach improves SV prediction and can therefore help in understanding the underlying genetics of various diseases.
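The comparison described above rests on a simple quantity: among the predicted SVs that were tested by PCR, the FDR of a group is the fraction that failed validation. The following is a minimal sketch of that calculation, not the authors' pipeline. The function name `fdr_by_support`, the group labels, and the toy data are all hypothetical and were made up for illustration; they are not results from the study.

```python
from collections import defaultdict


def fdr_by_support(calls):
    """Compute the false discovery rate per support group.

    `calls` is an iterable of (support, validated) pairs, where `support`
    names which technology reported the SV (hypothetical labels: "both",
    "cWGS_only", "10XWGS_only") and `validated` is the PCR outcome.
    FDR = 1 - (validated calls / tested calls) within each group.
    """
    counts = defaultdict(lambda: [0, 0])  # support -> [validated, tested]
    for support, validated in calls:
        counts[support][1] += 1
        if validated:
            counts[support][0] += 1
    return {s: 1 - v / n for s, (v, n) in counts.items()}


# Made-up toy data for illustration only (not taken from the paper).
toy_calls = [
    ("both", True), ("both", True), ("both", True), ("both", False),
    ("cWGS_only", True), ("cWGS_only", False),
    ("10XWGS_only", False), ("10XWGS_only", False),
    ("10XWGS_only", False), ("10XWGS_only", True),
]

rates = fdr_by_support(toy_calls)
for support, fdr in rates.items():
    print(f"{support}: FDR = {fdr:.2f}")
```

Grouping by support before computing the rate mirrors how the abstract reports its findings: separately for SVs found by both technologies and for SVs found by only one.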