Broad Institute of MIT and Harvard, Cambridge, MA, USA.
Data Sciences Platform, Broad Institute of MIT and Harvard, Cambridge, MA, USA.
Nat Methods. 2023 Apr;20(4):559-568. doi: 10.1038/s41592-023-01799-x. Epub 2023 Mar 23.
Structural variants (SVs) are a major driver of genetic diversity and disease in the human genome, and their discovery is imperative to advances in precision medicine. Existing SV callers rely on hand-engineered features and heuristics to model SVs, which can neither scale to the vast diversity of SVs nor fully harness the information available in sequencing datasets. Here we propose an extensible deep-learning framework, Cue, to call and genotype SVs that can learn complex SV abstractions directly from the data. At a high level, Cue converts alignments to images that encode SV-informative signals and uses a stacked hourglass convolutional neural network to predict the type, genotype and genomic locus of the SVs captured in each image. We show that Cue outperforms the state of the art in the detection of several classes of SVs on synthetic and real short-read data and that it can be easily extended to other sequencing platforms, while achieving competitive performance.
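The alignment-to-image step described above can be illustrated with a minimal sketch. It is not Cue's implementation: the function name `pairs_to_image`, the single-channel count encoding, and the toy coordinates are all assumptions made for illustration, and the abstract does not specify which signals Cue actually encodes. The sketch bins the mapped positions of read-pair mates within one genomic interval into a small square count matrix, one plausible way to turn alignments into an image-like array.

```python
from typing import List, Tuple


def pairs_to_image(
    pairs: List[Tuple[int, int]],
    start: int,
    end: int,
    size: int,
) -> List[List[int]]:
    """Bin read-pair coordinates in [start, end) into a size x size count matrix.

    Hypothetical helper: row = bin of the left mate, column = bin of the right mate.
    """
    if end <= start or size <= 0:
        raise ValueError("need end > start and size > 0")
    span = end - start
    image = [[0] * size for _ in range(size)]
    for left, right in pairs:
        # Skip pairs with either mate outside the interval.
        if not (start <= left < end and start <= right < end):
            continue
        row = (left - start) * size // span
        col = (right - start) * size // span
        image[row][col] += 1
    return image


# Toy, made-up mate positions (not real data); the last pair lies outside the interval.
toy_pairs = [(100, 150), (120, 180), (200, 900), (210, 910), (950, 990), (1200, 1300)]
toy_image = pairs_to_image(toy_pairs, start=0, end=1000, size=4)

for row in toy_image:
    print(row)
```

In this toy matrix, pairs whose mates map close together land on the diagonal, while pairs whose mates map far apart land off the diagonal. A convolutional network could in principle learn to recognize such off-diagonal clusters, though the actual channels and architecture details are those of the published method, not of this sketch.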