Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA.
Department of Pediatrics, Washington University School of Medicine, St. Louis, MO, USA.
Genet Med. 2021 Jun;23(6):1075-1085. doi: 10.1038/s41436-020-01084-8. Epub 2021 Feb 12.
Genomic sequencing has become an increasingly powerful tool for discovering the genetic aberrations underlying rare Mendelian conditions. Although the computational tools incorporated into diagnostic workflows for this task are continually evolving and improving, we sought to investigate commonalities across sequencing processing workflows to reveal consensus and standard-practice tools and to highlight exploratory analyses where technical and theoretical method improvements would be most impactful.
We collected details regarding the computational approaches used by a genetic testing laboratory and 11 clinical research sites in the United States participating in the Undiagnosed Diseases Network via meetings with bioinformaticians, online survey forms, and analyses of internal protocols.
We found that tools for processing genomic sequencing data can be grouped into four distinct categories. Whereas well-established practices exist for initial variant calling and quality control steps, there is substantial divergence across sites in the later stages of variant prioritization and multimodal data integration, reflecting a diversity of approaches to solving the most difficult undiagnosed cases.
The largest differences across diagnostic workflows suggest that advances in structural variant detection, noncoding variant interpretation, and integration of additional biomedical data may be especially promising for solving chronically undiagnosed cases.