Genome sequencing data analysis for rare disease gene discovery.

Affiliations

Division of Genomics & Translational Biomedicine, College of Health & Life Sciences, Hamad Bin Khalifa University, B-147, Penrose House, PO Box 34110, Education City, Doha, Qatar.

Quantitative Genomics Laboratories (qGenomics), Barcelona, Catalonia, Spain.

Publication information

Brief Bioinform. 2022 Jan 17;23(1). doi: 10.1093/bib/bbab363.

PMID:34498682

Abstract

Rare diseases affect a small proportion of the general population, variously defined as fewer than 200 000 individuals (US) or fewer than 1 in 2000 individuals (Europe). Although individually rare, they collectively comprise approximately 7000 different disorders, the majority of which have a genetic origin, and affect roughly 300 million people globally. Most patients and their families undergo a long and frustrating diagnostic odyssey. Advances in genomics have started to facilitate diagnosis, though progress is hindered by the difficulty of genome data analysis and interpretation. A major impediment to diagnosis is understanding the diverse approaches, tools and datasets available for variant prioritization, the most important step in narrowing millions of variants down to a few potential candidates. Here we review the latest methodological developments and the spectrum of tools available for rare disease genetic variant discovery, and recommend appropriate data interpretation methods for variant prioritization. We have categorized the resources according to the steps of the variant interpretation workflow: data processing, variant calling, annotation, filtration and finally prioritization, with special emphasis on the last two steps. The methods discussed here pertain to elucidating the genetic basis of disease in individual patient cases via trio- or family-based analysis of genome data. We advocate using a combination of tools and datasets and following multiple iterative approaches to identify the potential causative variant.
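To make the filtration and prioritization steps concrete, the following is a minimal Python sketch of trio-based filtering. It is an illustration under stated assumptions, not the method of any tool covered in the review: the `TrioVariant` record, its field names, the `max_af` cutoff of 0.001 and the ranking by allele frequency are all hypothetical choices. Only two simple inheritance patterns are checked (de novo and homozygous recessive); sex chromosomes, compound heterozygosity and genotype quality are deliberately ignored.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class TrioVariant:
    """Hypothetical record for one annotated variant in a trio."""
    variant_id: str
    gene: str
    population_af: float  # allele frequency in a reference population (assumed annotation)
    child: int            # alternate allele count: 0, 1 or 2
    mother: int
    father: int


def inheritance_pattern(v: TrioVariant) -> Optional[str]:
    """Return a simple inheritance label, or None if no pattern matches."""
    # De novo: heterozygous in the child, absent from both parents.
    if v.child == 1 and v.mother == 0 and v.father == 0:
        return "de_novo"
    # Homozygous recessive: child carries two copies, each parent carries one.
    if v.child == 2 and v.mother == 1 and v.father == 1:
        return "homozygous_recessive"
    return None


def filter_trio(variants: List[TrioVariant],
                max_af: float = 0.001) -> List[Tuple[TrioVariant, str]]:
    """Filtration: keep rare variants that fit an inheritance pattern.
    Prioritization (illustrative only): rank the rarest variants first."""
    candidates = []
    for v in variants:
        if v.population_af > max_af:  # assumed rarity cutoff
            continue
        pattern = inheritance_pattern(v)
        if pattern is not None:
            candidates.append((v, pattern))
    candidates.sort(key=lambda pair: pair[0].population_af)
    return candidates


if __name__ == "__main__":
    # Made-up example data; gene names are placeholders.
    example = [
        TrioVariant("v1", "GENE_A", 0.0, 1, 0, 0),
        TrioVariant("v2", "GENE_B", 0.2, 1, 0, 0),
        TrioVariant("v3", "GENE_C", 0.0005, 2, 1, 1),
        TrioVariant("v4", "GENE_D", 0.0001, 1, 1, 0),
    ]
    for variant, pattern in filter_trio(example):
        print(variant.variant_id, variant.gene, pattern)
```

In practice the prioritization step would combine several lines of evidence (predicted functional impact, phenotype match, gene constraint) rather than allele frequency alone, in line with the combination of tools and datasets the review advocates.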
