Kundu Kunal, Pal Lipika R, Yin Yizhou, Moult John
Institute for Bioscience and Biotechnology Research, University of Maryland, Rockville, Maryland.
Computational Biology, Bioinformatics and Genomics, Biological Sciences Graduate Program, University of Maryland, College Park, Maryland.
Hum Mutat. 2017 Sep;38(9):1201-1216. doi: 10.1002/humu.23249. Epub 2017 Jun 27.
The use of gene panel sequencing for diagnostic and prognostic testing is now widespread, but so far there are few objective tests of methods for interpreting these data. We describe the design and implementation of a gene panel sequencing data analysis pipeline (VarP) and its assessment in a CAGI4 community experiment. The method was applied to clinical gene panel sequencing data from 106 patients, with the goal of determining which of 14 disease classes each patient has and identifying the corresponding causative variant(s). The disease class was correctly identified for 36 cases, including 10 where the original clinical pipeline did not find causative variants. For a further seven cases, we found strong evidence of a disease other than the one tested for. Many of the potentially causative variants are missense variants with no previous association with disease, and these proved the hardest to classify correctly as pathogenic or not. Post hoc analysis showed that three-dimensional structure data could have helped in up to half of these cases. Over-reliance on HGMD annotation led to a number of incorrect disease assignments. We used a largely ad hoc method to assign probabilities of pathogenicity to each variant, and there is much work still to be done in this area.