Tuteja Sachleen, Kadri Sabah, Yap Kai Lee
Illinois Mathematics and Science Academy, 1500 Sullivan Road, Aurora, IL 60506, USA.
Department of Pathology and Laboratory Medicine, Ann and Robert H. Lurie Children's Hospital of Chicago, 225 E. Chicago Ave, Chicago, IL 60611, USA.
J Pathol Inform. 2022 Jul 28;13:100130. doi: 10.1016/j.jpi.2022.100130. eCollection 2022.
Next generation sequencing (NGS) technologies have dramatically expanded our capacity for clinical genetic testing for inherited conditions and complex diseases such as cancer, enabling rapid interrogation of thousands of genes and identification of millions of variants. Variant annotation, the process of assigning functional information to DNA variants based on the standardized Human Genome Variation Society (HGVS) nomenclature, is a fundamental challenge in the analysis of NGS data and has led to the development of many bioinformatic algorithms. In this study, we evaluated the performance of 3 variant annotation tools, Alamut® Batch, Ensembl Variant Effect Predictor (VEP), and ANNOVAR, benchmarked against a manually curated ground-truth set of 298 variants from the medical exome database at the Molecular Diagnostics Laboratory at Lurie Children's Hospital. Of the 3 tools, VEP produced the most accurate variant annotations (correct HGVS nomenclature for 297 of the 298 variants, 99.7% concordance), owing to its use of updated gene transcript versions. Alamut® Batch correctly annotated 296 of the 298 variants (99.3% concordance); strikingly, ANNOVAR exhibited the greatest number of discrepancies (20 of the 298 variants; 93.3% concordance with the ground-truth set). Adopting validated methods of variant annotation is critical in the post-analytical phase of clinical testing.
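The concordance figures above are simple ratios of correctly annotated variants to the 298-variant ground-truth set. The following is a minimal Python sketch of that benchmarking arithmetic. It assumes a variant counts as concordant only when a tool's HGVS string exactly matches the curated one; the study's actual matching criteria are not described in this abstract. The helper names, variant IDs, and `NM_HYPO` accessions are hypothetical and chosen only for illustration.

```python
# Minimal sketch: concordance of a tool's HGVS annotations with a curated
# ground-truth set. Assumes exact string matching; helper names, variant IDs
# and NM_HYPO accessions are hypothetical.


def discrepant_variants(truth: dict[str, str], calls: dict[str, str]) -> list[str]:
    """IDs whose tool HGVS string is missing or differs from the ground truth."""
    return [vid for vid, hgvs in truth.items() if calls.get(vid) != hgvs]


def concordance_pct(n_matching: int, n_total: int) -> float:
    """Percentage of variants annotated identically to the ground truth."""
    return 100.0 * n_matching / n_total


# Matching counts out of 298 variants, as reported in the abstract.
TOTAL = 298
REPORTED_MATCHES = {"VEP": 297, "Alamut Batch": 296, "ANNOVAR": TOTAL - 20}

for tool, n in REPORTED_MATCHES.items():
    print(f"{tool}: {concordance_pct(n, TOTAL):.1f}% concordant")

# Hypothetical example: under exact matching, an outdated transcript version
# (.1 instead of .2) is flagged even though the c. change is identical.
truth = {"v1": "NM_HYPO.2:c.100A>G", "v2": "NM_HYPO.2:c.250del"}
calls = {"v1": "NM_HYPO.1:c.100A>G", "v2": "NM_HYPO.2:c.250del"}
print(discrepant_variants(truth, calls))
```

Run as written, the loop reports 99.7%, 99.3%, and 93.3% for VEP, Alamut Batch, and ANNOVAR, and the final line lists `v1` as the only discrepancy. Exact matching is the strictest possible criterion. A real pipeline might instead normalize HGVS strings before comparing them, which would change which differences are counted.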