Changsha KingMed Center for Clinical Laboratory, Changsha, China.
Guangzhou KingMed Center for Clinical Laboratory, Guangzhou, China.
Brief Bioinform. 2022 Mar 10;23(2). doi: 10.1093/bib/bbac019.
Identifying disease-causing genes from the next-generation sequencing (NGS) data of patients with Mendelian disorders is challenging. To address this, researchers have developed many phenotype-driven gene prioritization methods that rank candidate pathogenic genes using either a patient's genotype and phenotype information together or phenotype information alone. Evaluations of these ranking methods help practitioners choose an appropriate tool for their workflows, but retrospective benchmarks are underpowered to show statistically significant differences between methods. In this study, we benchmarked ten recognized causal-gene prioritization methods on 305 cases from the Deciphering Developmental Disorders (DDD) project and 209 in-house cases, using a relatively unbiased methodology. The results show that methods taking both Human Phenotype Ontology (HPO) terms and Variant Call Format (VCF) files as input achieved better overall performance than those using phenotypic data alone. In addition, LIRICAL and AMELIE, two of the best-performing methods in our benchmark, complement each other: the cases in which each ranks the causal gene highly only partly overlap, suggesting that an integrative approach could further improve diagnostic efficiency. Our benchmark provides a valuable reference for computer-assisted rapid diagnosis of Mendelian diseases and sheds light on potential directions for improving disease-causing gene prioritization methods.
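The kind of comparison described above can be illustrated with a minimal sketch. It assumes that each method reports the rank of the known causal gene for each case, and it uses a top-k hit rate as the performance measure; the abstract does not state which metric the study used, so this choice is an assumption. The function names, case identifiers, and rank values are hypothetical toy data, not results from the benchmark.

```python
from typing import Dict, Optional

# Rank of the true causal gene per case; None means the gene was not ranked.
# All values below are hypothetical toy data for illustration only.
Ranks = Dict[str, Optional[int]]


def is_hit(ranks: Ranks, case: str, k: int) -> bool:
    """True if the causal gene for `case` is ranked within the top k."""
    rank = ranks.get(case)
    return rank is not None and rank <= k


def top_k_rate(ranks: Ranks, k: int) -> float:
    """Fraction of cases whose causal gene is ranked within the top k."""
    if not ranks:
        return 0.0
    return sum(is_hit(ranks, case, k) for case in ranks) / len(ranks)


def combined_top_k_rate(a: Ranks, b: Ranks, k: int) -> float:
    """Fraction of cases where at least one of two methods has a top-k hit.

    A combined rate clearly above both individual rates indicates that
    the two methods complement each other.
    """
    cases = set(a) | set(b)
    if not cases:
        return 0.0
    return sum(is_hit(a, c, k) or is_hit(b, c, k) for c in cases) / len(cases)


# Hypothetical ranks from two unnamed methods on four toy cases.
method_a: Ranks = {"case1": 1, "case2": 12, "case3": 3, "case4": None}
method_b: Ranks = {"case1": 2, "case2": 4, "case3": 40, "case4": None}

rate_a = top_k_rate(method_a, k=5)
rate_b = top_k_rate(method_b, k=5)
rate_either = combined_top_k_rate(method_a, method_b, k=5)
print(rate_a, rate_b, rate_either)
```

In this toy example each method places the causal gene in the top 5 for half of the cases, but they succeed on different cases, so taking either method's hit raises the rate. That is the sense in which two methods can be complementary.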