Yuebei People's Hospital, Shantou University Medical College, Shaoguan 512025, China.
Department of Gynecology, Yuebei People's Hospital, Shantou University Medical College, Shaoguan 512025, China.
Biomed Res Int. 2021 Apr 14;2021:6667201. doi: 10.1155/2021/6667201. eCollection 2021.
High-throughput sequencing is increasingly used in clinical diagnosis, but it is uncovering a growing number of novel gene variants of unknown clinical significance. These variants complicate the interpretation of individuals' genetic data, precise disease diagnosis, and therapeutic decision-making, so methods to analyze and interpret them are critically important. In this work, we identified BRCA1 variants of unknown clinical significance from clinical sequencing data and developed machine learning models to predict their pathogenicity. In performance benchmarking, the optimized random forest model achieved an area under the receiver operating characteristic curve of 0.85, outperforming the other models. Finally, we applied the best random forest model to predict the pathogenicity of 6321 BRCA1 variants from both the sequencing data and the ClinVar database, thereby obtaining predicted pathogenic risks for BRCA1 variants of unknown significance.
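To make the benchmark metric concrete, the following is a minimal sketch of how an area under the receiver operating characteristic curve can be computed from a classifier's scores. It is not the authors' pipeline: the function name `roc_auc`, the toy labels (1 = pathogenic, 0 = benign), and the toy scores are all assumptions made for illustration. The sketch uses the rank-based reading of the metric, in which the value is the probability that a randomly chosen pathogenic variant receives a higher score than a randomly chosen benign one.

```python
def roc_auc(labels, scores):
    """Return the ROC AUC as the fraction of (positive, negative) pairs
    in which the positive example has the higher score; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data, not taken from the study:
# 1 = pathogenic, 0 = benign; scores are predicted pathogenic risks.
labels = [1, 1, 1, 0, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.35, 0.4, 0.3, 0.1, 0.7, 0.6]

auc = roc_auc(labels, scores)
print(f"AUC = {auc:.4f}")
```

On this toy data, 14 of the 16 pathogenic/benign pairs are ranked correctly, so the value is 0.875. A value of 0.5 corresponds to random ranking and 1.0 to perfect separation, which is the scale on which the reported 0.85 should be read.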