Gomes Daniel Henrique Ferreira, Medeiros Inácio Gomes, Petta Tirzah Braz, Stransky Beatriz, de Souza Jorge Estefano Santana
Bioinformatics Postgraduate Program, Metrópole Digital Institute, Federal University of Rio Grande Do Norte, Natal, Rio Grande Do Norte, 59078-400, Brazil.
Bioinformatics Multidisciplinary Environment (BioME), Metrópole Digital Institute, Federal University of Rio Grande Do Norte, Natal, Rio Grande Do Norte, 59078-400, Brazil.
BMC Bioinformatics. 2025 Apr 9;26(1):101. doi: 10.1186/s12859-025-06113-4.
A significant challenge in precision medicine is confidently identifying which mutations detected by sequencing play a role in disease treatment or diagnosis. Furthermore, the poor representation of single nucleotide variants in public databases and the low sequencing rates in underrepresented populations pose additional challenges, with many pathogenic mutations still awaiting discovery. Mutational pathogenicity predictors have gained relevance as supportive tools in medical decision-making. However, significant disagreement persists among tools regarding pathogenicity identification, so manual verification is still needed to confirm mutation effects accurately.
This article presents DTreePred, a cross-platform mobile application and online visualization tool for assessing the pathogenicity of nucleotide variants. DTreePred uses a machine learning-based pathogenicity model that combines a decision tree algorithm with 15 machine learning classifiers and classical predictors. Connecting public databases with diverse prediction algorithms streamlines variant analysis, while the decision tree algorithm improves the accuracy and reliability of variant pathogenicity data. This integration of information from various sources and prediction techniques aims to serve as a functional guide for decision-making in clinical practice. In addition, we tested DTreePred in a case study involving a cohort from Rio Grande do Norte, Brazil. By categorizing nucleotide variants, drawn from the list of oncogenes and tumor suppressor genes, that ClinVar classifies as inexact data, DTreePred successfully revealed the pathogenicity of more than 95% of these variants. Furthermore, an integrity test with 200 known mutations yielded an accuracy of 97%, surpassing rates expected from previous models.
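The general idea of routing a variant through a decision tree that weighs a database label against several predictor calls can be illustrated with a minimal sketch. This is not the DTreePred model: the tree below is hand-written rather than learned, and the `PredictorCalls` fields, the `classify` function, the label strings, and the 0.5 threshold are all hypothetical choices made only for illustration.

```python
from dataclasses import dataclass


@dataclass
class PredictorCalls:
    """Hypothetical per-variant inputs; field names are illustrative only."""
    database_label: str      # "pathogenic", "benign", or "uncertain"
    votes_pathogenic: int    # predictors calling the variant pathogenic
    votes_total: int         # predictors that returned any call


def classify(v: PredictorCalls, threshold: float = 0.5) -> str:
    """Walk a small hand-written decision tree and return a label."""
    # Node 1: keep a definitive database label when one exists.
    if v.database_label in ("pathogenic", "benign"):
        return v.database_label
    # Node 2: with no predictor output, nothing can be decided.
    if v.votes_total == 0:
        return "uncertain"
    # Node 3: fall back on the fraction of predictors voting pathogenic
    # (assumed threshold, not a value from the article).
    fraction = v.votes_pathogenic / v.votes_total
    return "pathogenic" if fraction > threshold else "benign"


if __name__ == "__main__":
    variant = PredictorCalls("uncertain", votes_pathogenic=12, votes_total=15)
    print(classify(variant))
```

In this sketch a variant without a definitive database label is resolved by the predictor majority, which mirrors, in a very simplified way, how disagreement among tools might be reduced to a single call.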
DTreePred offers a robust solution for reducing uncertainty in clinical decision-making regarding pathogenic variants. Improving the accuracy of pathogenicity assessments has the potential to significantly increase the precision of medical diagnoses and treatments, particularly for underrepresented populations.