Faster and more accurate pathogenic combination predictions with VarCoPP2.0.

Affiliations

Machine Learning Group, Université Libre de Bruxelles, 1050, Brussels, Belgium.

Interuniversity Institute of Bioinformatics in Brussels, Université Libre de Bruxelles-Vrije Universiteit Brussel, 1050, Brussels, Belgium.

Publication information

BMC Bioinformatics. 2023 May 1;24(1):179. doi: 10.1186/s12859-023-05291-3.

Abstract

BACKGROUND

Predicting potentially pathogenic variant combinations in patients remains a key task in medical genetics for understanding and detecting oligogenic/multilocus diseases. Models tailored to such cases can help narrow the gap of missing diagnoses and help researchers deal with the high complexity of the derived data. The predictor VarCoPP (Variant Combinations Pathogenicity Predictor), published in 2019, identified potentially pathogenic variant combinations in gene pairs (bilocus variant combinations) and was the first important step in this direction. Despite its usefulness and applicability, several issues limited its performance, such as its False Positive (FP) rate, the quality of its training set and its complex architecture.

RESULTS

We present VarCoPP2.0, the successor of VarCoPP: a simplified, faster and more accurate predictive model for identifying potentially pathogenic bilocus variant combinations. Results from cross-validation and from independent data sets show that VarCoPP2.0 has improved in both sensitivity (95% in cross-validation and 98% during testing) and specificity (5% FP rate). At the same time, its running time shows a significant 150-fold decrease owing to the selection of a simpler Balanced Random Forest model. Its positive training set now consists of variant combinations that are more confidently linked with evidence of pathogenicity, based on the confidence scores present in OLIDA, the Oligogenic Diseases Database ( https://olida.ibsquare.be ). The improved performance is also attributed to a more careful selection of up-to-date features, identified via an original wrapper method. We show that combining variant features and gene pair features is important for predictions, highlighting the usefulness of integrating biological information at different levels.
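To make the metrics in the Results paragraph concrete, the following is a minimal Python sketch, not the authors' code. It shows how sensitivity and FP rate are computed from binary predictions, and how a class-balanced bootstrap sample is drawn in the usual formulation of a Balanced Random Forest (each tree sees equal numbers of positives and negatives, sampled with replacement). The function names `sensitivity_and_fp_rate` and `balanced_bootstrap` are hypothetical, and the toy labels are invented purely so that the counts work out to 95% sensitivity and a 5% FP rate; they are not data from the study.

```python
import random


def sensitivity_and_fp_rate(y_true, y_pred):
    """Return (sensitivity, FP rate) for binary labels, where 1 = pathogenic."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    sensitivity = tp / (tp + fn)  # fraction of true positives recovered
    fp_rate = fp / (fp + tn)      # equals 1 - specificity
    return sensitivity, fp_rate


def balanced_bootstrap(labels, rng):
    """Draw indices for one tree: n positives and n negatives, with replacement.

    n is the size of the smaller class, so the larger class is undersampled.
    """
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    n = min(len(pos), len(neg))
    return [rng.choice(pos) for _ in range(n)] + [rng.choice(neg) for _ in range(n)]


# Invented toy data: 20 positive and 100 negative combinations.
y_true = [1] * 20 + [0] * 100
# Hypothetical predictions: 19 of 20 positives found, 5 of 100 negatives flagged.
y_pred = [1] * 19 + [0] * 1 + [1] * 5 + [0] * 95

sens, fpr = sensitivity_and_fp_rate(y_true, y_pred)
sample = balanced_bootstrap(y_true, random.Random(0))
```

With these toy counts, `sens` is 19/20 and `fpr` is 5/100, and `sample` holds 40 indices split evenly between the two classes. Balancing each tree's sample this way is what lets the forest cope with a positive set much smaller than the negative set.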

CONCLUSIONS

Through its improved performance and faster execution time, VarCoPP2.0 enables a more accurate analysis of larger data sets linked to oligogenic diseases. Users can access the ORVAL platform ( https://orval.ibsquare.be ) to apply VarCoPP2.0 to their data.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/cb9e/10152795/e0bcbe6e8a6e/12859_2023_5291_Fig1_HTML.jpg
