NCBoost classifies pathogenic non-coding variants in Mendelian diseases through supervised learning on purifying selection signals in humans.

Affiliations

Clinical Bioinformatics Lab, Imagine Institute, Paris Descartes University, Sorbonne Paris Cité, 75015, Paris, France.

INSERM UMR 1163, Institut Imagine, 75015, Paris, France.

Publication information

Genome Biol. 2019 Feb 11;20(1):32. doi: 10.1186/s13059-019-1634-2.

Abstract

State-of-the-art methods for assessing pathogenic non-coding variants have mostly been characterized on common disease-associated polymorphisms, and they show only modest accuracy and strong positional biases. In this study, we curated 737 high-confidence pathogenic non-coding variants associated with monogenic Mendelian diseases. In addition to interspecies conservation, we explore a comprehensive set of recent and ongoing purifying selection signals in humans, accounting for lineage-specific regulatory elements. Supervised learning using gradient tree boosting on these features achieves high predictive performance and overcomes positional bias. NCBoost performs consistently across diverse training and independent test data sets and outperforms other existing reference methods.
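The abstract names gradient tree boosting as the learning method but gives no implementation details. As a rough illustration of the generic technique only, the sketch below fits an ensemble of depth-1 regression trees (stumps) to the negative gradient of the logistic loss, in pure Python. The feature columns (a conservation score and a selection signal), the toy data, and the functions `fit_gbm` and `predict_proba` are hypothetical. This is not NCBoost's code, feature set, or hyperparameters.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_stump(X, residuals):
    """Least-squares single-feature threshold split fit to the residuals."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [r for row, r in zip(X, residuals) if row[j] <= t]
            right = [r for row, r in zip(X, residuals) if row[j] > t]
            if not left or not right:
                continue  # a split must leave samples on both sides
            lv = sum(left) / len(left)
            rv = sum(right) / len(right)
            sse = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
            if best is None or sse < best[0]:
                best = (sse, j, t, lv, rv)
    if best is None:  # every feature is constant: predict the mean residual
        mean = sum(residuals) / len(residuals)
        return 0, math.inf, mean, mean
    _, j, t, lv, rv = best
    return j, t, lv, rv


def stump_predict(stump, row):
    j, t, lv, rv = stump
    return lv if row[j] <= t else rv


def fit_gbm(X, y, n_rounds=50, learning_rate=0.3):
    """Gradient boosting on log loss; labels must include both classes."""
    pos = sum(y) / len(y)
    base = math.log(pos / (1 - pos))  # initial log-odds
    F = [base] * len(X)
    stumps = []
    for _ in range(n_rounds):
        # Negative gradient of the log loss with respect to the log-odds F.
        residuals = [yi - sigmoid(fi) for yi, fi in zip(y, F)]
        stump = fit_stump(X, residuals)
        stumps.append(stump)
        F = [fi + learning_rate * stump_predict(stump, row) for fi, row in zip(F, X)]
    return base, learning_rate, stumps


def predict_proba(model, row):
    base, lr, stumps = model
    return sigmoid(base + lr * sum(stump_predict(s, row) for s in stumps))


# Hypothetical toy data: each row is [conservation, selection_signal];
# label 1 = pathogenic, 0 = benign.
X = [[0.9, 0.8], [0.8, 0.9], [0.7, 0.6], [0.95, 0.7],
     [0.1, 0.2], [0.2, 0.1], [0.3, 0.4], [0.05, 0.3]]
y = [1, 1, 1, 1, 0, 0, 0, 0]

model = fit_gbm(X, y)
print(predict_proba(model, [0.85, 0.75]))  # high-scoring variant
print(predict_proba(model, [0.15, 0.25]))  # low-scoring variant
```

Each round adds a shallow tree that corrects the current errors in log-odds space. The learning rate shrinks every tree's contribution. Production libraries add deeper trees, Newton-style leaf values, and regularization on top of this basic loop.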
