Mantis-ml: Disease-Agnostic Gene Prioritization from High-Throughput Genomic Screens by Stochastic Semi-supervised Learning.

Affiliation

Centre for Genomics Research, Discovery Sciences, BioPharmaceuticals R&D, AstraZeneca, 1 Francis Crick Avenue, CB2 0RE Cambridge, UK.

Publication information

Am J Hum Genet. 2020 May 7;106(5):659-678. doi: 10.1016/j.ajhg.2020.03.012.

DOI:10.1016/j.ajhg.2020.03.012

PMID:32386536

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7212270/

Abstract

Access to large-scale genomics datasets has increased the utility of hypothesis-free genome-wide analyses. However, gene signals are often insufficiently powered to reach experiment-wide significance, triggering a process of laborious triaging of genomic-association-study results. We introduce mantis-ml, a multi-dimensional, multi-step machine-learning framework that allows objective assessment of the biological relevance of genes to disease studies. Mantis-ml is an automated machine-learning framework that follows a multi-model approach of stochastic semi-supervised learning to rank disease-associated genes through iterative learning sessions on random balanced datasets across the protein-coding exome. When applied to a range of human diseases, including chronic kidney disease (CKD), epilepsy, and amyotrophic lateral sclerosis (ALS), mantis-ml achieved an average area under curve (AUC) prediction performance of 0.81-0.89. Critically, to prove its value as a tool that can be used to interpret exome-wide association studies, we overlapped mantis-ml predictions with data from published cohort-level association studies. We found a statistically significant enrichment of high mantis-ml predictions among the highest-ranked genes from hypothesis-free cohort-level statistics, indicating a substantial improvement over the performance of current state-of-the-art methods and pointing to the capture of true prioritization signals for disease-associated genes. Finally, we introduce a generic mantis-ml score (GMS) trained with over 1,200 features as a generic-disease-likelihood estimator, outperforming published gene-level scores. In addition to our tool, we provide a gene prioritization atlas that includes mantis-ml's predictions across ten disease areas and empowers researchers to interactively navigate through the gene-triaging framework. Mantis-ml is an intuitive tool that supports the objective triaging of large-scale genomic discovery studies and enhances our understanding of complex genotype-phenotype associations.
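
The phrase "iterative learning sessions on random balanced datasets" can be illustrated with a small sketch. The code below is not mantis-ml's implementation: the function name `stochastic_pu_rank`, the nearest-centroid scorer, the synthetic three-feature data, and all parameter values are assumptions chosen only to keep the example short and dependency-free. It shows the general idea of treating known disease genes as positives, repeatedly pairing them with an equal-sized random sample of unlabeled genes, and averaging each gene's out-of-fold scores across iterations.

```python
import math
import random
from collections import defaultdict

def centroid(rows):
    """Column-wise mean of a list of equal-length feature vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def score(x, pos_c, neg_c):
    """Toy scorer: above 0.5 when x is closer to the positive centroid."""
    return 1.0 / (1.0 + math.exp(math.dist(x, pos_c) - math.dist(x, neg_c)))

def stochastic_pu_rank(features, seed_genes, n_iter=50, k=5, rng_seed=0):
    """Rank genes by mean out-of-fold score over random balanced datasets.

    Hypothetical helper for illustration; not part of any published API.
    """
    rng = random.Random(rng_seed)
    positives = sorted(g for g in features if g in seed_genes)
    unlabeled = sorted(g for g in features if g not in seed_genes)
    preds = defaultdict(list)
    for _ in range(n_iter):
        # Balanced dataset: all positives plus an equal-sized random
        # sample of unlabeled genes, treated as negatives this round.
        sampled = rng.sample(unlabeled, min(len(positives), len(unlabeled)))
        balanced = [(g, 1) for g in positives] + [(g, 0) for g in sampled]
        rng.shuffle(balanced)
        folds = [balanced[i::k] for i in range(k)]
        for i, test in enumerate(folds):
            train = [item for j, f in enumerate(folds) if j != i for item in f]
            pos_rows = [features[g] for g, y in train if y == 1]
            neg_rows = [features[g] for g, y in train if y == 0]
            if not pos_rows or not neg_rows:
                continue
            pos_c, neg_c = centroid(pos_rows), centroid(neg_rows)
            # Only held-out genes are scored, so no gene is scored by a
            # model that saw it during training.
            for g, _ in test:
                preds[g].append(score(features[g], pos_c, neg_c))
    mean = {g: sum(v) / len(v) for g, v in preds.items()}
    return sorted(mean.items(), key=lambda kv: kv[1], reverse=True)

# Synthetic data (made up): genes 0-39 are "disease-like", but only
# genes 0-19 are labeled as known seed genes.
data_rng = random.Random(1)
features, seed_genes = {}, set()
for i in range(200):
    gene = f"GENE{i}"
    mu = 1.0 if i < 40 else 0.0
    features[gene] = [data_rng.gauss(mu, 0.5) for _ in range(3)]
    if i < 20:
        seed_genes.add(gene)

ranking = stochastic_pu_rank(features, seed_genes)
for gene, s in ranking[:5]:
    print(gene, round(s, 3))
```

Because unlabeled genes are sampled at random, a gene that is never drawn in any iteration receives no score in this sketch; running more iterations makes that increasingly unlikely.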

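The enrichment of high predictions among top-ranked association-study genes, as described in the abstract, is the kind of overlap that a hypergeometric test can quantify. The sketch below is one common way to compute such a p-value and is offered as an assumption about the general approach, not as the paper's exact procedure. The function name `hypergeom_sf` and every number in the example (18,000 genes, 900 top predictions, 50 top association hits, 10 overlapping genes) are hypothetical.

```python
from math import comb

def hypergeom_sf(k, M, n, N):
    """P(X >= k) for X ~ Hypergeometric(M, n, N).

    M: total genes considered
    n: genes with a high prediction score
    N: top-ranked genes from the association study
    k: observed overlap between the two sets
    """
    total = comb(M, N)
    upper = min(n, N)
    # math.comb returns 0 when the lower index exceeds the upper one,
    # so impossible overlaps contribute nothing to the sum.
    tail = sum(comb(n, i) * comb(M - n, N - i) for i in range(k, upper + 1))
    return tail / total

# Made-up example: 18,000 genes, the top 5% (900) by prediction score,
# the top 50 association hits, and 10 genes in both sets.
p_value = hypergeom_sf(10, 18000, 900, 50)
print(f"enrichment p-value: {p_value:.3g}")
```

Under these made-up numbers the expected overlap by chance is 50 × 900 / 18,000 = 2.5 genes, so an overlap of 10 yields a small p-value.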