Prioritizing genomic variants through neuro-symbolic, knowledge-enhanced learning.

Affiliations

Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), 4700 KAUST, Thuwal 23955, Saudi Arabia.

Computer Science Program, Computer, Electrical and Mathematical Sciences & Engineering Division (CEMSE), King Abdullah University of Science and Technology (KAUST), 4700 KAUST, Thuwal 23955, Saudi Arabia.

Publication information

Bioinformatics. 2024 May 2;40(5). doi: 10.1093/bioinformatics/btae301.

Abstract

MOTIVATION

Whole-exome and genome sequencing have become common tools in diagnosing patients with rare diseases. Despite their success, these approaches leave many patients undiagnosed. A common argument is that more disease variants still await discovery, or that the novelty of disease phenotypes results from a combination of variants in multiple disease-related genes. Interpreting the phenotypic consequences of genomic variants relies on information about gene functions, gene expression, physiology, and other genomic features. Phenotype-based methods to identify variants involved in genetic diseases combine molecular features with prior knowledge about the phenotypic consequences of altering gene functions. While phenotype-based methods have been successfully applied to prioritizing variants, such methods rely on known gene-disease or gene-phenotype associations as training data and are applicable only to genes that have associated phenotypes, which limits their scope. In addition, phenotypes are not assigned uniformly by different clinicians, and phenotype-based methods need to account for this variability.

RESULTS

We developed an Embedding-based Phenotype Variant Predictor (EmbedPVP), a computational method to prioritize variants involved in genetic diseases by combining genomic information and clinical phenotypes. EmbedPVP leverages a large amount of background knowledge from human and model organisms about molecular mechanisms through which abnormal phenotypes may arise. Specifically, EmbedPVP incorporates phenotypes linked to genes, functions of gene products, and the anatomical site of gene expression, and systematically relates them to their phenotypic effects through neuro-symbolic, knowledge-enhanced machine learning. We demonstrate EmbedPVP's efficacy on a large set of synthetic genomes and genomes matched with clinical information.
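The general idea of combining a variant-level score with an embedding-based phenotype match can be illustrated with a minimal sketch. This is not EmbedPVP's implementation: the function names, the toy two-dimensional embeddings, the identifiers (`GENE_A`, `PHENO_1`, `var1`), the `pathogenicity` field, and the linear weighting are all assumptions made for illustration only.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def mean_vector(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]

def rank_variants(variants, gene_embeddings, phenotype_embeddings,
                  patient_phenotypes, weight=0.5):
    """Rank variant IDs by a weighted mix of two hypothetical scores.

    `weight` scales the variant-level pathogenicity score; the remainder
    scales the phenotype match between the variant's gene and the patient.
    """
    patient_vec = mean_vector([phenotype_embeddings[p] for p in patient_phenotypes])
    scored = []
    for variant in variants:
        gene_vec = gene_embeddings.get(variant["gene"])
        if gene_vec is None:
            phenotype_score = 0.0  # assumption: no embedding means no phenotype evidence
        else:
            # Rescale cosine similarity from [-1, 1] to [0, 1].
            phenotype_score = (cosine(gene_vec, patient_vec) + 1) / 2
        combined = weight * variant["pathogenicity"] + (1 - weight) * phenotype_score
        scored.append((combined, variant["id"]))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [variant_id for _, variant_id in scored]

# Toy data (entirely hypothetical).
gene_embeddings = {"GENE_A": [1.0, 0.0], "GENE_B": [0.0, 1.0]}
phenotype_embeddings = {"PHENO_1": [1.0, 0.0], "PHENO_2": [0.9, 0.1]}
variants = [
    {"id": "var1", "gene": "GENE_B", "pathogenicity": 0.9},
    {"id": "var2", "gene": "GENE_A", "pathogenicity": 0.8},
]
ranking = rank_variants(variants, gene_embeddings, phenotype_embeddings,
                        ["PHENO_1", "PHENO_2"])
print(ranking)
```

In this toy setup the gene whose embedding lies close to the patient's phenotypes can outrank a variant with a slightly higher pathogenicity score, which is the intuition behind phenotype-driven prioritization. How the real embeddings are learned from ontologies and background knowledge is outside the scope of this sketch.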

AVAILABILITY AND IMPLEMENTATION

EmbedPVP and all evaluation experiments are freely available at https://github.com/bio-ontology-research-group/EmbedPVP.
