

Prioritizing genomic variants through neuro-symbolic, knowledge-enhanced learning.

Affiliations

Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), 4700 KAUST, Thuwal 23955, Saudi Arabia.

Computer Science Program, Computer, Electrical and Mathematical Sciences & Engineering Division (CEMSE), King Abdullah University of Science and Technology (KAUST), 4700 KAUST, Thuwal 23955, Saudi Arabia.

Publication information

Bioinformatics. 2024 May 2;40(5). doi: 10.1093/bioinformatics/btae301.

DOI:10.1093/bioinformatics/btae301
PMID:38696757
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11132820/
Abstract

MOTIVATION

Whole-exome and genome sequencing have become common tools for diagnosing patients with rare diseases. Despite their success, these approaches leave many patients undiagnosed. A common explanation is that more disease variants still await discovery, or that novel disease phenotypes result from a combination of variants in multiple disease-related genes. Interpreting the phenotypic consequences of genomic variants relies on information about gene functions, gene expression, physiology, and other genomic features. Phenotype-based methods for identifying variants involved in genetic diseases combine molecular features with prior knowledge about the phenotypic consequences of altering gene functions. Although phenotype-based methods have been applied successfully to variant prioritization, they use known gene-disease or gene-phenotype associations as training data and are applicable only to genes that have associated phenotypes, which limits their scope. In addition, different clinicians do not assign phenotypes uniformly, and phenotype-based methods need to account for this variability.

RESULTS

We developed an Embedding-based Phenotype Variant Predictor (EmbedPVP), a computational method to prioritize variants involved in genetic diseases by combining genomic information and clinical phenotypes. EmbedPVP leverages a large amount of background knowledge from humans and model organisms about the molecular mechanisms through which abnormal phenotypes may arise. Specifically, EmbedPVP incorporates phenotypes linked to genes, functions of gene products, and the anatomical sites of gene expression, and systematically relates them to their phenotypic effects through neuro-symbolic, knowledge-enhanced machine learning. We demonstrate EmbedPVP's efficacy on a large set of synthetic genomes and on genomes matched with clinical information.
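
The general idea of combining a genomic score with a phenotype-based score can be illustrated with a toy Python sketch. This is not EmbedPVP's actual model or API: the function names, the two-dimensional embeddings, the gene and variant identifiers, the cosine-similarity measure, and the equal weighting are all assumptions made for illustration. It supposes that each gene and each patient phenotype already has an embedding vector, and that each variant carries a pathogenicity score in [0, 1].

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def phenotype_score(gene_vec: Vector, patient_vecs: List[Vector]) -> float:
    """Mean cosine similarity to the patient's phenotypes, rescaled to [0, 1]."""
    if not patient_vecs:
        return 0.0
    sims = [cosine(gene_vec, p) for p in patient_vecs]
    return (sum(sims) / len(sims) + 1.0) / 2.0


def rank_variants(
    variants: List[Tuple[str, str, float]],
    gene_embeddings: Dict[str, Vector],
    patient_vecs: List[Vector],
    weight: float = 0.5,
) -> List[Tuple[str, float]]:
    """Rank (variant_id, gene, pathogenicity) tuples by a weighted combination
    of the phenotype score and the pathogenicity score (hypothetical scheme)."""
    scored = []
    for variant_id, gene, pathogenicity in variants:
        gene_vec = gene_embeddings.get(gene)
        # Genes without an embedding get no phenotype evidence in this toy setup.
        pheno = phenotype_score(gene_vec, patient_vecs) if gene_vec is not None else 0.0
        scored.append((variant_id, weight * pheno + (1.0 - weight) * pathogenicity))
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Toy inputs (invented for illustration only).
gene_embeddings = {"GENE_A": [1.0, 0.0], "GENE_B": [0.0, 1.0]}
patient_vecs = [[1.0, 0.0], [1.0, 0.0]]
variants = [
    ("var1", "GENE_A", 0.6),
    ("var2", "GENE_B", 0.7),
    ("var3", "GENE_C", 0.9),  # no embedding available for this gene
]

ranking = rank_variants(variants, gene_embeddings, patient_vecs)
for variant_id, score in ranking:
    print(f"{variant_id}\t{score:.2f}")
```

In this toy setup, the variant in the gene whose embedding matches the patient's phenotypes ranks first even though its pathogenicity score is the lowest of the three, which is the kind of reordering that phenotype evidence is meant to provide.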

AVAILABILITY AND IMPLEMENTATION

EmbedPVP and all evaluation experiments are freely available at https://github.com/bio-ontology-research-group/EmbedPVP.


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/36ba/11132820/277ca8019049/btae301f1.jpg
