Potential Schizophrenia Disease-Related Genes Prediction Using Metagraph Representations Based on a Protein-Protein Interaction Keyword Network: Framework Development and Validation.

Authors

Yu Shirui, Wang Ziyang, Nan Jiale, Li Aihua, Yang Xuemei, Tang Xiaoli

Affiliation

Institute of Medical Information, Chinese Academy of Medical Sciences, Beijing, China.

Publication

JMIR Form Res. 2023 Nov 15;7:e50998. doi: 10.2196/50998.

DOI:10.2196/50998
PMID:37966892
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10687686/
Abstract

BACKGROUND

Schizophrenia is a serious mental disease. With increased research funding for this disease, schizophrenia has become one of the key areas of focus in the medical field. Searching for associations between diseases and genes is an effective approach to study complex diseases, which may enhance research on schizophrenia pathology and lead to the identification of new treatment targets.

OBJECTIVE

The aim of this study was to identify potential schizophrenia risk genes by using machine learning methods to extract the topological characteristics of proteins and their functional roles in a protein-protein interaction (PPI)-keywords (PPIK) network, and thereby to better understand the complex disease-causing properties. To this end, a PPIK-based metagraph representation approach is proposed.

METHODS

To enrich the PPI network, we integrated keywords describing protein properties and constructed a PPIK network. We extracted features that describe the topology of this network through metagraphs. We further transformed these metagraphs into vectors and represented proteins with a series of vectors. We then trained and optimized our model using random forest (RF), extreme gradient boosting, light gradient boosting machine, and logistic regression models.
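The feature-extraction step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the toy proteins, keywords, and the four patterns counted here are assumptions chosen for illustration, and the real framework uses a larger set of metagraphs. The idea shown is only that each protein is represented by a vector of counts of small typed subgraphs (protein and keyword nodes) that it takes part in.

```python
from collections import defaultdict

# Toy PPIK network. All protein and keyword names are hypothetical.
ppi_edges = [("P1", "P2"), ("P2", "P3"), ("P1", "P3"), ("P3", "P4")]
keyword_edges = [
    ("P1", "kinase"), ("P2", "kinase"),
    ("P3", "membrane"), ("P4", "membrane"), ("P1", "membrane"),
]

def build_ppik(ppi_edges, keyword_edges):
    """Return (protein -> interacting proteins, protein -> keywords)."""
    neighbors = defaultdict(set)
    keywords = defaultdict(set)
    for a, b in ppi_edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    for protein, keyword in keyword_edges:
        keywords[protein].add(keyword)
    return neighbors, keywords

def metagraph_vector(protein, neighbors, keywords):
    """Count instances of four simple typed patterns around one protein.

    [0] protein-protein edges
    [1] protein-keyword edges
    [2] triangles: an interacting partner that shares a keyword
    [3] open paths: a non-interacting protein that shares a keyword
    """
    partners = neighbors.get(protein, set())
    own = keywords.get(protein, set())
    triangles = sum(len(own & keywords.get(q, set())) for q in partners)
    open_paths = sum(
        len(own & kws)
        for q, kws in keywords.items()
        if q != protein and q not in partners
    )
    return [len(partners), len(own), triangles, open_paths]

neighbors, keywords = build_ppik(ppi_edges, keyword_edges)
vectors = {p: metagraph_vector(p, neighbors, keywords) for p in sorted(neighbors)}
for protein, vec in vectors.items():
    print(protein, vec)
```

Vectors of this kind, one per protein, would then serve as the input features for a classifier such as a random forest, with known disease proteins as positive labels.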

RESULTS

Comprehensive experiments demonstrated the good performance of our proposed method, with an area under the receiver operating characteristic curve (AUC) value between 0.72 and 0.76. Our model also outperformed baseline methods for overall disease protein prediction, including the random walk with restart, average commute time, and Katz models. Compared with the PPI network constructed from the baseline models, complementation of keywords in the PPIK network improved the performance (AUC) by 0.08 on average, and the metagraph-based method improved the AUC by 0.30 on average compared with that of the baseline methods. According to the comprehensive performance of the four models, RF was selected as the best model for disease protein prediction, with precision, recall, F1-score, and AUC values of 0.76, 0.73, 0.72, and 0.76, respectively. We transformed these proteins to their encoding gene IDs and identified the top 20 genes as the most probable schizophrenia risk genes, including the EYA3, CNTN4, HSPA8, LRRK2, and AFP genes. We further validated these outcomes against metagraph features, performed a feature analysis, and used evidence from the literature to interpret the correlation between the predicted genes and the disease.
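For readers unfamiliar with the baselines named above, the following is a minimal sketch of random walk with restart on a toy graph. The graph, the seed set, and the restart probability of 0.3 are assumptions for illustration only; the abstract does not state the settings used in the study.

```python
def random_walk_with_restart(adjacency, seeds, restart=0.3, tol=1e-10, max_iter=1000):
    """Iterate p <- restart * p0 + (1 - restart) * W p until convergence.

    adjacency maps each node to a list of its neighbors (undirected graph).
    seeds are known disease proteins; the walker restarts uniformly on them.
    Assumes every node has at least one neighbor, so probability mass is kept.
    """
    nodes = sorted(adjacency)
    p0 = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        nxt = {n: restart * p0[n] for n in nodes}
        for n in nodes:
            nbrs = adjacency[n]
            if not nbrs:
                continue
            # Each node spreads its current score evenly over its neighbors.
            share = (1.0 - restart) * p[n] / len(nbrs)
            for m in nbrs:
                nxt[m] += share
        delta = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if delta < tol:
            break
    return p

# Hypothetical path graph A - B - C - D with A as the only known disease protein.
adjacency = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
seeds = {"A"}
scores = random_walk_with_restart(adjacency, seeds)

# Candidates are ranked by their steady-state score, excluding the seeds.
ranking = sorted((n for n in scores if n not in seeds), key=scores.get, reverse=True)
print(ranking)
```

In this baseline a candidate's score depends only on its proximity to known disease proteins in the PPI network, whereas the metagraph approach also uses keyword nodes and local typed structure.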

CONCLUSIONS

The metagraph representation based on the PPIK network framework was found to be effective for identifying potential schizophrenia risk genes. The results are quite reliable, as evidence supporting our predictions can be found in the literature. Our approach can provide more biological insights into the pathogenesis of schizophrenia.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7cbb/10687686/e0845eeb3406/formative_v7i1e50998_fig1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7cbb/10687686/bdd26977f262/formative_v7i1e50998_fig2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7cbb/10687686/695db48bebbf/formative_v7i1e50998_fig3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7cbb/10687686/2bc32d87970c/formative_v7i1e50998_fig4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7cbb/10687686/95056a9a981a/formative_v7i1e50998_fig5.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7cbb/10687686/5348343e53b1/formative_v7i1e50998_fig6.jpg