Pathogenic Gene Prediction Algorithm Based on Heterogeneous Information Fusion.

Authors

Wang Chunyu, Zhang Jie, Wang Xueping, Han Ke, Guo Maozu

Affiliations

School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China.

School of Computer and Information Engineering, Harbin University of Commerce, Harbin, China.

Publication information

Front Genet. 2020 Feb 4;11:5. doi: 10.3389/fgene.2020.00005. eCollection 2020.

DOI:10.3389/fgene.2020.00005
PMID:32117433
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7010852/
Abstract

Complex diseases seriously affect people's physical and mental health, and discovering disease-causing genes has become a major research goal. Traditional biomedical methods suffer from long experimental cycles and high costs. With the emergence of bioinformatics and the rapid development of biotechnology, researchers have therefore proposed many gene prioritization algorithms that mine large amounts of biological data for pathogenic genes. However, the known gene-disease association matrix is still very sparse and lacks negative examples, that is, evidence that a gene and a disease are unrelated, which limits the predictive performance of these algorithms. Based on the hypothesis that mutations in functionally related genes may lead to similar disease phenotypes, this paper proposes a positive-unlabeled (PU) inductive matrix completion algorithm based on heterogeneous information fusion (PUIMCHIF) to predict candidate genes involved in the pathogenicity of human diseases. First, PUIMCHIF uses different compact feature learning methods to extract gene and disease features from multiple data sources, compensating for the sparsity of the association data. Second, based on the prior knowledge that most unknown gene-disease pairs are unrelated, it applies a PU-learning strategy that treats the unlabeled pairs as negative examples for biased learning. In the experiments, PUIMCHIF performed significantly better than other algorithms on three metrics: precision, recall, and mean percentile ranking (MPR). In the top-100 global prediction analysis across multiple genes and multiple diseases, the probability of recovering true gene associations with PUIMCHIF reached 50%, and the MPR was 10.94%. PUIMCHIF ranks true disease genes higher than other methods such as IMC and CATAPULT.
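The biased PU idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the squared loss, the weight `alpha`, the plain gradient-descent optimizer, and the one-hot toy features are all assumptions chosen for illustration. Scores are modeled as `X @ W @ H.T @ Y.T`, where `X` and `Y` hold gene and disease features, and unlabeled pairs are treated as negatives but down-weighted by `alpha < 1`.

```python
import numpy as np

def pu_imc(P, X, Y, rank=2, alpha=0.1, lam=0.01, lr=0.05, steps=3000, seed=0):
    """Biased PU inductive matrix completion (illustrative sketch).

    P: genes x diseases 0/1 matrix; 1 = known association, 0 = unlabeled.
    X: gene features (genes x f_g); Y: disease features (diseases x f_d).
    alpha: weight (< 1) of unlabeled entries that are treated as negatives.
    Returns the score matrix S = X W H^T Y^T.
    """
    rng = np.random.default_rng(seed)
    W = 0.1 * rng.standard_normal((X.shape[1], rank))
    H = 0.1 * rng.standard_normal((Y.shape[1], rank))
    weights = np.where(P == 1, 1.0, alpha)
    for _ in range(steps):
        S = X @ W @ H.T @ Y.T
        R = weights * (S - P)                    # weighted residual
        grad_W = X.T @ R @ Y @ H + lam * W       # gradient of the loss w.r.t. W
        grad_H = Y.T @ R.T @ X @ W + lam * H     # gradient of the loss w.r.t. H
        W -= lr * grad_W
        H -= lr * grad_H
    return X @ W @ H.T @ Y.T

# Toy data (hypothetical): two gene groups and two disease groups, encoded
# as one-hot features; genes are associated with diseases of the same group.
X = np.array([[1, 0], [1, 0], [1, 0], [0, 1], [0, 1], [0, 1]], dtype=float)
Y = np.array([[1, 0], [1, 0], [0, 1], [0, 1]], dtype=float)
P = X @ Y.T          # full set of true associations
P[0, 0] = 0.0        # hide one true association, so it becomes "unlabeled"

S = pu_imc(P, X, Y)
print(round(S[0, 0], 2), round(S[0, 2], 2))  # hidden true pair vs. unrelated pair
```

In this toy setting the hidden pair (gene 0, disease 0) should receive a much higher score than an unrelated pair such as (gene 0, disease 2), because the shared features let the model generalize from the other known associations in the same group. Raising `alpha` toward 1 trusts the unlabeled zeros more and pulls the hidden pair's score down.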

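The ranking metrics named in the abstract can be sketched as follows. The exact definitions used in the paper are not given in this record, so this assumes a common formulation: for each held-out true association, the gene is ranked among all candidate genes for that disease by descending score, the percentile is rank divided by the number of candidates, and MPR is the mean percentile (lower is better). The function names and the toy score matrix are hypothetical.

```python
import numpy as np

def gene_rank(S, gene, disease):
    """1-based rank of `gene` among all genes for `disease`, by descending score."""
    column = S[:, disease]
    return 1 + int(np.sum(column > column[gene]))

def mean_percentile_ranking(S, held_out):
    """Mean of rank / number_of_genes over held-out (gene, disease) pairs."""
    n_genes = S.shape[0]
    return float(np.mean([gene_rank(S, g, d) / n_genes for g, d in held_out]))

def recall_at_k(S, held_out, k):
    """Fraction of held-out pairs whose gene is ranked in the top k."""
    hits = sum(int(gene_rank(S, g, d) <= k) for g, d in held_out)
    return hits / len(held_out)

# Toy scores (hypothetical): 4 genes x 2 diseases.
S = np.array([[0.9, 0.1],
              [0.2, 0.8],
              [0.5, 0.3],
              [0.1, 0.6]])
held_out = [(0, 0), (3, 1)]   # (gene, disease) pairs hidden during training

mpr = mean_percentile_ranking(S, held_out)
print(mpr)                            # 0.375: ranks 1 and 2 out of 4 genes
print(recall_at_k(S, held_out, 1))    # 0.5: only the first pair is ranked top-1
```

A real evaluation would typically exclude genes already known to be associated with the disease from the candidate list before ranking; that step is omitted here for brevity.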

