Using PPI network autocorrelation in hierarchical multi-label classification trees for gene function prediction.

Author information

Department of Knowledge Technologies, Jožef Stefan Institute, Jamova cesta 39, Ljubljana, Slovenia.

Publication information

BMC Bioinformatics. 2013 Sep 26;14:285. doi: 10.1186/1471-2105-14-285.

DOI:10.1186/1471-2105-14-285

PMID:24070402

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3850549/

Abstract

BACKGROUND

Ontologies and catalogs of gene functions, such as the Gene Ontology (GO) and MIPS-FUN, assume that functional classes are organized hierarchically, that is, general functions include more specific ones. This has recently motivated the development of several machine learning algorithms for gene function prediction that leverage this hierarchical organization, in which instances may belong to multiple classes. In addition, it is possible to exploit relationships among examples, since it is plausible that related genes tend to share functional annotations. Although these relationships have been identified and extensively studied in the area of protein-protein interaction (PPI) networks, they have not received much attention in hierarchical and multi-class gene function prediction. Relations between genes introduce autocorrelation in functional annotations and violate the assumption that instances are independently and identically distributed (i.i.d.), which underlies most machine learning algorithms. Although the explicit consideration of these relations brings additional complexity to the learning process, we expect substantial benefits in the predictive accuracy of the learned classifiers.
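To make the notion of network autocorrelation concrete, the following is a minimal sketch that computes global Moran's I for a single functional annotation over an undirected interaction graph. Moran's I is a standard autocorrelation statistic chosen here for illustration; the abstract does not state which measure the authors use. The gene identifiers (`g1` to `g6`), the edge list, and the function name `morans_i` are all hypothetical.

```python
def morans_i(values, edges):
    """Global Moran's I of a per-gene value over an undirected graph.

    values: dict mapping gene id -> numeric annotation (e.g. 1 = has function)
    edges:  list of (gene_a, gene_b) pairs, each undirected edge listed once
    """
    n = len(values)
    mean = sum(values.values()) / n
    dev = {gene: v - mean for gene, v in values.items()}
    denom = sum(d * d for d in dev.values())
    if denom == 0 or not edges:
        return 0.0  # statistic is undefined; 0.0 is a convention of this sketch
    # Every undirected edge appears twice in the symmetric 0/1 weight matrix.
    num = 2 * sum(dev[a] * dev[b] for a, b in edges)
    weight_sum = 2 * len(edges)
    return (n / weight_sum) * (num / denom)


# Hypothetical toy network: two tightly connected triangles joined by one edge.
toy_edges = [("g1", "g2"), ("g2", "g3"), ("g1", "g3"),
             ("g4", "g5"), ("g5", "g6"), ("g4", "g6"),
             ("g3", "g4")]
# Interacting genes mostly share the annotation: positive autocorrelation.
clustered = {"g1": 1, "g2": 1, "g3": 1, "g4": 0, "g5": 0, "g6": 0}

# Hypothetical chain where every neighbour disagrees: negative autocorrelation.
chain_edges = [("g1", "g2"), ("g2", "g3"), ("g3", "g4")]
alternating = {"g1": 1, "g2": 0, "g3": 1, "g4": 0}

print(morans_i(clustered, toy_edges))
print(morans_i(alternating, chain_edges))
```

A value clearly above zero indicates that interacting genes tend to share the annotation, which is the situation in which the i.i.d. assumption is violated and in which network information could plausibly help a learner.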

RESULTS

This article demonstrates the benefits (in terms of predictive accuracy) of considering autocorrelation in multi-class gene function prediction. We develop a tree-based algorithm for considering network autocorrelation in the setting of Hierarchical Multi-label Classification (HMC). We empirically evaluate the proposed algorithm, called NHMC (Network Hierarchical Multi-label Classification), on 12 yeast datasets using each of the MIPS-FUN and GO annotation schemes and exploiting 2 different PPI networks. The results clearly show that taking autocorrelation into account improves the predictive performance of the learned models for predicting gene function.
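The HMC setting mentioned above can be illustrated with a small sketch of the hierarchy constraint: a gene annotated with a specific class is implicitly annotated with every more general class above it. This is not the NHMC algorithm itself, only an illustration of how label sets are typically encoded under a hierarchy. The class codes, the `parent` map, and the function names are hypothetical, and the sketch assumes a tree (each class has one parent), whereas GO is a directed acyclic graph in which a class may have several parents.

```python
def with_ancestors(labels, parent):
    """Close a label set under the hierarchy by adding all ancestors."""
    closed = set()
    for label in labels:
        # Walk up towards the root, stopping early at classes already seen.
        while label is not None and label not in closed:
            closed.add(label)
            label = parent.get(label)
    return closed


def to_vector(labels, classes):
    """Encode a label set as a 0/1 vector over a fixed class ordering."""
    return [1 if c in labels else 0 for c in classes]


# Hypothetical tree-shaped hierarchy: child class -> parent class (None = top level).
parent = {"1": None, "1/1": "1", "1/2": "1", "2": None, "2/1": "2"}
classes = sorted(parent)

# A gene annotated with one specific class and one general class.
gene_labels = with_ancestors({"1/1", "2"}, parent)
gene_vector = to_vector(gene_labels, classes)

print(sorted(gene_labels))
print(gene_vector)
```

Encoding annotations this way gives each gene a single vector covering all classes, which is what allows one model to predict the whole set of functions at once rather than training a separate classifier per class.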

CONCLUSIONS

Our newly developed method for HMC takes network information into account in the learning phase. When used for gene function prediction in the context of PPI networks, the explicit consideration of network autocorrelation increases the predictive performance of the learned models. Overall, we found that this holds for different gene features/descriptions, functional annotation schemes, and PPI networks. The best results are achieved when the PPI network is dense and contains a large proportion of function-relevant interactions.
