Integrative approaches to the prediction of protein functions based on the feature selection.

Affiliation

Department of Information and Communications, Gwangju Institute of Science and Technology, Gwangju, Republic of Korea.

Publication information

BMC Bioinformatics. 2009 Dec 31;10:455. doi: 10.1186/1471-2105-10-455.

DOI: 10.1186/1471-2105-10-455
PMID: 20043848
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2813249/
Abstract

BACKGROUND

Protein function prediction has been one of the most important issues in functional genomics. With the current availability of various genomic data sets, many researchers have attempted to develop integration models that combine all available genomic data for protein function prediction. These efforts have resulted in the improvement of prediction quality and the extension of prediction coverage. However, it has also been observed that integrating more data sources does not always increase the prediction quality. Therefore, selecting data sources that highly contribute to the protein function prediction has become an important issue.
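Integration models of the kind described above often pool kernel-based data by summing one kernel matrix per source. The sketch below shows that step in plain Python, assuming each source has already been turned into an n x n kernel over the same proteins. Treating an unweighted sum as "full integration" is an assumption for illustration, not necessarily the scheme the authors compared against; the function `integrate_kernels` and the toy matrices are hypothetical.

```python
from typing import Dict, List, Optional

Matrix = List[List[float]]


def integrate_kernels(kernels: Dict[str, Matrix],
                      weights: Optional[Dict[str, float]] = None) -> Matrix:
    """Combine per-source kernel matrices into one matrix by a weighted sum.

    With weights=None every source gets weight 1.0 (hypothetical "full integration").
    Sources missing from `weights` get weight 0.0, i.e. they are left out.
    A non-negative weighted sum of positive semi-definite kernels is itself
    positive semi-definite, so the result is still a valid kernel.
    """
    if not kernels:
        raise ValueError("need at least one kernel")
    names = list(kernels)
    n = len(kernels[names[0]])
    for name in names:
        k = kernels[name]
        if len(k) != n or any(len(row) != n for row in k):
            raise ValueError(f"kernel {name!r} is not {n}x{n}")
    if weights is None:
        weights = {name: 1.0 for name in names}
    combined = [[0.0] * n for _ in range(n)]
    for name in names:
        w = weights.get(name, 0.0)
        if w < 0:
            raise ValueError("weights must be non-negative")
        k = kernels[name]
        for i in range(n):
            for j in range(n):
                combined[i][j] += w * k[i][j]
    return combined


# Hypothetical 2-protein kernels for two sources.
domains = [[1.0, 0.5], [0.5, 1.0]]
ppi = [[1.0, 0.0], [0.0, 1.0]]
print(integrate_kernels({"domains": domains, "ppi": ppi}))  # [[2.0, 0.5], [0.5, 2.0]]
print(integrate_kernels({"domains": domains, "ppi": ppi}, {"domains": 1.0}))  # [[1.0, 0.5], [0.5, 1.0]]
```

Passing a `weights` dict that omits a source drops it from the sum, which is the hook a feature-selection step can use to integrate only a chosen subset of sources.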

RESULTS

We present systematic feature selection methods that assess the contribution of genome-wide data sets to predicting protein functions and then investigate the relationship between genomic data sources and protein functions. In this study, we use ten different genomic data sources in Mus musculus, including protein domains, protein-protein interactions, gene expression, phenotype ontology, phylogenetic profiles and disease data, to predict protein functions labelled with Gene Ontology (GO) terms. We then apply two approaches to feature selection: exhaustive-search feature selection using kernel-based logistic regression (KLR), and kernel-based L1-norm regularized logistic regression (KL1LR). In the first approach, we exhaustively measure the contribution of each data set to each function based on its prediction quality. In the second approach, we use the estimated coefficients of features as measures of the contribution of data sources. Our results show that the proposed methods improve prediction quality compared with the full integration of all data sources and with other filter-based feature selection methods. We also show that contributing data sources can differ depending on the protein function. Furthermore, we observe that highly contributing data sets can be similar among a group of protein functions that share the same parent in the GO hierarchy.
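The exhaustive-search approach can be sketched as follows. With ten sources there are 2^10 - 1 = 1023 non-empty subsets, few enough to score every one. This is a minimal sketch under stated assumptions: each source is a precomputed kernel matrix, a subset is combined by summing its kernels, and a leave-one-out kernel-neighbour score stands in for a fitted KLR model so the code has no dependencies; prediction quality is measured by AUC. The per-source "contribution" (mean AUC of subsets that include a source minus mean AUC of subsets that exclude it) is a hypothetical measure chosen for illustration, and all function names and toy data are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Sequence, Tuple

Matrix = List[List[float]]


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Mann-Whitney AUC: share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def loo_kernel_scores(kernel: Matrix, labels: Sequence[int]) -> List[float]:
    """Leave-one-out score per protein: similarity to the other positives minus
    similarity to the other negatives (a dependency-free stand-in for a fitted KLR model)."""
    n = len(labels)
    return [sum(kernel[i][j] * (1.0 if labels[j] == 1 else -1.0) for j in range(n) if j != i)
            for i in range(n)]


def exhaustive_selection(
    kernels: Dict[str, Matrix], labels: Sequence[int]
) -> Tuple[FrozenSet[str], Dict[FrozenSet[str], float], Dict[str, float]]:
    """Score every non-empty subset of sources; return best subset, all scores, contributions."""
    names = sorted(kernels)
    n = len(labels)
    scores: Dict[FrozenSet[str], float] = {}
    for r in range(1, len(names) + 1):  # smallest subsets first
        for subset in combinations(names, r):
            summed = [[sum(kernels[s][i][j] for s in subset) for j in range(n)] for i in range(n)]
            scores[frozenset(subset)] = auc(loo_kernel_scores(summed, labels), labels)
    # max() keeps the first maximum it meets, so ties go to the smaller subset.
    best = max(scores, key=lambda subset: scores[subset])
    contribution: Dict[str, float] = {}
    for name in names:
        with_it = [v for k, v in scores.items() if name in k]
        without = [v for k, v in scores.items() if name not in k]
        baseline = sum(without) / len(without) if without else 0.5  # 0.5 = chance-level AUC
        contribution[name] = sum(with_it) / len(with_it) - baseline
    return best, scores, contribution


# Hypothetical toy data: 4 proteins, the first two annotated with the GO term.
labels = [1, 1, 0, 0]
kernels = {
    "good": [[1.0, 1.0, 0.0, 0.0], [1.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 1.0], [0.0, 0.0, 1.0, 1.0]],
    "noise": [[1.0, 0.0, 1.0, 0.0], [0.0, 1.0, 0.0, 1.0], [1.0, 0.0, 1.0, 0.0], [0.0, 1.0, 0.0, 1.0]],
}
best, scores, contribution = exhaustive_selection(kernels, labels)
print(sorted(best), scores[frozenset(kernels)], contribution)
# ['good'] 0.5 {'good': 0.75, 'noise': -0.75}
```

On this toy input, adding "noise" to "good" lowers the AUC from 1.0 to 0.5, a small illustration of the point that integrating more sources does not always improve prediction quality.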

CONCLUSIONS

In contrast to previous integration methods, our approaches not only increase the prediction quality but also gather information about highly contributing data sources for each protein function. This information can help researchers collect relevant data sources for annotating protein functions.
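The coefficient-based route to per-function contribution information can be sketched as below. This is a minimal sketch, not the authors' KL1LR implementation: it assumes one feature per data source built from that source's kernel (here, each protein's leave-one-out kernel similarity to the other known positives) and fits L1-regularized logistic regression with plain proximal gradient descent (ISTA). The feature map, solver, step size, penalty `lam`, all function names and the toy data are hypothetical choices for illustration.

```python
import math
from typing import Dict, List, Sequence, Tuple

Matrix = List[List[float]]


def source_features(kernels: Dict[str, Matrix], labels: Sequence[int]) -> List[List[float]]:
    """One feature per data source: leave-one-out kernel similarity to the other
    known positives (a hypothetical feature map chosen for illustration)."""
    names = sorted(kernels)
    n = len(labels)
    return [[sum(kernels[s][i][j] for j in range(n) if j != i and labels[j] == 1) for s in names]
            for i in range(n)]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_threshold(x: float, t: float) -> float:
    """Proximal operator of t * |x|: shrink toward zero, clipping at zero."""
    return math.copysign(max(abs(x) - t, 0.0), x)


def l1_logistic_regression(X: List[List[float]], y: Sequence[int], lam: float,
                           step: float = 0.1, iters: int = 2000) -> Tuple[List[float], float]:
    """ISTA on mean logistic loss + lam * ||w||_1; the bias b is not penalized."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(iters):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = (sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi) / n
            grad_b += err
            for j in range(d):
                grad_w[j] += err * xi[j]
        # Gradient step on the smooth loss, then soft-thresholding for the L1 term.
        w = [soft_threshold(wj - step * gj, step * lam) for wj, gj in zip(w, grad_w)]
        b -= step * grad_b
    return w, b


def source_contributions(kernels: Dict[str, Matrix], labels: Sequence[int],
                         lam: float) -> Dict[str, float]:
    """|coefficient| per source; zero means the L1 penalty dropped that source."""
    w, _ = l1_logistic_regression(source_features(kernels, labels), labels, lam)
    return {name: abs(wj) for name, wj in zip(sorted(kernels), w)}


# Hypothetical toy data: "good" separates the classes, "noise" has the same mean in both.
labels = [1, 1, 0, 0]
kernels = {
    "good": [[1.0, 1.0, 0.0, 0.0], [1.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 1.0], [0.0, 0.0, 1.0, 1.0]],
    "noise": [[1.0, 1.0, 1.0, 0.0], [1.0, 1.0, 1.0, 0.0], [1.0, 1.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0]],
}
print(source_contributions(kernels, labels, lam=0.2))
```

With `lam=0.2`, the "noise" coefficient stays at exactly zero on this toy input while "good" receives a positive weight. Because absolute coefficients are compared across sources, the per-source features should be on comparable scales before their magnitudes are read as contributions.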


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/34bc/2813249/b64de67c1428/1471-2105-10-455-5.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/34bc/2813249/03aa78f2c536/1471-2105-10-455-1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/34bc/2813249/0538e18940db/1471-2105-10-455-2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/34bc/2813249/7a180284d88d/1471-2105-10-455-3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/34bc/2813249/7aa40f61389d/1471-2105-10-455-4.jpg

Similar articles

1. An integrated approach to the prediction of domain-domain interactions.
BMC Bioinformatics. 2006 May 25;7:269. doi: 10.1186/1471-2105-7-269.
2. Diffusion kernel-based logistic regression models for protein function prediction.
OMICS. 2006 Spring;10(1):40-55. doi: 10.1089/omi.2006.10.40.
3. Protein domain annotation with integration of heterogeneous information sources.
Proteins. 2008 Jul;72(1):461-73. doi: 10.1002/prot.21943.
4. Assessing the limits of genomic data integration for predicting protein networks.
Genome Res. 2005 Jul;15(7):945-53. doi: 10.1101/gr.3610305.
5. Evaluating the impact of topological protein features on the negative examples selection.
BMC Bioinformatics. 2018 Nov 20;19(Suppl 14):417. doi: 10.1186/s12859-018-2385-x.
6. Predicting gene ontology functions from protein's regional surface structures.
BMC Bioinformatics. 2007 Dec 11;8:475. doi: 10.1186/1471-2105-8-475.
7. Gene expression trends and protein features effectively complement each other in gene function prediction.
Bioinformatics. 2009 Feb 1;25(3):322-30. doi: 10.1093/bioinformatics/btn625. Epub 2008 Dec 2.
8. An integrated probabilistic model for functional prediction of proteins.
J Comput Biol. 2004;11(2-3):463-75. doi: 10.1089/1066527041410346.

Cited by

1. PhyloGene server for identification and visualization of co-evolving proteins using normalized phylogenetic profiles.
Nucleic Acids Res. 2015 Jul 1;43(W1):W154-9. doi: 10.1093/nar/gkv452. Epub 2015 May 9.
2. Gene function prediction based on the Gene Ontology hierarchical structure.
PLoS One. 2014 Sep 5;9(9):e107187. doi: 10.1371/journal.pone.0107187. eCollection 2014.
3. Predicting drug-target interactions using drug-drug interactions.
PLoS One. 2013 Nov 21;8(11):e80129. doi: 10.1371/journal.pone.0080129. eCollection 2013.
4. Human disease locus discovery and mapping to molecular pathways through phylogenetic profiling.
Mol Syst Biol. 2013 Oct 1;9:692. doi: 10.1038/msb.2013.50.
5. A Resource of Quantitative Functional Annotation for Homo sapiens Genes.
G3 (Bethesda). 2012 Feb;2(2):223-33. doi: 10.1534/g3.111.000828. Epub 2012 Feb 1.