Label noise in subtype discrimination of class C G protein-coupled receptors: A systematic approach to the analysis of classification errors.

Authors

König Caroline, Cárdenas Martha I, Giraldo Jesús, Alquézar René, Vellido Alfredo

Affiliations

Dept. of Computer Science, Univ. Politècnica de Catalunya, C. Jordi Girona, 1-3, Barcelona, 08034, Spain.

Institut de Neurociències, Unitat de Bioestadística, Univ. Autònoma de Barcelona, Cerdanyola del Vallès, Barcelona, 08193, Spain.

Publication information

BMC Bioinformatics. 2015 Sep 29;16:314. doi: 10.1186/s12859-015-0731-9.

DOI: 10.1186/s12859-015-0731-9
PMID: 26415951
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4587730/
Abstract

BACKGROUND

The characterization of proteins in families and subfamilies, at different levels, entails the definition and use of class labels. When the assignment of a protein to a family is uncertain, or even wrong, this becomes an instance of what has come to be known as a label noise problem. Label noise has a potentially negative effect on any quantitative analysis of proteins that depends on label information. This study investigates class C of G protein-coupled receptors, which are cell membrane proteins of relevance both to biology in general and pharmacology in particular. Their supervised classification into different known subtypes, based on primary sequence data, is hampered by label noise. The latter may stem from a combination of expert knowledge limitations and the lack of a clear correspondence between labels that mostly reflect GPCR functionality and the different representations of the protein primary sequences.

RESULTS

In this study, we describe a systematic approach, using Support Vector Machine classifiers, to the analysis of G protein-coupled receptor misclassifications. As a proof of concept, this approach is used to assist the discovery of labeling quality problems in a curated, publicly accessible database of proteins of this type. We also investigate the extent to which physico-chemical transformations of the protein sequences reflect G protein-coupled receptor subtype labeling. The candidate mislabeled cases detected with this approach are externally validated with phylogenetic trees and against further trusted sources such as the National Center for Biotechnology Information, Universal Protein Resource, European Bioinformatics Institute and Ensembl Genome Browser information repositories.
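
The general idea of flagging sequences that are misclassified consistently can be sketched as follows. This is only an illustration under stated assumptions, not the procedure used in the study: the abstract does not give the resampling scheme, so repeated k-fold cross-validation, the 0.8 flagging threshold, the toy 2-D feature vectors, and all function names are hypothetical. A nearest-centroid classifier stands in for the Support Vector Machine so that the sketch has no dependencies; in practice an SVM (for example scikit-learn's `SVC`) could be substituted for the fit and predict steps.

```python
import random
from collections import Counter


def fit_centroids(X, y):
    """Stand-in classifier (NOT an SVM): one mean vector per class."""
    counts = Counter(y)
    sums = {}
    for xi, yi in zip(X, y):
        acc = sums.setdefault(yi, [0.0] * len(xi))
        for j, v in enumerate(xi):
            acc[j] += v
    return {c: [v / counts[c] for v in acc] for c, acc in sums.items()}


def predict(centroids, x):
    """Return the class whose centroid is closest in squared distance."""
    return min(
        centroids,
        key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])),
    )


def misclassification_counts(X, y, n_repeats=20, n_folds=5, seed=0):
    """Count how often each sample is misclassified when held out.

    Every sample is held out exactly once per repeat, so each count
    lies between 0 and n_repeats.
    """
    rng = random.Random(seed)
    errors = [0] * len(X)
    for _ in range(n_repeats):
        idx = list(range(len(X)))
        rng.shuffle(idx)
        folds = [idx[k::n_folds] for k in range(n_folds)]
        for held_out in folds:
            held = set(held_out)
            train = [i for i in idx if i not in held]
            model = fit_centroids([X[i] for i in train], [y[i] for i in train])
            for i in held_out:
                if predict(model, X[i]) != y[i]:
                    errors[i] += 1
    return errors


def flag_suspects(errors, n_repeats, threshold=0.8):
    """Indices misclassified in at least `threshold` of the repeats."""
    return [i for i, e in enumerate(errors) if e / n_repeats >= threshold]


# Toy data (hypothetical): two well-separated classes, plus one point
# (index 12) that sits in the "B" cluster but carries the label "A".
X = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (0.2, 0.8),
     (5, 5), (5, 6), (6, 5), (6, 6), (5.5, 5.5), (5.2, 5.8),
     (5.4, 5.1)]
y = ["A"] * 6 + ["B"] * 6 + ["A"]

errors = misclassification_counts(X, y, n_repeats=20)
suspects = flag_suspects(errors, n_repeats=20)
print(suspects)  # [12]
```

Samples flagged this way are only candidates for mislabeling; as the abstract notes, they still need external validation against trusted sources.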

CONCLUSIONS

In quantitative classification problems, class labels are often by default assumed to be correct. Label noise, though, is bound to be a pervasive problem in bioinformatics, where labels may be obtained indirectly through complex, many-step similarity modelling processes. In the case of G protein-coupled receptors, methods capable of singling out and characterizing those sequences with consistent misclassification behaviour are required to minimize this problem. A systematic, Support Vector Machine-based method has been proposed in this study for such purpose. The proposed method enables a filtering approach to the label noise problem and might become a support tool for database curators in proteomics.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b2e6/4587730/dc3571373e1f/12859_2015_731_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b2e6/4587730/0e6aa1f76bd3/12859_2015_731_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b2e6/4587730/3fd9de62e0e2/12859_2015_731_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b2e6/4587730/94675e9a2808/12859_2015_731_Fig4_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b2e6/4587730/0e82de8354e7/12859_2015_731_Fig5_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b2e6/4587730/e1ca02d32418/12859_2015_731_Fig6_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b2e6/4587730/a8c42b825501/12859_2015_731_Fig7_HTML.jpg

Similar articles

1
Label noise in subtype discrimination of class C G protein-coupled receptors: A systematic approach to the analysis of classification errors.
BMC Bioinformatics. 2015 Sep 29;16:314. doi: 10.1186/s12859-015-0731-9.
2
Using random forests for assistance in the curation of G-protein coupled receptor databases.
Biomed Eng Online. 2017 Aug 18;16(Suppl 1):75. doi: 10.1186/s12938-017-0357-4.
3
Systematic Analysis of Primary Sequence Domain Segments for the Discrimination Between Class C GPCR Subtypes.
Interdiscip Sci. 2018 Mar;10(1):43-52. doi: 10.1007/s12539-018-0286-3. Epub 2018 Feb 19.
4
The influence of alignment-free sequence representations on the semi-supervised classification of class C G protein-coupled receptors: semi-supervised classification of class C GPCRs.
Med Biol Eng Comput. 2015 Feb;53(2):137-49. doi: 10.1007/s11517-014-1218-y. Epub 2014 Nov 4.
5
Reducing the n-gram feature space of class C GPCRs to subtype-discriminating patterns.
J Integr Bioinform. 2014 Oct 23;11(3):254. doi: 10.2390/biecoll-jib-2014-254.
6
Classification of GPCRs using family specific motifs.
IEEE/ACM Trans Comput Biol Bioinform. 2011 Nov-Dec;8(6):1495-508. doi: 10.1109/TCBB.2010.101.
7
Using machine learning tools for protein database biocuration assistance.
Sci Rep. 2018 Jul 5;8(1):10148. doi: 10.1038/s41598-018-28330-z.
8
Reduced alphabet motif methodology for GPCR annotation.
J Biomol Struct Dyn. 2007 Dec;25(3):299-310. doi: 10.1080/07391102.2007.10507178.
9
GPCR-MPredictor: multi-level prediction of G protein-coupled receptors using genetic ensemble.
Amino Acids. 2012 May;42(5):1809-23. doi: 10.1007/s00726-011-0902-6. Epub 2011 Apr 20.
10
On the hierarchical classification of G protein-coupled receptors.
Bioinformatics. 2007 Dec 1;23(23):3113-8. doi: 10.1093/bioinformatics/btm506. Epub 2007 Oct 22.

Cited by

1
Unraveling response to temozolomide in preclinical GL261 glioblastoma with MRI/MRSI using radiomics and signal source extraction.
Sci Rep. 2020 Nov 12;10(1):19699. doi: 10.1038/s41598-020-76686-y.
2
Using machine learning tools for protein database biocuration assistance.
Sci Rep. 2018 Jul 5;8(1):10148. doi: 10.1038/s41598-018-28330-z.
3
Using random forests for assistance in the curation of G-protein coupled receptor databases.
Biomed Eng Online. 2017 Aug 18;16(Suppl 1):75. doi: 10.1186/s12938-017-0357-4.
4
The parameter sensitivity of random forests.
BMC Bioinformatics. 2016 Sep 1;17(1):331. doi: 10.1186/s12859-016-1228-x.

References

1
Metabotropic glutamate receptors as drug targets: what's new?
Curr Opin Pharmacol. 2015 Feb;20:89-94. doi: 10.1016/j.coph.2014.12.002. Epub 2014 Dec 12.
2
The influence of alignment-free sequence representations on the semi-supervised classification of class C G protein-coupled receptors: semi-supervised classification of class C GPCRs.
Med Biol Eng Comput. 2015 Feb;53(2):137-49. doi: 10.1007/s11517-014-1218-y. Epub 2014 Nov 4.
3
Opportunities and challenges in the discovery of allosteric modulators of GPCRs for treating CNS disorders.
Nat Rev Drug Discov. 2014 Sep;13(9):692-708. doi: 10.1038/nrd4308.
4
Structure of class C GPCR metabotropic glutamate receptor 5 transmembrane domain.
Nature. 2014 Jul 31;511(7511):557-62. doi: 10.1038/nature13396. Epub 2014 Jul 6.
5
Determination of prognosis in metastatic melanoma through integration of clinico-pathologic, mutation, mRNA, microRNA, and protein information.
Int J Cancer. 2015 Feb 15;136(4):863-74. doi: 10.1002/ijc.29047. Epub 2014 Jul 24.
6
Classification in the presence of label noise: a survey.
IEEE Trans Neural Netw Learn Syst. 2014 May;25(5):845-69. doi: 10.1109/TNNLS.2013.2292894.
7
Protein sequence classification with improved extreme learning machine algorithms.
Biomed Res Int. 2014;2014:103054. doi: 10.1155/2014/103054. Epub 2014 Mar 30.
8
Structure of a class C GPCR metabotropic glutamate receptor 1 bound to an allosteric modulator.
Science. 2014 Apr 4;344(6179):58-64. doi: 10.1126/science.1249489. Epub 2014 Mar 6.
9
GPCRDB: an information system for G protein-coupled receptors.
Nucleic Acids Res. 2014 Jan;42(Database issue):D422-5. doi: 10.1093/nar/gkt1255. Epub 2013 Dec 3.
10
An overview of the diverse roles of G-protein coupled receptors (GPCRs) in the pathophysiology of various human diseases.
Biotechnol Adv. 2013 Dec;31(8):1676-94. doi: 10.1016/j.biotechadv.2013.08.017. Epub 2013 Aug 30.