Genome-wide sequence-based prediction of peripheral proteins using a novel semi-supervised learning technique.

Affiliation

Bioinformatics Program, Department of Bioengineering, University of Illinois at Chicago, Chicago, IL 60607, USA.

Publication information

BMC Bioinformatics. 2010 Jan 18;11 Suppl 1(Suppl 1):S6. doi: 10.1186/1471-2105-11-S1-S6.

DOI:10.1186/1471-2105-11-S1-S6
PMID:20122235
Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC3009533/
Abstract

BACKGROUND

In supervised learning, the traditional approach to building a classifier uses two sets of examples with predefined classes together with a learning algorithm. The main limitation of this approach is that it requires examples from both classes, which may be infeasible in certain cases, especially those involving biological data. Such is the case for membrane-binding peripheral domains, which play important roles in many biological processes, including cell signaling and membrane trafficking, by reversibly binding to membranes. For these domains, a well-defined positive set of domains known to bind membranes is available, along with a large unlabeled set of domains whose membrane-binding affinities have not been measured. This limitation can be addressed by a special class of semi-supervised machine learning called positive-unlabeled (PU) learning, which uses a positive set together with a large unlabeled set.

METHODS

In this study, we implement the first application of PU learning to a protein function prediction problem: the identification of peripheral domains. PU learning starts by iteratively identifying reliable negative (RN) examples in the unlabeled set until convergence, and then builds a classifier from the positive set and the final RN set. A data set of 232 positive cases and ~3750 unlabeled ones was used to construct and validate the protocol.
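The iterative reliable-negative (RN) scheme described in the methods can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not name the classifier or the way the first RN examples are chosen, so the sketch swaps in a toy nearest-centroid classifier and assumes the initial RN set is the fraction `init_frac` of unlabeled examples lying farthest from the positive centroid. The names `pu_learn`, `nearest_centroid`, `centroid`, `init_frac`, and `max_iter` are hypothetical.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = Tuple[float, ...]
Classifier = Callable[[Vector], bool]


def centroid(points: Sequence[Vector]) -> Vector:
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))


def nearest_centroid(pos: Sequence[Vector], neg: Sequence[Vector]) -> Classifier:
    """Hypothetical stand-in classifier: True means 'predicted positive'."""
    cp, cn = centroid(pos), centroid(neg)
    return lambda x: math.dist(x, cp) < math.dist(x, cn)


def pu_learn(positives: Sequence[Vector], unlabeled: Sequence[Vector],
             init_frac: float = 0.2,
             max_iter: int = 50) -> Tuple[Classifier, List[Vector]]:
    """Grow a reliable-negative (RN) set from the unlabeled data until it stops
    changing, then train the final classifier on positives vs. the final RN set."""
    cp = centroid(positives)
    # Assumed initial heuristic (not from the source): the unlabeled points
    # farthest from the positive centroid become the first reliable negatives.
    ranked = sorted(unlabeled, key=lambda x: math.dist(x, cp), reverse=True)
    k = max(1, int(len(ranked) * init_frac))
    reliable_neg, remaining = list(ranked[:k]), list(ranked[k:])

    for _ in range(max_iter):
        clf = nearest_centroid(positives, reliable_neg)
        new_neg = [x for x in remaining if not clf(x)]
        if not new_neg:  # converged: no remaining unlabeled point looks negative
            break
        reliable_neg.extend(new_neg)
        remaining = [x for x in remaining if clf(x)]

    # Final classifier built from the positive set and the final RN set.
    return nearest_centroid(positives, reliable_neg), reliable_neg
```

Calling `pu_learn(positives, unlabeled)` returns the final classifier and the RN set; unlabeled examples that never move into RN (those resembling the positives) are simply left out of training. A real application would use a stronger learner and sequence-derived feature vectors in place of the centroid rule and toy coordinates.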

RESULTS

Holdout evaluation of the protocol on a left-out positive set showed that prediction accuracy reached up to 95% in two independent implementations.
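Because the held-out set in this evaluation contains only positive examples, accuracy on it is the fraction of held-out positives that the classifier labels positive, i.e. its true-positive rate (recall). A minimal Python sketch of that computation follows; `holdout_positive_accuracy` and the threshold rule `toy_predict` are hypothetical illustrations, not the authors' code.

```python
from typing import Callable, Sequence, Tuple

Vector = Tuple[float, ...]


def holdout_positive_accuracy(predict: Callable[[Vector], bool],
                              held_out: Sequence[Vector]) -> float:
    """Accuracy on a positives-only holdout set.

    Every held-out example is truly positive, so a prediction is correct
    exactly when it is positive; the result is the true-positive rate."""
    if not held_out:
        raise ValueError("held-out positive set is empty")
    correct = sum(1 for x in held_out if predict(x))
    return correct / len(held_out)


def toy_predict(x: Vector) -> bool:
    """Hypothetical threshold rule standing in for a trained classifier."""
    return x[0] < 1.0


held_out_positives = [(0.1, 0.2), (0.5, 0.3), (2.0, 0.0), (0.3, 0.9)]
print(holdout_positive_accuracy(toy_predict, held_out_positives))  # 0.75
```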

CONCLUSION

These results suggest that our protocol can be used to predict the membrane-binding properties of a wide variety of modular domains. Protocols like the one presented here are particularly useful when information is available for only one class.

Figures

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b725/3009533/23cba02508c9/1471-2105-11-S1-S6-1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b725/3009533/11c5bec7c1d1/1471-2105-11-S1-S6-2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b725/3009533/2ca6fa66b6db/1471-2105-11-S1-S6-3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b725/3009533/ea5e3b57b18f/1471-2105-11-S1-S6-4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b725/3009533/9dd2ca09a2c0/1471-2105-11-S1-S6-5.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b725/3009533/748889c00dce/1471-2105-11-S1-S6-6.jpg

Similar articles

1
Genome-wide sequence-based prediction of peripheral proteins using a novel semi-supervised learning technique.
BMC Bioinformatics. 2010 Jan 18;11 Suppl 1(Suppl 1):S6. doi: 10.1186/1471-2105-11-S1-S6.
2
Structural bioinformatics prediction of membrane-binding proteins.
J Mol Biol. 2006 Jun 2;359(2):486-95. doi: 10.1016/j.jmb.2006.03.039. Epub 2006 Mar 30.
3
Positive-unlabeled learning for disease gene identification.
Bioinformatics. 2012 Oct 15;28(20):2640-7. doi: 10.1093/bioinformatics/bts504. Epub 2012 Aug 24.
4
Accurate in silico identification of protein succinylation sites using an iterative semi-supervised learning technique.
J Theor Biol. 2015 Jun 7;374:60-5. doi: 10.1016/j.jtbi.2015.03.029. Epub 2015 Apr 2.
5
Genome-wide pre-miRNA discovery from few labeled examples.
Bioinformatics. 2018 Feb 15;34(4):541-549. doi: 10.1093/bioinformatics/btx612.
6
Effectively Identifying Compound-Protein Interactions by Learning from Positive and Unlabeled Examples.
IEEE/ACM Trans Comput Biol Bioinform. 2018 Nov-Dec;15(6):1832-1843. doi: 10.1109/TCBB.2016.2570211. Epub 2016 May 18.
7
Leveraging permutation testing to assess confidence in positive-unlabeled learning applied to high-dimensional biological datasets.
BMC Bioinformatics. 2024 Jun 19;25(1):218. doi: 10.1186/s12859-024-05834-2.
8
SemiBoost: boosting for semi-supervised learning.
IEEE Trans Pattern Anal Mach Intell. 2009 Nov;31(11):2000-14. doi: 10.1109/TPAMI.2008.235.
9
Semi-supervised protein subcellular localization.
BMC Bioinformatics. 2009 Jan 30;10 Suppl 1(Suppl 1):S47. doi: 10.1186/1471-2105-10-S1-S47.
10
Semi-supervised prediction of protein subcellular localization using abstraction augmented Markov models.
BMC Bioinformatics. 2010 Oct 26;11 Suppl 8(Suppl 8):S6. doi: 10.1186/1471-2105-11-S8-S6.

Cited by

1
Machine learning-based prediction reveals kinase MAP4K4 regulates neutrophil differentiation through phosphorylating apoptosis-related proteins.
PLoS Comput Biol. 2025 Mar 17;21(3):e1012877. doi: 10.1371/journal.pcbi.1012877. eCollection 2025 Mar.
2
Leveraging permutation testing to assess confidence in positive-unlabeled learning applied to high-dimensional biological datasets.
BMC Bioinformatics. 2024 Jun 19;25(1):218. doi: 10.1186/s12859-024-05834-2.
3
Learning peptide properties with positive examples only.
Digit Discov. 2024 Apr 19;3(5):977-986. doi: 10.1039/d3dd00218g. eCollection 2024 May 15.
4
Functional Site Discovery From Incomplete Training Data: A Case Study With Nucleic Acid-Binding Proteins.
Front Genet. 2019 Aug 30;10:729. doi: 10.3389/fgene.2019.00729. eCollection 2019.
5
Heterodimer Binding Scaffolds Recognition via the Analysis of Kinetically Hot Residues.
Pharmaceuticals (Basel). 2018 Mar 16;11(1):29. doi: 10.3390/ph11010029.
6
Positive-unlabeled learning for the prediction of conformational B-cell epitopes.
BMC Bioinformatics. 2015;16 Suppl 18(Suppl 18):S12. doi: 10.1186/1471-2105-16-S18-S12. Epub 2015 Dec 9.
7
Negative example selection for protein function prediction: the NoGO database.
PLoS Comput Biol. 2014 Jun 12;10(6):e1003644. doi: 10.1371/journal.pcbi.1003644. eCollection 2014 Jun.
8
Genome-wide structural analysis reveals novel membrane binding properties of AP180 N-terminal homology (ANTH) domains.
J Biol Chem. 2011 Sep 30;286(39):34155-63. doi: 10.1074/jbc.M111.265611. Epub 2011 Aug 2.