Identifying optimal substrate classes of membrane transporters.

Authors

Denger Andreas, Helms Volkhard

Affiliation

Center for Bioinformatics, Saarland University, Saarbrücken, Germany.

Publication information

PLoS One. 2024 Dec 19;19(12):e0315330. doi: 10.1371/journal.pone.0315330. eCollection 2024.

DOI:10.1371/journal.pone.0315330

PMID:39700222

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11658592/

Abstract

Membrane transporters are responsible for moving a wide variety of molecules across biological membranes, making them integral to key biological pathways in all organisms. Identifying all membrane transporters within a (meta-)proteome, along with their specific substrates, provides important information for various research fields, including biotechnology, pharmacology, and metabolomics. Protein datasets are frequently annotated with thousands of molecular functions that form complex networks, often with partial or full redundancy and hierarchical relationships. This complexity, along with the low sample count for more specific functions, makes them unsuitable as classes for supervised learning methods, meaning that the creation of an optimal subset of annotations is required. However, selection of this subset requires extensive manual effort, along with knowledge about the biology behind the respective functions. Here, we present an automated pipeline to address this problem. Unlike previous approaches for reducing redundancy in GO datasets, we employ machine learning to identify a subset of functional annotations in a training dataset. Classes in the resulting predictive model meet four essential criteria: sufficient sample size for training predictive models, minimal redundancy, strong class separability, and relevance to substrate transport. Furthermore, we implemented a pipeline for creating training datasets of transmembrane transporters that cover a wide range of organisms, including plants, bacteria, mammals, and single-cell eukaryotes. For a dataset containing 98.1% of transporters from S. cerevisiae, the pipeline automatically reduced the number of functional annotations from 287 to 11 GO terms that could be classified with a median pairwise F1 score of 0.87±0.16. For a meta-organism dataset containing 96% of all transport proteins from S. cerevisiae, A. thaliana, E. coli and human, the number of classes was reduced from 695 to 49, with a median F1 score of 0.92±0.10 between pairs of GO terms. When lowering the percentage of covered proteins down to 67%, the pipeline found a subset of 30 GO terms with a median F1 score of 0.95±0.06.
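Two of the four criteria named in the abstract, sufficient sample size and minimal redundancy, can be illustrated on toy data. The following is a minimal sketch and not the authors' pipeline: it swaps in a simple greedy Jaccard-overlap filter and leaves out the machine-learning step that assesses class separability. The function names, thresholds, protein IDs, and term labels are all hypothetical.

```python
def jaccard(a, b):
    """Jaccard overlap between two sets of protein IDs."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def select_terms(annotations, min_samples=3, max_overlap=0.5):
    """Greedy sketch: keep well-populated terms whose protein sets overlap little.

    annotations: dict mapping protein ID -> set of term labels (hypothetical input).
    """
    # Invert the mapping: term -> set of annotated proteins.
    term_to_proteins = {}
    for protein, terms in annotations.items():
        for term in terms:
            term_to_proteins.setdefault(term, set()).add(protein)

    # Criterion: enough samples per class.
    candidates = {t: p for t, p in term_to_proteins.items() if len(p) >= min_samples}

    # Criterion: low redundancy. Visit larger terms first, ties broken by label.
    selected = []
    for term in sorted(candidates, key=lambda t: (-len(candidates[t]), t)):
        if all(jaccard(candidates[term], candidates[s]) <= max_overlap for s in selected):
            selected.append(term)
    return selected


def coverage(annotations, selected):
    """Fraction of proteins annotated with at least one selected term."""
    if not annotations:
        return 0.0
    chosen = set(selected)
    covered = sum(1 for terms in annotations.values() if terms & chosen)
    return covered / len(annotations)


# Toy data with made-up labels; "sugar" and "carbohydrate" are fully redundant.
toy = {
    "P1": {"sugar", "carbohydrate"},
    "P2": {"sugar", "carbohydrate"},
    "P3": {"sugar", "carbohydrate"},
    "P4": {"ion"},
    "P5": {"ion"},
    "P6": {"ion", "rare"},
    "P7": {"rare"},
    "P8": set(),
}
selected = select_terms(toy)
print(selected, coverage(toy, selected))
```

On this toy input the filter drops the under-populated term and one of the two redundant terms, which mirrors the trade-off reported in the abstract: a smaller class set covers fewer proteins but is easier to separate.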

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/db10/11658592/9d30bba7c8f3/pone.0315330.g001.jpg
