Department of Biochemistry and Microbiology, Rutgers University, New Brunswick, NJ 08873, USA.
Institute of Marine and Coastal Sciences, Rutgers University, New Brunswick, NJ 08901, USA.
Bioinformatics. 2022 Jul 11;38(14):3532-3540. doi: 10.1093/bioinformatics/btac358.
Metal-binding proteins have a central role in maintaining life processes. Nearly one-third of known protein structures contain metal ions that serve a variety of needs, including catalysis, DNA/RNA binding and protein structure stability. Identifying metal-binding proteins is thus crucial for understanding the mechanisms of cellular activity. However, experimental annotation of protein metal-binding potential is severely lacking, while computational techniques are often imprecise and of limited applicability.
We developed a novel machine learning-based method, mebipred, for identifying metal-binding proteins from sequence-derived features. This method is over 80% accurate in recognizing proteins that bind metal ion-containing ligands; the specific identity of 11 ubiquitously present metal ions can also be annotated. mebipred is reference-free, i.e. no sequence alignments are involved, and is thus faster than alignment-based methods; it is also more accurate than other sequence-based prediction methods. Additionally, mebipred can identify protein metal-binding capabilities from short sequence stretches, e.g. translated sequencing reads, and thus may be useful for annotating the metal requirements of metagenomic samples. We analyzed available microbiome data and found that ocean, hot spring sediment and soil microbiomes use a more diverse set of metals than human host-related ones. For human microbiomes, physiological conditions explain the observed metal preferences. Similarly, subtle changes in ocean sample ion concentration affect the abundance of relevant metal-binding proteins. These results highlight mebipred's utility in analyzing microbiome metal requirements.
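To make the idea of reference-free, sequence-derived features concrete, the following is a minimal sketch of how a protein sequence, or a short fragment standing in for a translated read, can be turned into a fixed-length numeric vector without any alignment. It is an illustration only and not mebipred's actual feature set or code: amino acid composition is assumed here as a stand-in feature, and the names `composition` and `fragments`, the example sequence, and the fragment length and step are all hypothetical.

```python
from collections import Counter

# The 20 standard amino acids, in a fixed order so every vector is comparable.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(seq):
    """Return the fraction of each standard amino acid in `seq`.

    Hypothetical stand-in for "sequence-derived features"; the features
    mebipred actually uses are not specified in this abstract.
    """
    counts = Counter(ch for ch in seq.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        # No standard residues: return an all-zero vector.
        return [0.0] * len(AMINO_ACIDS)
    return [counts[aa] / total for aa in AMINO_ACIDS]


def fragments(seq, length, step):
    """Yield overlapping fragments of `seq`, mimicking short translated reads."""
    if len(seq) <= length:
        yield seq
        return
    for start in range(0, len(seq) - length + 1, step):
        yield seq[start:start + length]


if __name__ == "__main__":
    # Made-up sequence, used only to exercise the functions.
    example = "MKHHCEECGKAFSDHLKCHEDGSM"
    for frag in fragments(example, length=12, step=6):
        vector = composition(frag)
        # A trained classifier (not shown) would consume `vector` here.
        print(frag, [round(v, 2) for v in vector])
```

Because each vector depends only on the fragment itself, no database search or alignment is needed, which is the property that makes alignment-free methods fast on large read sets.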
mebipred is available as a web server at services.bromberglab.org/mebipred and as a standalone package at https://pypi.org/project/mymetal/.
Supplementary data are available at Bioinformatics online.