Implementation of homology based and non-homology based computational methods for the identification and annotation of orphan enzymes: using Mycobacterium tuberculosis H37Rv as a case study.

Author information

Bioinformatics Institute, Agency for Science, Technology, and Research (A*Star), 30 Biopolis Street, #07-01 Matrix, Singapore, 138671, Republic of Singapore.

School of Computational and Integrative Sciences, Jawaharlal Nehru University, New Delhi, India.

Publication information

BMC Bioinformatics. 2020 Oct 19;21(1):466. doi: 10.1186/s12859-020-03794-x.

DOI:10.1186/s12859-020-03794-x

PMID:33076816

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7574302/

Abstract

BACKGROUND

Homology-based methods are among the most important and widely used approaches for functional annotation of high-throughput microbial genome data. A major limitation of these methods is the absence of well-characterized sequences for certain functions. Non-homology methods, which rely on the context and interactions of a protein, are useful for identifying missing metabolic activities and for functional annotation in the absence of significant sequence similarity. In the current work, we employ homology-based and context-based methods incrementally to identify local holes and chokepoints: enzymes that have not been annotated in the Mycobacterium tuberculosis genome, but whose presence is indicated by their interactions with known proteins in a metabolic network context. We have developed two computational procedures based on network theory: one identifies orphan enzymes (the 'Hole finding protocol'), and the other identifies candidate proteins for each predicted orphan enzyme (the 'Hole filling protocol'). Using M. tuberculosis as a case study, we propose an integrated interaction score, derived from scores in the STRING database, to identify the candidate protein sequences most likely to perform the missing function of each orphan enzyme.
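To illustrate the hole-filling idea, the following minimal Python sketch ranks unannotated proteins by how strongly they interact with the annotated neighbours of a missing enzyme. It is not the integrated interaction score proposed in the paper, whose formula is not given in this abstract. As a stand-in, it simply sums STRING-style combined scores (integers from 0 to 1000). The function name `rank_candidates` and all protein identifiers are hypothetical.

```python
from collections import defaultdict


def rank_candidates(neighbor_enzymes, interactions, annotated):
    """Rank unannotated proteins by their summed interaction scores
    with the annotated enzymes that flank a metabolic hole.

    neighbor_enzymes: set of protein IDs adjacent to the hole in the network
    interactions:     dict mapping (protein_a, protein_b) -> integer score
    annotated:        set of protein IDs that already have an enzyme function
    """
    totals = defaultdict(int)
    for (a, b), score in interactions.items():
        # Interactions are undirected, so consider both orientations.
        for protein, partner in ((a, b), (b, a)):
            if partner in neighbor_enzymes and protein not in annotated:
                totals[protein] += score
    # Highest total first; ties are broken alphabetically for determinism.
    return sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical toy data: E1 and E2 flank the hole, P1 and P2 are unannotated.
interactions = {
    ("P1", "E1"): 900,
    ("P1", "E2"): 700,
    ("P2", "E1"): 400,
    ("E1", "E2"): 950,
}
ranking = rank_candidates({"E1", "E2"}, interactions, annotated={"E1", "E2"})
print(ranking)
```

In this toy network, `P1` ranks above `P2` because it interacts with both neighbours of the hole. A real scoring scheme would likely weight the individual STRING evidence channels rather than use a plain sum.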

RESULTS

Applying ModEnzA, an automated homology-based enzyme identification protocol, to the M. tuberculosis genome yielded 56 novel enzyme predictions. Using the 'Hole finding protocol', we further predicted 74 putative local holes, 6 chokepoints, and 3 high-confidence local holes in the genome. The 'Hole filling protocol' was validated on the E. coli genome using artificial in-silico enzyme knockouts; compared with other methods, our method was 25% more accurate at placing the correct sequence for the knocked-out enzyme among the top 10 ranks. The method was further validated on 8 additional genomes.
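The knockout validation can be read as a top-k accuracy measurement: each known enzyme is hidden in turn, candidates are ranked, and a hit is counted when the true gene appears among the first k ranks. The sketch below shows that calculation under this assumption. The function name `top_k_accuracy`, the enzyme labels and the gene identifiers are hypothetical, and the code does not reproduce the paper's benchmark or its reported figures.

```python
def top_k_accuracy(rankings, true_gene, k=10):
    """Fraction of knocked-out enzymes whose true gene is in the top k.

    rankings:  dict mapping enzyme ID -> list of candidate gene IDs, best first
    true_gene: dict mapping enzyme ID -> gene ID that actually encodes it
    """
    if not rankings:
        raise ValueError("no knockouts to evaluate")
    hits = sum(
        1 for enzyme, ranked in rankings.items()
        if true_gene[enzyme] in ranked[:k]
    )
    return hits / len(rankings)


# Hypothetical toy data: two simulated knockouts with three candidates each.
rankings = {
    "EC_A": ["g3", "g1", "g7"],
    "EC_B": ["g5", "g2", "g9"],
}
true_gene = {"EC_A": "g1", "EC_B": "g9"}

acc_top2 = top_k_accuracy(rankings, true_gene, k=2)
acc_top3 = top_k_accuracy(rankings, true_gene, k=3)
print(acc_top2, acc_top3)
```

Comparing two methods on the same set of knockouts then reduces to comparing their top-k accuracies at a fixed k, such as k = 10 in the abstract.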

CONCLUSIONS

The methods we have developed can be generalized to augment homology-based annotation by identifying missing enzyme-coding genes and predicting candidate proteins for them. For pathogens such as M. tuberculosis, this work is significant because it expands the protein repertoire and thereby the potential for identifying novel drug targets.

Figure 1 (image): https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7986/7574302/b26073eff7d2/12859_2020_3794_Fig1_HTML.jpg
