Bioinformatics analysis identify novel OB fold protein coding genes in C. elegans.

Affiliation

Molecular Biology and Biochemistry Department, Simon Fraser University, Burnaby, British Columbia, Canada.

Publication information

PLoS One. 2013 Apr 25;8(4):e62204. doi: 10.1371/journal.pone.0062204. Print 2013.

DOI:10.1371/journal.pone.0062204

PMID:23638006

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC3636199/

Abstract

BACKGROUND

The C. elegans genome has been extensively annotated by the WormBase consortium, which uses state-of-the-art bioinformatics pipelines, functional genomics and manual curation approaches. As a result, identifying novel genes in silico in this model organism is becoming more challenging and requires new approaches. The oligonucleotide-oligosaccharide binding (OB) fold is a highly divergent protein family whose members, despite having the same fold, share very little sequence identity (5-25%). Therefore, evidence from sequence-based annotation may not be sufficient to identify all the members of this family. In C. elegans, the number of OB-fold proteins reported is remarkably low (n=46) compared to other evolutionarily related eukaryotes, such as the yeast S. cerevisiae (n=344) or the fruit fly D. melanogaster (n=84). Gene loss during evolution or differences in the level of annotation for this protein family may explain these discrepancies.
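To make the 5-25% figure concrete, the following is a minimal sketch of how percent identity can be computed from two already-aligned sequences. The function name, the toy sequences and the choice of denominator (columns where neither sequence has a gap) are assumptions made for illustration; the abstract does not say which definition of identity was used, and other definitions divide by a different length.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over alignment columns where neither sequence has a gap.

    Assumes the two strings are rows of the same pairwise alignment,
    with '-' marking gaps. This is one common convention, not the only one.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    # Keep only columns in which both sequences carry a residue.
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    matches = sum(a == b for a, b in pairs)
    return 100.0 * matches / len(pairs)


# Hypothetical toy alignment: 10 gap-free columns, 2 identical residues (M and L).
seq_a = "MKVLTAGEWR-"
seq_b = "MPSLDQNHYCA"
identity = percent_identity(seq_a, seq_b)
print(f"{identity:.1f}% identity")  # 2 of 10 columns match, so 20.0%
```

At identity levels like this, two proteins can still adopt the same fold, which is why the abstract argues that sequence-based annotation alone may miss family members.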

METHODOLOGY/PRINCIPAL FINDINGS: This study examines whether novel OB-fold coding genes exist in the worm. We developed a bioinformatics approach that uses the most sensitive sequence-sequence, sequence-profile and profile-profile similarity search methods, followed by 3D-structure prediction as a filtering step to eliminate false-positive candidate sequences. We predicted 18 coding genes containing the OB-fold that, remarkably, have been only partially characterized in C. elegans.
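The search-then-filter strategy described above can be sketched as a two-stage cascade: pool hits from the three search tiers, then keep only candidates whose predicted 3D model is an OB-fold. This is a simplified illustration, not the authors' pipeline; the `Hit` record, the gene names, the E-value cutoff and the fold labels are all hypothetical.

```python
from dataclasses import dataclass

# Search tiers named in the abstract.
TIERS = ("sequence-sequence", "sequence-profile", "profile-profile")


@dataclass(frozen=True)
class Hit:
    gene: str      # hypothetical gene identifier
    tier: str      # search tier that produced the hit
    evalue: float  # similarity-search E-value (lower is better)


def collect_candidates(hits, max_evalue=1e-3):
    """Pool hits from all tiers, keeping the best (lowest E-value) hit per gene.

    The cutoff of 1e-3 is an assumed value chosen for this example.
    """
    best = {}
    for hit in hits:
        if hit.tier not in TIERS:
            raise ValueError(f"unknown tier: {hit.tier}")
        if hit.evalue > max_evalue:
            continue  # too weak to count as a candidate
        if hit.gene not in best or hit.evalue < best[hit.gene].evalue:
            best[hit.gene] = hit
    return best


def filter_by_structure(candidates, predicted_fold, known_genes=frozenset()):
    """Keep candidates whose predicted model is an OB-fold and that are not already annotated."""
    return sorted(
        gene
        for gene in candidates
        if predicted_fold.get(gene) == "OB-fold" and gene not in known_genes
    )


hits = [
    Hit("geneA", "sequence-sequence", 1e-10),
    Hit("geneB", "profile-profile", 1e-5),
    Hit("geneC", "sequence-profile", 1e-4),
    Hit("geneD", "profile-profile", 0.5),  # fails the E-value cutoff
]
# Hypothetical outcome of a structure-prediction step for each candidate.
predicted_fold = {"geneA": "OB-fold", "geneB": "OB-fold", "geneC": "other"}

candidates = collect_candidates(hits)
novel = filter_by_structure(candidates, predicted_fold, known_genes={"geneA"})
print(novel)  # ['geneB']
```

Here geneC is a similarity-search hit that the structure step rejects as a false positive, and geneA is dropped only because it is already annotated, leaving geneB as the single novel prediction.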

CONCLUSIONS/SIGNIFICANCE: This study raises the possibility that the annotation of highly divergent protein fold families can be improved in C. elegans. Similar strategies could be implemented for large-scale analysis by the WormBase consortium when new versions of the genome sequence of C. elegans or other evolutionarily related species are released. This approach is of general interest to the scientific community since it can be used to annotate any genome.
