Revisiting the functional annotation of TriTryp using sequence similarity tools.

Authors

Borujeni Poorya Mirzavand, Salavati Reza

Affiliations

Institute of Parasitology, McGill University, Canada.

Department of Biochemistry, McGill University, Canada.

Publication information

Heliyon. 2024 Oct 11;10(20):e39243. doi: 10.1016/j.heliyon.2024.e39243. eCollection 2024 Oct 30.

DOI: 10.1016/j.heliyon.2024.e39243
PMID: 39640808
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11620254/

Abstract

Trypanosomatids are the causative agents of deadly diseases in humans and livestock. Because trypanosomatids are phylogenetically distant from model organisms, they have many unannotated genes. Manual functional annotation is time-consuming, which highlights the importance of automated functional annotation tools. The development of such tools is an active research topic, and multiple tools have been developed for the task. PANNZER2 is an automated functional annotation tool that relies solely on the sequence similarity of the query to annotated proteins. We tried PANNZER2 on , the most studied organism among trypanosomatids, to see if it could improve our knowledge of the functions of the genes. Even with the availability of automated annotation tools like InterPro2GO in databases such as TriTrypDB, PANNZER2 has made surprisingly confident predictions for some hypothetical proteins in . In this study, we identify gaps in such annotations that arise because TriTrypDB's automated annotation process does not employ pairwise sequence alignment tools. Our findings demonstrate that pairwise sequence alignment can successfully annotate a significant number of proteins even when stringent cutoffs are used. Additionally, we discovered that adjusting the open reading frames in certain genes leads to sequences with increased sequence signature coverage (characterized by the length covered by at least one sequence signature) compared to the original sequences. This enhanced sequence signature coverage suggests these genomic fragments could be pseudogenes. To facilitate further exploration, we developed a script to help identify potential pseudogenes within an organism's genome, offering researchers a new tool for genomic analysis and understanding. We extended all our analysis to and to assess the impact of this approach across different species.
Our study demonstrates that by utilizing pairwise sequence similarity alignment, even with stringent cutoffs, we can attribute 2986, 3953, and 3798 new GO terms to the genomes of , , and . Additionally, we found that 210, 239, and 29 genes exhibit increased sequence signature coverage following frame correction, suggesting the presence of pseudogenes.
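
The abstract defines sequence signature coverage as the length covered by at least one sequence signature. The following is a minimal sketch of that quantity, not the authors' script: it assumes each signature hit is given as a 1-based, inclusive (start, end) residue interval, and the names `merge_intervals`, `signature_coverage`, and `coverage_increases` are hypothetical helpers introduced only for illustration.

```python
from typing import Iterable, List, Tuple

# Assumption: a signature hit is a 1-based, inclusive (start, end) residue range.
Interval = Tuple[int, int]


def merge_intervals(hits: Iterable[Interval]) -> List[Interval]:
    """Merge overlapping or directly adjacent hits into disjoint intervals."""
    merged: List[Interval] = []
    for start, end in sorted(hits):
        if merged and start <= merged[-1][1] + 1:
            # The hit overlaps or touches the previous interval: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def signature_coverage(hits: Iterable[Interval]) -> int:
    """Count residues covered by at least one signature."""
    return sum(end - start + 1 for start, end in merge_intervals(hits))


def coverage_increases(original_hits: Iterable[Interval],
                       corrected_hits: Iterable[Interval]) -> bool:
    """True if the frame-corrected sequence has strictly higher coverage."""
    return signature_coverage(corrected_hits) > signature_coverage(original_hits)


if __name__ == "__main__":
    # Hypothetical hits on an original and a frame-corrected translation.
    original = [(1, 50)]
    corrected = [(1, 50), (40, 80), (100, 120)]
    print(signature_coverage(original), signature_coverage(corrected))
    print(coverage_increases(original, corrected))
```

Taking the union of the intervals, rather than summing hit lengths, means residues matched by several overlapping signatures are counted only once, which matches the "covered by at least one" wording.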

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/50c5/11620254/2109a81fb0f2/gr1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/50c5/11620254/1af723291d41/gr2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/50c5/11620254/98adae164cd7/gr3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/50c5/11620254/b9110d97bae3/gr4.jpg
