Quality assessment of gene repertoire annotations with OMArk.

Authors

Nevers Yannis, Warwick Vesztrocy Alex, Rossier Victor, Train Clément-Marie, Altenhoff Adrian, Dessimoz Christophe, Glover Natasha M

Affiliations

Department of Computational Biology, University of Lausanne, Lausanne, Switzerland.

Swiss Institute of Bioinformatics, Lausanne, Switzerland.

Publication information

Nat Biotechnol. 2025 Jan;43(1):124-133. doi: 10.1038/s41587-024-02147-w. Epub 2024 Feb 21.

DOI:10.1038/s41587-024-02147-w
PMID:38383603
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11738984/
Abstract

In the era of biodiversity genomics, it is crucial to ensure that annotations of protein-coding gene repertoires are accurate. State-of-the-art tools to assess genome annotations measure the completeness of a gene repertoire but are blind to other errors, such as gene overprediction or contamination. We introduce OMArk, a software package that relies on fast, alignment-free sequence comparisons between a query proteome and precomputed gene families across the tree of life. OMArk assesses not only the completeness but also the consistency of the gene repertoire as a whole relative to closely related species and reports likely contamination events. Analysis of 1,805 UniProt Eukaryotic Reference Proteomes with OMArk demonstrated strong evidence of contamination in 73 proteomes and identified error propagation in avian gene annotation resulting from the use of a fragmented zebra finch proteome as a reference. This study illustrates the importance of comparing and prioritizing proteomes based on their quality measures.
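
The idea of comparing a query proteome against precomputed gene families without alignment can be illustrated with a toy sketch. This is not OMArk's implementation: the k-mer voting scheme, the function names (`build_index`, `assign`, `assess`), the thresholds, and the example sequences are all assumptions made for illustration. It only shows how completeness, lineage consistency, and possible contamination could be tallied from alignment-free family assignments.

```python
from collections import Counter


def kmers(seq, k=3):
    """Return the set of overlapping k-mers in a protein sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_index(families, k=3):
    """Precompute a k-mer -> family ids lookup (hypothetical stand-in for a family database)."""
    index = {}
    for fam_id, members in families.items():
        for seq in members:
            for km in kmers(seq, k):
                index.setdefault(km, set()).add(fam_id)
    return index


def assign(seq, index, k=3, min_shared=2):
    """Assign a protein to the family sharing the most k-mers, or None if too few are shared."""
    votes = Counter()
    for km in kmers(seq, k):
        for fam_id in index.get(km, ()):
            votes[fam_id] += 1
    if not votes:
        return None
    fam_id, shared = votes.most_common(1)[0]
    return fam_id if shared >= min_shared else None


def assess(proteome, families, lineage_of, expected_lineage, k=3):
    """Tally completeness and per-protein labels for a query proteome."""
    index = build_index(families, k)
    hits = {name: assign(seq, index, k) for name, seq in proteome.items()}
    # Families we expect to see in a species of the given lineage.
    expected = {f for f, lin in lineage_of.items() if lin == expected_lineage}
    found = {f for f in hits.values() if f in expected}
    labels = Counter()
    for fam in hits.values():
        if fam is None:
            labels["unknown"] += 1        # no family placement at all
        elif fam in expected:
            labels["consistent"] += 1     # placed in a family of the expected lineage
        else:
            labels["contaminant"] += 1    # placed in a family of another lineage
    return {"completeness": len(found) / len(expected), "labels": dict(labels)}


# Made-up toy data: two "animal" families and one "bacteria" family.
families = {"famA": ["MKTAYIAKQR"], "famB": ["GSHMLEDPVQ"], "famX": ["WWCCHHNNPP"]}
lineage_of = {"famA": "animal", "famB": "animal", "famX": "bacteria"}
proteome = {"p1": "MKTAYIAKQR", "p2": "WWCCHHNNPP", "p3": "QQQQQQQQ"}

print(assess(proteome, families, lineage_of, "animal"))
```

In this toy run, one of the two expected families is recovered, one protein is placed in a family from another lineage, and one protein has no placement. A completeness-only measure would report just the first of these three observations.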

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c21c/11738984/6b837bdfca4d/41587_2024_2147_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c21c/11738984/bb0cf731cad9/41587_2024_2147_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c21c/11738984/1d74a69a5c08/41587_2024_2147_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c21c/11738984/de5fa252a25c/41587_2024_2147_Fig4_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c21c/11738984/e1a23c46beff/41587_2024_2147_Fig5_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c21c/11738984/76fa711a46d8/41587_2024_2147_Fig6_HTML.jpg
