GeneMark-ETP significantly improves the accuracy of automatic annotation of large eukaryotic genomes.

Affiliations

School of Biological Sciences, Georgia Institute of Technology, Atlanta, Georgia 30332, USA.

Wallace H. Coulter Department of Biomedical Engineering, Georgia Institute of Technology, Atlanta, Georgia 30332, USA.

Publication information

Genome Res. 2024 Jun 25;34(5):757-768. doi: 10.1101/gr.278373.123.

Abstract

Large-scale genomic initiatives, such as the Earth BioGenome Project, require efficient methods for eukaryotic genome annotation. Here we present an automatic gene finder, GeneMark-ETP, that integrates genomic-, transcriptomic-, and protein-derived evidence and has been developed with a focus on large plant and animal genomes. GeneMark-ETP first identifies genomic loci where extrinsic data are sufficient for making gene predictions with "high confidence." The genes situated in the genomic space between the high-confidence genes are predicted in the next stage. The set of high-confidence genes serves as an initial training set for the statistical model. The model parameters are then updated iteratively over rounds of gene prediction and parameter re-estimation. Upon reaching convergence, GeneMark-ETP makes the final predictions and delivers the whole complement of predicted genes. GeneMark-ETP outperforms gene finders that use a single type of extrinsic evidence. Comparisons with the gene finders MAKER2 and TSEBRA, which use both transcript- and protein-derived extrinsic evidence, show that GeneMark-ETP delivers state-of-the-art gene-prediction accuracy, and that its margin over existing approaches grows on larger and more complex eukaryotic genomes.
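
The seed-then-iterate training scheme described in the abstract can be illustrated with a toy sketch. This is not GeneMark-ETP's implementation or statistical model: the two-parameter GC-content "model", the function names, and the example sequences are all assumptions chosen only to show the control flow (train on a high-confidence seed set, predict, re-estimate, stop when predictions no longer change).

```python
from statistics import mean


def gc_fraction(seq: str) -> float:
    """Fraction of G/C bases in a non-empty sequence."""
    return sum(base in "GC" for base in seq) / len(seq)


def estimate(coding, noncoding):
    """Toy 'model': mean GC fraction of coding and of noncoding examples."""
    return (mean(gc_fraction(s) for s in coding),
            mean(gc_fraction(s) for s in noncoding))


def predict(params, candidates):
    """Label a candidate as coding if its GC fraction is closer to the coding mean."""
    coding_gc, noncoding_gc = params
    return frozenset(
        s for s in candidates
        if abs(gc_fraction(s) - coding_gc) < abs(gc_fraction(s) - noncoding_gc)
    )


def iterative_training(seed_coding, seed_noncoding, candidates, max_rounds=20):
    """Hypothetical helper: seed the model, then alternate prediction and re-estimation."""
    # Initial parameters come from the high-confidence seed set only.
    params = estimate(seed_coding, seed_noncoding)
    predicted = predict(params, candidates)
    for _ in range(max_rounds):
        # Re-estimate from the seeds plus the current predictions.
        coding = list(seed_coding) + sorted(predicted)
        noncoding = list(seed_noncoding) + sorted(set(candidates) - predicted)
        params = estimate(coding, noncoding)
        updated = predict(params, candidates)
        if updated == predicted:  # convergence: predictions are stable
            break
        predicted = updated
    return params, predicted


if __name__ == "__main__":
    # Made-up sequences for illustration only.
    seeds_coding = ["GCGCGCAT", "GGCCGCTA"]
    seeds_noncoding = ["ATATATGC", "TTAATACG"]
    candidates = ["GCGGCCAA", "ATTTAAGA", "GCGCGATA"]
    params, genes = iterative_training(seeds_coding, seeds_noncoding, candidates)
    print(params, sorted(genes))
```

The seed examples stay in the training set on every round, so the parameter estimates are always anchored to the high-confidence set even if no candidates are labeled as coding.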
