Improving the annotation of the Heterorhabditis bacteriophora genome.

Author affiliations

Institute of Evolutionary Biology, University of Edinburgh, Edinburgh EH9 3JT, UK.

Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, California, USA.

Publication information

Gigascience. 2018 Apr 1;7(4). doi: 10.1093/gigascience/giy034.

Abstract

BACKGROUND

Genome assembly and annotation remain exacting tasks. As the tools available for these tasks improve, it is useful to return to data produced with earlier techniques to assess their credibility and correctness. The entomopathogenic nematode Heterorhabditis bacteriophora is widely used to control insect pests in horticulture. The genome sequence for this species was reported to encode an unusually high proportion of unique proteins and a paucity of secreted proteins compared to other related nematodes.

FINDINGS

We revisited the H. bacteriophora genome assembly and gene predictions to determine whether these unusual characteristics were biological or methodological in origin. We mapped an independent resequencing dataset to the genome and used the blobtools pipeline to identify potential contaminants. While present (0.2% of the genome span, 0.4% of predicted proteins), assembly contamination was not significant.

CONCLUSIONS

Re-prediction of the gene set using BRAKER1 and published transcriptome data generated a predicted proteome that was very different from the published one. The new gene set had a much reduced complement of unique proteins, better completeness values that were in line with other related species' genomes, and an increased number of proteins predicted to be secreted. It is thus likely that methodological issues drove the apparent uniqueness of the initial H. bacteriophora genome annotation and that similar contamination and misannotation issues affect other published genome assemblies.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/acb4/5906903/fe5abcdd3339/giy034fig1.jpg