Integrated modeling of protein-coding genes in the Manduca sexta genome using RNA-Seq data from the biochemical model insect.

Author information

Cao Xiaolong, Jiang Haobo

Affiliations

Department of Entomology and Plant Pathology, Oklahoma State University, Stillwater, OK 74078, USA.

Publication information

Insect Biochem Mol Biol. 2015 Jul;62:2-10. doi: 10.1016/j.ibmb.2015.01.007. Epub 2015 Jan 20.

Abstract

The genome sequence of Manduca sexta was recently determined using 454 technology. Cufflinks and MAKER2 were used to establish gene models in the genome assembly based on the RNA-Seq data and sequences from other species. Aided by the extensive RNA-Seq data from 50 tissue samples at various life stages, annotators around the world (including the present authors) have manually confirmed and improved a small percentage of the models over months of effort. While such collaborative efforts are highly commendable, many of the predicted genes still have problems that may hamper future research on this insect species. As a biochemical model representing lepidopteran pests, M. sexta has been used extensively to study insect physiological processes for over five decades. In this work, we assembled the Manduca datasets Cufflinks 3.0, Trinity 4.0, and Oases 4.0 to assist the manual annotation efforts and the development of Official Gene Set (OGS) 2.0. To further improve annotation quality, we developed methods to evaluate gene models in the MAKER2, Cufflinks, Oases, and Trinity assemblies and, after thorough crosschecking, selected the best ones to constitute MCOT 1.0. MCOT 1.0 has 18,089 genes encoding 31,666 proteins: 32.8% match OGS 2.0 models perfectly or near perfectly, 11,747 differ considerably, and 29.5% are absent in OGS 2.0. Future automation of this process is anticipated to greatly reduce the human effort needed to generate comprehensive, reliable models of structural genes in other genome projects where extensive RNA-Seq data are available.
