Ab initio gene finding in Drosophila genomic DNA.

Author information

Salamov A A, Solovyev V V

Affiliation

The Sanger Centre, Hinxton, Cambridge CB10 1SA, UK.

Publication information

Genome Res. 2000 Apr;10(4):516-22. doi: 10.1101/gr.10.4.516.

Abstract

Ab initio gene identification in the genomic sequence of Drosophila melanogaster was performed using the Fgenes (human gene predictor) and Fgenesh programs, which have organism-specific parameters for human, Drosophila, plants, yeast, and nematode. In most predictions we did not use cDNA/EST information, in order to model the realistic situation of finding new genes, where complete cDNA information is often absent or based on very small partial fragments. We investigated the accuracy of gene prediction at different levels and designed several schemes to predict an unambiguous set of genes (annotation CGG1), a set of reliable exons (annotation CGG2), and the most complete set of exons (annotation CGG3). For 49 genes whose protein products have clear homologs in protein databases, predictions were recomputed with the Fgenesh+ program. The first annotation serves as the optimal computational description of a new sequence to be presented in a database. Reliable exons from the second annotation are good candidates for selecting PCR primers for experimental verification of gene structure. Our results show that we can identify approximately 90% of coding nucleotides with 20% false positives. At the exon level we accurately predicted 65% of exons (89% including overlapping exons) with 49% false positives. Optimizing the accuracy of prediction, we designed a gene identification scheme using Fgenesh that provided sensitivity (Sn) = 98% and specificity (Sp) = 86% at the base level, Sn = 81% (97% including overlapping exons) and Sp = 58% at the exon level, and Sn = 72% and Sp = 39% at the gene level (estimating sensitivity on the std1 set and specificity on the std3 set). In general, these results showed that computational gene prediction can be a reliable tool for annotating new genomic sequences, giving accurate information on 90% of coding sequences with 14% false positives. However, exact gene prediction (especially at the gene level) needs further improvement of gene prediction algorithms. The program was also tested on predicting genes of human Chromosome 22 (the latest version of Fgenesh can analyze the whole chromosome sequence). This analysis demonstrated that 88% of the manually annotated exons in Chromosome 22 were among the ab initio predicted exons. The suite of gene identification programs is available through the WWW server of the Computational Genomics Group at http://genomic.sanger.ac.uk/gf.html.
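
The abstract reports sensitivity (Sn) and specificity (Sp) at the base and exon levels without stating how they are computed. The minimal sketch below assumes the convention commonly used when evaluating gene finders: Sn = TP / (TP + FN) and Sp = TP / (TP + FP), so that the "false positives" percentage corresponds to 1 - Sp (consistent with Sp = 86% and 14% false positives above). The function names and the toy coordinates are hypothetical and are not taken from the paper or from Fgenesh.

```python
from typing import List, Set, Tuple

# An exon as 1-based inclusive (start, end) coordinates (assumed convention).
Exon = Tuple[int, int]


def coding_bases(exons: List[Exon]) -> Set[int]:
    """Return the set of nucleotide positions covered by the exons."""
    bases: Set[int] = set()
    for start, end in exons:
        bases.update(range(start, end + 1))
    return bases


def sn_sp(tp: int, n_true: int, n_pred: int) -> Tuple[float, float]:
    """Sn = TP / all true items; Sp = TP / all predicted items."""
    sn = tp / n_true if n_true else 0.0
    sp = tp / n_pred if n_pred else 0.0
    return sn, sp


def base_level(true_exons: List[Exon], pred_exons: List[Exon]) -> Tuple[float, float]:
    """Base-level Sn and Sp: compare coding nucleotides position by position."""
    true_b = coding_bases(true_exons)
    pred_b = coding_bases(pred_exons)
    return sn_sp(len(true_b & pred_b), len(true_b), len(pred_b))


def exon_level(true_exons: List[Exon], pred_exons: List[Exon]) -> Tuple[float, float]:
    """Exon-level Sn and Sp: an exon counts only if both boundaries match."""
    true_e, pred_e = set(true_exons), set(pred_exons)
    return sn_sp(len(true_e & pred_e), len(true_e), len(pred_e))


def exon_sn_with_overlap(true_exons: List[Exon], pred_exons: List[Exon]) -> float:
    """Exon-level Sn where any overlap with a predicted exon counts as found."""
    def overlaps(a: Exon, b: Exon) -> bool:
        return a[0] <= b[1] and b[0] <= a[1]

    if not true_exons:
        return 0.0
    found = sum(1 for t in true_exons if any(overlaps(t, p) for p in pred_exons))
    return found / len(true_exons)


if __name__ == "__main__":
    # Toy data: one exact exon, one partially correct exon, one false exon.
    annotated = [(100, 199), (300, 399), (500, 599)]
    predicted = [(100, 199), (300, 389), (700, 749)]
    print("base  Sn, Sp:", base_level(annotated, predicted))
    print("exon  Sn, Sp:", exon_level(annotated, predicted))
    print("exon  Sn incl. overlaps:", exon_sn_with_overlap(annotated, predicted))
```

In the toy data the second predicted exon misses one boundary, so it raises the base-level and overlap-inclusive figures but not the exact exon-level ones. This mirrors the gap the abstract reports between exact exon sensitivity and sensitivity "including overlapping exons".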




