Ab initio gene finding in Drosophila genomic DNA.

Authors

Salamov A A, Solovyev V V

Affiliation

The Sanger Centre, Hinxton, Cambridge CB10 1SA, UK.

Publication

Genome Res. 2000 Apr;10(4):516-22. doi: 10.1101/gr.10.4.516.

DOI: 10.1101/gr.10.4.516
PMID: 10779491
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC310882/
Abstract

Ab initio gene identification in the genomic sequence of Drosophila melanogaster was obtained using Fgenes (human gene predictor) and Fgenesh programs that have organism-specific parameters for human, Drosophila, plants, yeast, and nematode. We did not use information about cDNA/EST in most predictions to model a real situation for finding new genes because information about complete cDNA is often absent or based on very small partial fragments. We investigated the accuracy of gene prediction on different levels and designed several schemes to predict an unambiguous set of genes (annotation CGG1), a set of reliable exons (annotation CGG2), and the most complete set of exons (annotation CGG3). For 49 genes, protein products of which have clear homologs in protein databases, predictions were recomputed by Fgenesh+ program. The first annotation serves as the optimal computational description of new sequence to be presented in a database. Reliable exons from the second annotation serve as good candidates for selecting the PCR primers for experimental work for gene structure verification. Our results show that we can identify approximately 90% of coding nucleotides with 20% false positives. At the exon level we accurately predicted 65% of exons and 89% including overlapping exons with 49% false positives. Optimizing accuracy of prediction, we designed a gene identification scheme using Fgenesh, which provided sensitivity (Sn) = 98% and specificity (Sp) = 86% at the base level, Sn = 81% (97% including overlapping exons) and Sp = 58% at the exon level and Sn = 72% and Sp = 39% at the gene level (estimating sensitivity on std1 set and specificity on std3 set). In general, these results showed that computational gene prediction can be a reliable tool for annotating new genomic sequences, giving accurate information on 90% of coding sequences with 14% false positives. However, exact gene prediction (especially at the gene level) needs additional improvement using gene prediction algorithms. The program was also tested for predicting genes of human Chromosome 22 (the last variant of Fgenesh can analyze the whole chromosome sequence). This analysis has demonstrated that 88% of manually annotated exons in Chromosome 22 were among the ab initio predicted exons. The suite of gene identification programs is available through the WWW server of Computational Genomics Group at http://genomic.sanger.ac.uk/gf.html.
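The abstract pairs each sensitivity with a false-positive rate (for example, Sp = 86% alongside 14% false positives), which suggests that Sp here is the fraction of predictions that are correct, as is conventional in gene-finding evaluation. The following is a minimal sketch of base-level and exact-exon-level Sn and Sp under that reading. The function names, the inclusive `(start, end)` interval format, and the toy coordinates are illustrative assumptions and are not taken from the paper or from Fgenesh.

```python
# Illustrative sketch only: names, interval format, and data are assumptions.
# Sn = TP / (TP + FN): fraction of annotated items that were predicted.
# Sp = TP / (TP + FP): fraction of predicted items that are annotated
#      (so 1 - Sp is the false-positive fraction among predictions).

def base_level(annotated, predicted):
    """Sn and Sp over coding nucleotides; inputs are non-empty lists of
    inclusive (start, end) exon intervals on one strand."""
    ann = {pos for start, end in annotated for pos in range(start, end + 1)}
    pred = {pos for start, end in predicted for pos in range(start, end + 1)}
    true_pos = len(ann & pred)
    return true_pos / len(ann), true_pos / len(pred)


def exon_level(annotated, predicted):
    """Sn and Sp counting only exons whose both boundaries match exactly."""
    ann, pred = set(annotated), set(predicted)
    exact = len(ann & pred)
    return exact / len(ann), exact / len(pred)


# Toy data: one exact exon, one overlapping exon, one false-positive exon.
annotated = [(100, 199), (300, 399), (500, 599)]
predicted = [(100, 199), (310, 399), (700, 749)]

base_sn, base_sp = base_level(annotated, predicted)
exon_sn, exon_sp = exon_level(annotated, predicted)
print(f"base level: Sn={base_sn:.2f} Sp={base_sp:.2f}")
print(f"exon level: Sn={exon_sn:.2f} Sp={exon_sp:.2f}")
```

In the toy data the overlapping exon contributes to base-level accuracy but not to exact exon-level accuracy, which mirrors why the abstract reports exon-level figures both with and without overlapping exons.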
