Homology-based gene structure prediction: simplified matching algorithm using a translated codon (tron) and improved accuracy by allowing for long gaps.

Author information

Gotoh O

Affiliation

Saitama Cancer Center Research Institute, 818 Komuro Ina-machi, Saitama 362-0806, Japan.

Publication information

Bioinformatics. 2000 Mar;16(3):190-202. doi: 10.1093/bioinformatics/16.3.190.

DOI: 10.1093/bioinformatics/16.3.190
PMID: 10869012
Abstract

MOTIVATION

Locating protein-coding exons (CDSs) on a eukaryotic genomic DNA sequence is the initial and an essential step in predicting the functions of the genes embedded in that part of the genome. Accurate prediction of CDSs may be achieved by directly matching the DNA sequence with a known protein sequence or profile of a homologous family member(s).

RESULTS

A new convention for encoding a DNA sequence into a series of 23 possible letters (translated codon or tron code) was devised to improve this type of analysis. Using this convention, a dynamic programming algorithm was developed to align a DNA sequence and a protein sequence or profile so that the spliced and translated sequence optimally matches the reference the same as the standard protein sequence alignment allowing for long gaps. The objective function also takes account of frameshift errors, coding potentials, and translational initiation, termination and splicing signals. This method was tested on Caenorhabditis elegans genes of known structures. The accuracy of prediction measured in terms of a correlation coefficient (CC) was about 95% at the nucleotide level for the 288 genes tested, and 97.0% for the 170 genes whose product and closest homologue share more than 30% identical amino acids. We also propose a strategy to improve the accuracy of prediction for a set of paralogous genes by means of iterative gene prediction and reconstruction of the reference profile derived from the predicted sequences.
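
The nucleotide-level correlation coefficient quoted above can be illustrated with a minimal sketch. It assumes the CC is the usual measure for evaluating gene-structure prediction, that is, the Matthews-style correlation between the true and predicted coding/non-coding label of every nucleotide. The abstract does not spell out the formula, so this is an assumption. The helper names `mask_from_exons` and `nucleotide_cc`, and the toy coordinates, are hypothetical and are not taken from the 'aln' program.

```python
import math

def mask_from_exons(length, exons):
    """Return a per-nucleotide coding mask.

    `exons` is a list of (start, end) pairs, 0-based and half-open.
    This coordinate convention is an assumption made for the sketch.
    """
    mask = [False] * length
    for start, end in exons:
        for i in range(start, end):
            mask[i] = True
    return mask

def nucleotide_cc(true_mask, pred_mask):
    """Correlation coefficient between true and predicted coding masks."""
    if len(true_mask) != len(pred_mask):
        raise ValueError("masks must have the same length")
    tp = tn = fp = fn = 0
    for t, p in zip(true_mask, pred_mask):
        if t and p:
            tp += 1      # coding, predicted coding
        elif not t and not p:
            tn += 1      # non-coding, predicted non-coding
        elif p:
            fp += 1      # non-coding, predicted coding
        else:
            fn += 1      # coding, predicted non-coding
    denom = math.sqrt((tp + fn) * (tn + fp) * (tp + fp) * (tn + fn))
    if denom == 0:
        return float("nan")  # undefined when either mask is constant
    return (tp * tn - fn * fp) / denom

# Toy example: the second predicted exon starts 5 nt too late.
true_mask = mask_from_exons(100, [(10, 30), (50, 80)])
pred_mask = mask_from_exons(100, [(10, 30), (55, 80)])
cc = nucleotide_cc(true_mask, pred_mask)
```

Under this definition a prediction that matches the annotation at every nucleotide gives a CC of 1, and a missed or extended exon boundary lowers it in proportion to the number of mislabelled nucleotides.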

AVAILABILITY

The source codes for the program 'aln' written in ANSI-C and the test data will be available via anonymous FTP at ftp.genome.ad.jp/pub/genomenet/saitama-cc.

CONTACT

gotoh@cancer-c.pref.saitama.jp

Similar articles

1
Homology-based gene structure prediction: simplified matching algorithm using a translated codon (tron) and improved accuracy by allowing for long gaps.
Bioinformatics. 2000 Mar;16(3):190-202. doi: 10.1093/bioinformatics/16.3.190.
2
[Analysis, identification and correction of some errors of model refseqs appeared in NCBI Human Gene Database by in silico cloning and experimental verification of novel human genes].
Yi Chuan Xue Bao. 2004 May;31(5):431-43.
3
Gene structure prediction by spliced alignment of genomic DNA with protein sequences: increased accuracy by differential splice site scoring.
J Mol Biol. 2000 Apr 14;297(5):1075-85. doi: 10.1006/jmbi.2000.3641.
4
Direct mapping and alignment of protein sequences onto genomic sequence.
Bioinformatics. 2008 Nov 1;24(21):2438-44. doi: 10.1093/bioinformatics/btn460. Epub 2008 Aug 26.
5
transAlign: using amino acids to facilitate the multiple alignment of protein-coding DNA sequences.
BMC Bioinformatics. 2005 Jun 22;6:156. doi: 10.1186/1471-2105-6-156.
6
The tpa-1 gene of Caenorhabditis elegans encodes two proteins similar to Ca(2+)-independent protein kinase Cs: evidence by complete genomic and complementary DNA sequences of the tpa-1 gene.
J Mol Biol. 1995 Aug 25;251(4):477-85. doi: 10.1006/jmbi.1995.0449.
7
Using mRNAs lengths to accurately predict the alternatively spliced gene products in Caenorhabditis elegans.
Bioinformatics. 2006 May 15;22(10):1239-44. doi: 10.1093/bioinformatics/btl076. Epub 2006 Apr 4.
8
Optimal spliced alignment of homologous cDNA to a genomic DNA template.
Bioinformatics. 2000 Mar;16(3):203-11. doi: 10.1093/bioinformatics/16.3.203.
9
GeneBuilder: interactive in silico prediction of gene structure.
Bioinformatics. 1999 Jul-Aug;15(7-8):612-21. doi: 10.1093/bioinformatics/15.7.612.
10
Genomic characterization of Tv-ant-1, a Caenorhabditis elegans tag-61 homologue from the parasitic nematode Trichostrongylus vitrinus.
Gene. 2007 Aug 1;397(1-2):12-25. doi: 10.1016/j.gene.2007.03.011. Epub 2007 Mar 30.

Cited by

1
Spaln3: improvement in speed and accuracy of genome mapping and spliced alignment of protein query sequences.
Bioinformatics. 2024 Aug 2;40(8). doi: 10.1093/bioinformatics/btae517.
2
Discovery of a gene cluster for the biosynthesis of novel cyclic peptide compound, KK-1, in .
Front Fungal Biol. 2023 Jan 20;3:1081179. doi: 10.3389/ffunb.2022.1081179. eCollection 2022.
3
Use of Average Mutual Information and Derived Measures to Find Coding Regions.
Entropy (Basel). 2021 Oct 11;23(10):1324. doi: 10.3390/e23101324.
4
Cooperation of Spaln and Prrn5 for Construction of Gene-Structure-Aware Multiple Sequence Alignment.
Methods Mol Biol. 2021;2231:71-88. doi: 10.1007/978-1-0716-1036-7_5.
5
SEVENS: a database for comprehensive GPCR genes obtained from genomes: -Update to 68 eukaryotes.
Biophys Physicobiol. 2018 Apr 27;15:104-110. doi: 10.2142/biophysico.15.0_104. eCollection 2018.
6
Genome Sequence of Ustilaginoidea virens IPU010, a Rice Pathogenic Fungus Causing False Smut.
Genome Announc. 2016 May 5;4(3):e00306-16. doi: 10.1128/genomeA.00306-16.
7
Assessment and refinement of eukaryotic gene structure prediction with gene-structure-aware multiple protein sequence alignment.
BMC Bioinformatics. 2014 Jun 14;15:189. doi: 10.1186/1471-2105-15-189.
8
Benchmarking spliced alignment programs including Spaln2, an extended version of Spaln that incorporates additional species-specific features.
Nucleic Acids Res. 2012 Nov 1;40(20):e161. doi: 10.1093/nar/gks708. Epub 2012 Jul 30.
9
The 2008 update of the Aspergillus nidulans genome annotation: a community effort.
Fungal Genet Biol. 2009 Mar;46 Suppl 1(Suppl 1):S2-13. doi: 10.1016/j.fgb.2008.12.003. Epub 2008 Dec 25.
10
Improved annotation through genome-scale metabolic modeling of Aspergillus oryzae.
BMC Genomics. 2008 May 23;9:245. doi: 10.1186/1471-2164-9-245.