Gene recognition via spliced sequence alignment.

Authors

Gelfand M S, Mironov A A, Pevzner P A

Affiliation

Institute of Protein Research, Russian Academy of Sciences, Puschino, Moscow, Russia.

Publication

Proc Natl Acad Sci U S A. 1996 Aug 20;93(17):9061-6. doi: 10.1073/pnas.93.17.9061.

DOI: 10.1073/pnas.93.17.9061
PMID: 8799154
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC38595/
Abstract

Gene recognition is one of the most important problems in computational molecular biology. Previous attempts to solve this problem were based on statistics, and applications of combinatorial methods for gene recognition were almost unexplored. Recent advances in large-scale cDNA sequencing open a way toward a new approach to gene recognition that uses previously sequenced genes as a clue for recognition of newly sequenced genes. This paper describes a spliced alignment algorithm and software tool that explores all possible exon assemblies in polynomial time and finds the multiexon structure with the best fit to a related protein. Unlike other existing methods, the algorithm successfully recognizes genes even in the case of short exons or exons with unusual codon usage; we also report correct assemblies for genes with more than 10 exons. On a test sample of human genes with known mammalian relatives, the average correlation between the predicted and actual proteins was 99%. The algorithm correctly reconstructed 87% of genes and the rare discrepancies between the predicted and real exon-intron structures were caused either by short (less than 5 amino acids) initial/terminal exons or by alternative splicing. Moreover, the algorithm predicts human genes reasonably well when the homologous protein is nonvertebrate or even prokaryotic. The surprisingly good performance of the method was confirmed by extensive simulations: in particular, with target proteins at 160 accepted point mutations (PAM) (25% similarity), the correlation between the predicted and actual genes was still as high as 95%.
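The core idea of the abstract, choosing the chain of candidate exons whose concatenation best fits a related target, can be illustrated with a small dynamic program. The following is a minimal sketch under stated assumptions, not the authors' software: it aligns at the nucleotide level rather than against a protein, it takes the candidate exon blocks as given half-open intervals, and the function name `spliced_alignment` and the `match`/`mismatch`/`gap` scores are hypothetical choices made for illustration. Each block is entered with the best scores left by any block that ends before it starts, so every chain of non-overlapping blocks is considered without being enumerated one by one.

```python
from typing import List, Tuple

Block = Tuple[int, int]          # half-open interval [start, end) in the genome
Cell = Tuple[int, Tuple[Block, ...]]  # (alignment score, chain of blocks used)


def spliced_alignment(genome: str, target: str, blocks: List[Block],
                      match: int = 1, mismatch: int = -1, gap: int = -1) -> Cell:
    """Best global alignment score of any chain of non-overlapping blocks
    against the whole target, together with the chain that achieves it."""
    m = len(target)
    # Empty chain: every target prefix is aligned against gaps only.
    empty: List[Cell] = [(j * gap, ()) for j in range(m + 1)]
    exit_rows: List[Tuple[Block, List[Cell]]] = []
    best = empty[m]

    # Process blocks by end position so that predecessors are finished first.
    for start, end in sorted(blocks, key=lambda b: b[1]):
        # Entry row: best of the empty chain and every block ending before `start`.
        row = list(empty)
        for (_, p_end), p_row in exit_rows:
            if p_end <= start:
                row = [max(a, b, key=lambda c: c[0]) for a, b in zip(row, p_row)]

        # Standard global-alignment recurrence, one genomic base per row.
        for base in genome[start:end]:
            new: List[Cell] = [(row[0][0] + gap, row[0][1])]
            for j in range(1, m + 1):
                s = match if base == target[j - 1] else mismatch
                candidates = [
                    (row[j - 1][0] + s, row[j - 1][1]),    # base aligned to target[j-1]
                    (row[j][0] + gap, row[j][1]),          # base aligned to a gap
                    (new[j - 1][0] + gap, new[j - 1][1]),  # target[j-1] aligned to a gap
                ]
                new.append(max(candidates, key=lambda c: c[0]))
            row = new

        # Record that the chains leaving this block include the block itself.
        row = [(score, chain + ((start, end),)) for score, chain in row]
        exit_rows.append(((start, end), row))
        if row[m][0] > best[0]:
            best = row[m]
    return best


# Toy example: two true exons (GATTA, CAT) plus two decoy blocks.
genome = "CCGATTAGTTTTAGCATCC"
blocks = [(0, 2), (2, 7), (8, 12), (14, 17)]
print(spliced_alignment(genome, "GATTACAT", blocks))
# → (8, ((2, 7), (14, 17)))
```

The running time of this sketch is polynomial in the total block length, the number of blocks, and the target length, which mirrors the polynomial-time claim in the abstract; the published algorithm also handles overlapping candidate blocks and protein targets, which this simplification does not attempt.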
