Direct mapping and alignment of protein sequences onto genomic sequence.

Author information

Gotoh Osamu

Affiliation

Department of Intelligence Science and Technology, Graduate School of Informatics, Kyoto University, Yoshida Honmachi, Sakyo-ku, Kyoto 606-8501, Japan.

Publication information

Bioinformatics. 2008 Nov 1;24(21):2438-44. doi: 10.1093/bioinformatics/btn460. Epub 2008 Aug 26.

DOI: 10.1093/bioinformatics/btn460
PMID: 18728043

Abstract

MOTIVATION

Finding protein-coding genes in a newly determined genomic sequence is the first step toward understanding the content written in the genome. Sequences of transcripts of homologous genes, if available, can considerably improve accuracy of prediction of genes and their structures, compared with that without such knowledge. As protein sequences are generally better conserved than nucleotide sequences, remote homologs can be used as templates, extending the applicability of evidence-based gene recognition methods. However, no tool seems to have been developed so far to simultaneously map and align a number of protein sequences on mammalian-sized genomic sequence.

RESULTS

We have extended our computer program Spaln to accept protein sequences, as well as cDNA sequences, as queries. When the query and the target sequences are reasonably similar, e.g. between mammalian orthologs, Spaln runs one to two orders of magnitude faster than conventional approaches that rely on Blast search followed by dynamic-programming-based spliced alignment. Exon-level and gene-level accuracies of Spaln are significantly higher than those obtained by the best available methods of the same type, particularly when the query and the target are distantly related.

AVAILABILITY

Spaln is accessible online for a few species at http://www.genome.ist.i.kyoto-u.ac.jp/~aln_user. The source code is available for free for academic users from the same site.
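
The mapping step described in the Results can be illustrated with a toy example. The sketch below is not Spaln's algorithm; it is a minimal illustration, under stated assumptions, of how a protein query can be located on genomic DNA: the forward strand is translated in three reading frames, and exact k-mer matches between the protein and each translation are reported as seed hits. The function names (`translate_frame`, `seed_hits`), the toy genome, and the choice of k = 4 are all hypothetical. A real tool would also scan the reverse strand, tolerate mismatches, and refine the seeds with a spliced alignment.

```python
from collections import defaultdict

# Standard genetic code, codons enumerated in TCAG x TCAG x TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate_frame(dna, frame):
    """Translate the forward strand starting at offset `frame` (0, 1 or 2)."""
    codons = (dna[i:i + 3] for i in range(frame, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def seed_hits(protein, genome, k=4):
    """Return sorted (genome_pos, protein_pos, frame) exact k-mer seed hits."""
    # Index every k-mer of the protein query by its position.
    index = defaultdict(list)
    for p in range(len(protein) - k + 1):
        index[protein[p:p + k]].append(p)

    hits = []
    for frame in range(3):
        translated = translate_frame(genome, frame)
        for a in range(len(translated) - k + 1):
            for p in index.get(translated[a:a + k], ()):
                # Convert the amino-acid offset back to a genomic coordinate.
                hits.append((frame + 3 * a, p, frame))
    return sorted(hits)


# Toy gene: two exons encoding "MKTA" and "YIAK", split by an 11-nt intron,
# so the second exon lies in a different reading frame from the first.
genome = "CC" + "ATGAAAACCGCG" + "GTAAGTTTCAG" + "TATATTGCGAAA" + "TAA"
print(seed_hits("MKTAYIAK", genome, k=4))
```

In this toy case the two seeds fall in different frames because the intron length is not a multiple of three, which is why a spliced aligner cannot rely on a single reading frame when chaining exons.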

Similar articles

1
Direct mapping and alignment of protein sequences onto genomic sequence.
Bioinformatics. 2008 Nov 1;24(21):2438-44. doi: 10.1093/bioinformatics/btn460. Epub 2008 Aug 26.
2
transAlign: using amino acids to facilitate the multiple alignment of protein-coding DNA sequences.
BMC Bioinformatics. 2005 Jun 22;6:156. doi: 10.1186/1471-2105-6-156.
3
Fast model-based protein homology detection without alignment.
Bioinformatics. 2007 Jul 15;23(14):1728-36. doi: 10.1093/bioinformatics/btm247. Epub 2007 May 8.
4
WebScipio: an online tool for the determination of gene structures using protein sequences.
BMC Genomics. 2008 Sep 18;9:422. doi: 10.1186/1471-2164-9-422.
5
Gene structure prediction by spliced alignment of genomic DNA with protein sequences: increased accuracy by differential splice site scoring.
J Mol Biol. 2000 Apr 14;297(5):1075-85. doi: 10.1006/jmbi.2000.3641.
6
A tool for analyzing and annotating genomic sequences.
Genomics. 1997 Nov 15;46(1):37-45. doi: 10.1006/geno.1997.4984.
7
Blast sampling for structural and functional analyses.
BMC Bioinformatics. 2007 Feb 23;8:62. doi: 10.1186/1471-2105-8-62.
8
Sequence progressive alignment, a framework for practical large-scale probabilistic consistency alignment.
Bioinformatics. 2009 Feb 1;25(3):295-301. doi: 10.1093/bioinformatics/btn630. Epub 2008 Dec 4.
9
An efficient, versatile and scalable pattern growth approach to mine frequent patterns in unaligned protein sequences.
Bioinformatics. 2007 Mar 15;23(6):687-93. doi: 10.1093/bioinformatics/btl665. Epub 2007 Jan 19.
10
Exploring the extremes of sequence/structure space with ensemble fold recognition in the program Phyre.
Proteins. 2008 Feb 15;70(3):611-25. doi: 10.1002/prot.21688.

Cited by

1
Efficient evidence-based genome annotation with EviAnn.
bioRxiv. 2025 May 12:2025.05.07.652745. doi: 10.1101/2025.05.07.652745.
2
Museomics of an extinct European flat oyster population.
Sci Rep. 2025 Apr 22;15(1):13906. doi: 10.1038/s41598-025-96743-8.
3
Spaln3: improvement in speed and accuracy of genome mapping and spliced alignment of protein query sequences.
Bioinformatics. 2024 Aug 2;40(8). doi: 10.1093/bioinformatics/btae517.
4
Nanopore-based glycan sequencing: state of the art and future prospects.
Chem Sci. 2024 Apr 3;15(17):6229-6243. doi: 10.1039/d4sc01466a. eCollection 2024 May 1.
5
Genome assembly of Genji firefly (Nipponoluciola cruciata) reveals novel luciferase-like luminescent proteins without peroxisome targeting signal.
DNA Res. 2024 Apr 1;31(2). doi: 10.1093/dnares/dsae006.
6
Galba: genome annotation with miniprot and AUGUSTUS.
BMC Bioinformatics. 2023 Aug 31;24(1):327. doi: 10.1186/s12859-023-05449-z.
7
Environmental gradients reveal stress hubs pre-dating plant terrestrialization.
Nat Plants. 2023 Sep;9(9):1419-1438. doi: 10.1038/s41477-023-01491-0. Epub 2023 Aug 28.
8
Comparative Genomic Analysis of Warthog and Sus Scrofa Identifies Adaptive Genes Associated with African Swine Fever.
Biology (Basel). 2023 Jul 14;12(7):1001. doi: 10.3390/biology12071001.
9
GALBA: Genome Annotation with Miniprot and AUGUSTUS.
bioRxiv. 2023 Apr 10:2023.04.10.536199. doi: 10.1101/2023.04.10.536199.
10
Protein-to-genome alignment with miniprot.
Bioinformatics. 2023 Jan 1;39(1). doi: 10.1093/bioinformatics/btad014.