A complexity reduction algorithm for analysis and annotation of large genomic sequences.

Authors

Chuang Trees-Juen, Lin Wen-Chang, Lee Hurng-Chun, Wang Chi-Wei, Hsiao Keh-Lin, Wang Zi-Hao, Shieh Danny, Lin Simon C, Ch'ang Lan-Yang

Affiliation

Bioinformatics Research Center, Institute of Biomedical Sciences, Academia Sinica, Taipei 11529, Taiwan.

Publication information

Genome Res. 2003 Feb;13(2):313-22. doi: 10.1101/gr.313703.

DOI:10.1101/gr.313703
PMID:12566410
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC420370/

Abstract

DNA is a universal language encrypted with biological instruction for life. In higher organisms, the genetic information is preserved predominantly in an organized exon/intron structure. When a gene is expressed, the exons are spliced together to form the transcript for protein synthesis. We have developed a complexity reduction algorithm for sequence analysis (CRASA) that enables direct alignment of cDNA sequences to the genome. This method features a progressive data structure in hierarchical orders to facilitate a fast and efficient search mechanism. CRASA implementation was tested with already annotated genomic sequences in two benchmark data sets and compared with 15 annotation programs (10 ab initio and 5 homology-based approaches) against the EST database. By the use of layered noise filters, the complexity of CRASA-matched data was reduced exponentially. The results from the benchmark tests showed that CRASA annotation excelled in both the sensitivity and specificity categories. When CRASA was applied to the analysis of human Chromosomes 21 and 22, an additional 83 potential genes were identified. With its large-scale processing capability, CRASA can be used as a robust tool for genome annotation with high accuracy by matching the EST sequences precisely to the genomic sequences.
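
The abstract does not spell out CRASA's data structure or its noise filters, so the following is only a generic illustration of the idea of aligning an expressed sequence directly to a genome and discarding noisy matches. It is a minimal seed-and-filter sketch, not the authors' implementation. The k-mer index, the diagonal-based filter, the function names, the toy sequences, and the parameters `k` and `min_seeds` are all assumptions introduced here for illustration.

```python
from collections import defaultdict


def build_kmer_index(genome, k):
    """Map every k-mer in the genome to the list of positions where it starts."""
    index = defaultdict(list)
    for i in range(len(genome) - k + 1):
        index[genome[i:i + k]].append(i)
    return index


def seed_hits(est, index, k):
    """Return (est_pos, genome_pos) pairs for every exact k-mer match."""
    hits = []
    for q in range(len(est) - k + 1):
        for g in index.get(est[q:q + k], ()):
            hits.append((q, g))
    return hits


def filter_by_diagonal(hits, min_seeds):
    """Hypothetical noise filter: group hits by diagonal (genome_pos - est_pos)
    and keep only diagonals supported by at least min_seeds hits."""
    by_diag = defaultdict(list)
    for q, g in hits:
        by_diag[g - q].append((q, g))
    return {d: hs for d, hs in by_diag.items() if len(hs) >= min_seeds}


def exon_blocks(kept, k):
    """Collapse each surviving diagonal into one block
    (est_start, est_end, genome_start, genome_end), end-exclusive.
    Simplification: assumes the hits on a diagonal form one contiguous run."""
    blocks = []
    for diag, hs in kept.items():
        qs = sorted(q for q, _ in hs)
        blocks.append((qs[0], qs[-1] + k, qs[0] + diag, qs[-1] + k + diag))
    return sorted(blocks)


# Toy data (invented for this example): two exons separated by a 24-base intron.
exon1 = "ATGGCCATTGTAATGGGCCGC"
exon2 = "TGAAAGGGTGCCCGATAGTTA"
intron = "GT" + "A" * 20 + "AG"
genome = exon1 + intron + exon2
est = exon1 + exon2  # spliced transcript: the intron is absent

k = 8
index = build_kmer_index(genome, k)
blocks = exon_blocks(filter_by_diagonal(seed_hits(est, index, k), min_seeds=3), k)
print(blocks)  # [(0, 21, 0, 21), (21, 42, 45, 66)]
```

Each reported block pairs a stretch of the transcript with the genomic interval it matches, so the gap between consecutive genomic intervals (positions 21 to 45 in the toy data) corresponds to the intron. A real aligner would also need to tolerate sequencing errors and refine splice boundaries, which this sketch does not attempt.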



