Comparison of methods for genomic localization of gene trap sequences.

Authors

Harper Courtney A, Huang Conrad C, Stryke Doug, Kawamoto Michiko, Ferrin Thomas E, Babbitt Patricia C

Affiliation

Department of Biopharmaceutical Sciences, University of California San Francisco, 1700 4th Street, San Francisco, CA 94143-2250, USA.

Publication

BMC Genomics. 2006 Sep 18;7:236. doi: 10.1186/1471-2164-7-236.

DOI:10.1186/1471-2164-7-236
PMID:16982004
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC1617135/
Abstract

BACKGROUND

Gene knockouts in a model organism such as mouse provide a valuable resource for the study of basic biology and human disease. Determining which gene has been inactivated by an untargeted gene trapping event poses a challenging annotation problem because gene trap sequence tags, which represent sequence near the vector insertion site of a trapped gene, are typically short and often contain unresolved residues. To better understand the localization of these sequences on the mouse genome, we compared stand-alone versions of the alignment programs BLAT, SSAHA, and MegaBLAST. A set of 3,369 sequence tags was aligned to build 34 of the mouse genome using default parameters for each algorithm. Known genome coordinates for the cognate set of full-length genes (1,659 sequences) were used to evaluate localization results.
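The evaluation idea described above, checking each tag's alignment against the known coordinates of its cognate gene, can be sketched roughly as follows. This is a minimal illustration and not the authors' pipeline: the `Interval` type, the `overlaps` and `localization_accuracy` helpers, the "same chromosome and any overlap" criterion, and the toy coordinates are all assumptions made for the example.

```python
from typing import Dict, NamedTuple


class Interval(NamedTuple):
    """A genomic interval (hypothetical representation for this sketch)."""
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive


def overlaps(a: Interval, b: Interval) -> bool:
    """True if both intervals lie on the same chromosome and share at least one base."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def localization_accuracy(tag_hits: Dict[str, Interval],
                          gene_coords: Dict[str, Interval]) -> float:
    """Fraction of tags whose reported hit overlaps the known locus of the cognate gene.

    Both dictionaries are keyed by tag ID; tags without a known locus are skipped.
    """
    scored = [tag for tag in tag_hits if tag in gene_coords]
    if not scored:
        return 0.0
    correct = sum(overlaps(tag_hits[tag], gene_coords[tag]) for tag in scored)
    return correct / len(scored)


# Toy data (invented): known loci of the cognate genes, and one hit per tag.
genes = {
    "tagA": Interval("chr1", 1000, 5000),
    "tagB": Interval("chr2", 200, 900),
}
hits = {
    "tagA": Interval("chr1", 1200, 1350),  # falls inside the known locus
    "tagB": Interval("chr7", 200, 350),    # wrong chromosome, e.g. a pseudogene hit
}

print(localization_accuracy(hits, genes))  # 0.5
```

A simple overlap test like this only captures localization to the general region of a gene; it says nothing about whether exon boundaries were placed correctly.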

RESULTS

In general, all three programs performed well in terms of localizing sequences to a general region of the genome, with only relatively subtle errors identified for a small proportion of the sequence tags. However, large differences in performance were noted with regard to correctly identifying exon boundaries. BLAT correctly identified the vast majority of exon boundaries, while SSAHA and MegaBLAST missed the majority of exon boundaries. SSAHA consistently reported the fewest false positives and was the fastest algorithm. MegaBLAST was comparable to BLAT in speed, but was the most susceptible to localizing sequence tags incorrectly to pseudogenes.
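One way to quantify how well an aligner recovers exon boundaries is to compare the internal block boundaries of an alignment with the boundaries of the annotated exons. The sketch below is an assumption-laden illustration, not the scoring used in the paper: the `internal_boundaries` and `boundary_recall` helpers, the optional `tolerance` parameter, and the toy coordinates are hypothetical, and the tag is assumed to span every exon listed.

```python
from typing import List, Tuple

Block = Tuple[int, int]  # (start, end) of one aligned block or one annotated exon


def internal_boundaries(blocks: List[Block]) -> List[int]:
    """Return the coordinates flanking each gap between consecutive blocks.

    The outermost start and end are ignored, because a short tag rarely
    begins or ends exactly at an exon edge.
    """
    ordered = sorted(blocks)
    bounds: List[int] = []
    for (_, left_end), (right_start, _) in zip(ordered, ordered[1:]):
        bounds.extend([left_end, right_start])
    return bounds


def boundary_recall(aligned: List[Block], exons: List[Block], tolerance: int = 0) -> float:
    """Fraction of annotated internal exon boundaries matched by the alignment.

    A boundary counts as matched if some alignment block boundary lies within
    `tolerance` bases of it.
    """
    expected = internal_boundaries(exons)
    if not expected:
        return 1.0  # single-exon case: there is no boundary to miss
    observed = internal_boundaries(aligned)
    matched = sum(
        any(abs(exp - obs) <= tolerance for obs in observed) for exp in expected
    )
    return matched / len(expected)


# Toy annotation (invented): three exons covered by one tag.
exons = [(100, 200), (500, 650), (900, 1000)]

exact = [(150, 200), (500, 650), (900, 950)]    # every splice junction placed exactly
shifted = [(150, 200), (500, 655), (905, 950)]  # second junction off by 5 bases
unspliced = [(150, 200)]                        # only the first exon was aligned

print(boundary_recall(exact, exons))                 # 1.0
print(boundary_recall(shifted, exons))               # 0.5
print(boundary_recall(shifted, exons, tolerance=5))  # 1.0
print(boundary_recall(unspliced, exons))             # 0.0
```

Scoring with and without a tolerance separates alignments that miss a junction entirely from those that place it a few bases off.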

CONCLUSION

The differences in performance between sequence tags and full-length reference sequences were surprisingly small. Each program showed characteristic variations in its localization results, which particularly affect the localization of sequence at exon boundaries.
