
Making sense of EST sequences by CLOBBing them.

Authors

Parkinson John, Guiliano David B, Blaxter Mark

Affiliation

Institute of Cell, Animal and Population Biology, University of Edinburgh, Edinburgh EH9 3JT, UK.

Publication

BMC Bioinformatics. 2002 Oct 25;3:31. doi: 10.1186/1471-2105-3-31.

DOI:10.1186/1471-2105-3-31
PMID:12398795
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC137596/
Abstract

BACKGROUND

Expressed sequence tags (ESTs) are single pass reads from randomly selected cDNA clones. They provide a highly cost-effective method to access and identify expressed genes. However, they are often prone to sequencing errors and typically define incomplete transcripts. To increase the amount of information obtainable from ESTs and reduce sequencing errors, it is necessary to cluster ESTs into groups sharing significant sequence similarity.

RESULTS

As part of our ongoing EST programs investigating 'orphan' genomes, we have developed a clustering algorithm, CLOBB (Cluster on the basis of BLAST similarity), to identify and cluster ESTs. CLOBB may be used incrementally, preserving original cluster designations. It tracks cluster-specific events such as merging, identifies 'superclusters' of related clusters, and avoids the expansion of chimeric clusters. Written in the Perl scripting language, CLOBB is highly portable, relying only on a local installation of NCBI's freely available BLAST executable, and can be usefully applied to more than 95% of the current EST datasets. Analysis of the Danio rerio EST dataset demonstrates that CLOBB compares favourably with two less portable systems, UniGene and TIGR Gene Indices.
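The incremental behaviour described above (new sequences join existing clusters, cluster designations are preserved, and merges are recorded) can be illustrated with a minimal Python sketch. This is not CLOBB's actual implementation, which is written in Perl and is not shown in this record. The class `IncrementalClusterer`, its fields, and the `hits` argument are hypothetical names chosen for illustration, and the sketch assumes the significant similarity hits for each new sequence have already been computed (for example, by a BLAST search against previously clustered sequences). It does not attempt supercluster detection or chimera avoidance.

```python
from dataclasses import dataclass, field


@dataclass
class IncrementalClusterer:
    """Toy incremental clusterer; cluster IDs are assigned once and never reused."""
    member_of: dict = field(default_factory=dict)  # sequence id -> cluster id
    members: dict = field(default_factory=dict)    # cluster id -> set of sequence ids
    merge_log: list = field(default_factory=list)  # (surviving id, absorbed id)
    next_id: int = 0

    def add(self, seq_id, hits):
        """Add one sequence, given IDs of already-clustered sequences it matches.

        `hits` is assumed to be precomputed (hypothetical input, e.g. parsed
        from a similarity search); unknown IDs in `hits` are ignored.
        """
        matched = sorted({self.member_of[h] for h in hits if h in self.member_of})
        if not matched:
            # No significant similarity to anything seen so far: new cluster.
            cid = self.next_id
            self.next_id += 1
            self.members[cid] = set()
        else:
            # Join the oldest matching cluster so its designation is preserved;
            # absorb any other matching clusters and record each merge event.
            cid = matched[0]
            for other in matched[1:]:
                for s in self.members.pop(other):
                    self.member_of[s] = cid
                    self.members[cid].add(s)
                self.merge_log.append((cid, other))
        self.members[cid].add(seq_id)
        self.member_of[seq_id] = cid
        return cid


# Usage: est4 matches members of two different clusters, which triggers a merge.
c = IncrementalClusterer()
c.add("est1", [])
c.add("est2", [])
c.add("est3", ["est1"])
c.add("est4", ["est2", "est3"])
print(c.member_of)
print(c.merge_log)
```

Keeping the lowest-numbered cluster as the survivor is one simple way to keep earlier designations stable across incremental runs; the merge log lets a later analysis trace which clusters were absorbed.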

CONCLUSIONS

CLOBB provides a highly portable EST clustering solution and can be downloaded freely from: http://www.nematodes.org/CLOBB

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8432/137596/adc24161ff46/1471-2105-3-31-1.jpg
Figure 2: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8432/137596/57dc985d492c/1471-2105-3-31-2.jpg


