EST clustering error evaluation and correction.

Author information

Wang Ji-Ping Z, Lindsay Bruce G, Leebens-Mack James, Cui Liying, Wall Kerr, Miller Webb C, dePamphilis Claude W

Affiliation

Department of Statistics, Northwestern University, Evanston, IL 60208, USA.

Publication information

Bioinformatics. 2004 Nov 22;20(17):2973-84. doi: 10.1093/bioinformatics/bth342. Epub 2004 Jun 9.

DOI:10.1093/bioinformatics/bth342

PMID:15189818

Abstract

MOTIVATION

The gene expression intensity information conveyed by Expressed Sequence Tag (EST) data can be used to infer important cDNA library properties, such as gene number and expression patterns. However, EST clustering errors, which often lead to greatly inflated estimates of the number of unique genes obtained, have become a major obstacle in these analyses. The EST clustering error structure, the relationship between clustering error and clustering criteria, and possible error correction methods need to be systematically investigated.

RESULTS

We identify and quantify two types of EST clustering error, Type I and Type II, in EST clustering with the CAP3 assembly program. A Type I error occurs when ESTs from the same gene fail to form a single cluster, whereas a Type II error occurs when ESTs from distinct genes are falsely clustered together. While the Type II error rate is <1.5% for both 5' and 3' EST clustering, the Type I error rate in the 5' EST case is approximately 10 times higher than in the 3' EST case (30% versus 3%). An over-stringent identity rule, e.g. P ≥ 95%, may even inflate the Type I error in both cases. We demonstrate that approximately 80% of the Type I error in 5' EST clustering is due to insufficient overlap among sibling ESTs (ISO error). A novel statistical approach is proposed to correct ISO error and so provide more accurate estimates of the true gene cluster profile.
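The two error types defined above can be illustrated with a minimal Python sketch. It assumes the true gene of each EST is known and uses simplified, hypothetical rate definitions (the fraction of genes split across clusters and the fraction of clusters mixing genes); the abstract does not state the paper's exact formulas, and the function name and toy data are invented for illustration.

```python
from collections import defaultdict


def clustering_error_rates(true_gene, cluster_of):
    """Return (type1, type2) under simplified, assumed definitions.

    true_gene:  dict mapping EST id -> true gene id
    cluster_of: dict mapping EST id -> assigned cluster id
    type1: fraction of genes whose ESTs fall into more than one cluster
    type2: fraction of clusters containing ESTs from more than one gene
    """
    clusters_per_gene = defaultdict(set)
    genes_per_cluster = defaultdict(set)
    for est, gene in true_gene.items():
        cluster = cluster_of[est]
        clusters_per_gene[gene].add(cluster)
        genes_per_cluster[cluster].add(gene)

    split_genes = sum(1 for c in clusters_per_gene.values() if len(c) > 1)
    mixed_clusters = sum(1 for g in genes_per_cluster.values() if len(g) > 1)
    return (split_genes / len(clusters_per_gene),
            mixed_clusters / len(genes_per_cluster))


# Toy data (hypothetical): gene A is split across clusters 1 and 2,
# and cluster 3 mixes ESTs from genes B and C.
true_gene = {"e1": "A", "e2": "A", "e3": "A", "e4": "B", "e5": "B", "e6": "C"}
cluster_of = {"e1": 1, "e2": 1, "e3": 2, "e4": 3, "e5": 3, "e6": 3}

type1, type2 = clustering_error_rates(true_gene, cluster_of)
print(f"Type I: {type1:.3f}, Type II: {type2:.3f}")
```

In this toy example one of three genes is split and one of three clusters is mixed, so both rates come out at one third. Real evaluations would need a trusted gene assignment for each EST, which the sketch simply takes as given.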

Similar articles

Efficient clustering of large EST data sets on parallel computers.

Nucleic Acids Res. 2003 Jun 1;31(11):2963-74. doi: 10.1093/nar/gkg379.

Gene structure prediction from consensus spliced alignment of multiple ESTs matching the same genomic locus.

Bioinformatics. 2004 May 1;20(7):1157-69. doi: 10.1093/bioinformatics/bth058. Epub 2004 Feb 5.

Estimating and comparing the rates of gene discovery and expressed sequence tag (EST) frequencies in EST surveys.

Bioinformatics. 2004 Sep 22;20(14):2279-87. doi: 10.1093/bioinformatics/bth239. Epub 2004 Apr 1.

RBR: library-less repeat detection for ESTs.

Bioinformatics. 2006 Sep 15;22(18):2232-6. doi: 10.1093/bioinformatics/btl368. Epub 2006 Jul 12.

EasyCluster: a fast and efficient gene-oriented clustering tool for large-scale transcriptome data.

BMC Bioinformatics. 2009 Jun 16;10 Suppl 6(Suppl 6):S10. doi: 10.1186/1471-2105-10-S6-S10.

Gene capture prediction and overlap estimation in EST sequencing from one or multiple libraries.

BMC Bioinformatics. 2005 Dec 13;6:300. doi: 10.1186/1471-2105-6-300.

[A new method for EST clustering].

Yi Chuan Xue Bao. 2003 Feb;30(2):147-53.

Gene models from ESTs (GeneModelEST): an application on the Solanum lycopersicum genome.

BMC Bioinformatics. 2007 Mar 8;8 Suppl 1(Suppl 1):S9. doi: 10.1186/1471-2105-8-S1-S9.

TIGR Gene Indices clustering tools (TGICL): a software system for fast clustering of large EST datasets.

Bioinformatics. 2003 Mar 22;19(5):651-2. doi: 10.1093/bioinformatics/btg034.

Cited by

Comparative transcriptome sequencing analysis of female and male .

PeerJ. 2022 Nov 8;10:e14342. doi: 10.7717/peerj.14342. eCollection 2022.

Comparative transcriptomic analysis for identification of candidate sex-related genes and pathways in Crimson seabream (Parargyrops edita).

Sci Rep. 2021 Jan 13;11(1):1077. doi: 10.1038/s41598-020-80282-5.

Defense responses of lentil (Lens culinaris) genotypes carrying non-allelic ascochyta blight resistance genes to Ascochyta lentis infection.

PLoS One. 2018 Sep 20;13(9):e0204124. doi: 10.1371/journal.pone.0204124. eCollection 2018.

Improving transcriptome de novo assembly by using a reference genome of a related species: Translational genomics from oil palm to coconut.

PLoS One. 2017 Mar 23;12(3):e0173300. doi: 10.1371/journal.pone.0173300. eCollection 2017.

Transcriptome analysis of sika deer in China.

Mol Genet Genomics. 2016 Oct;291(5):1941-53. doi: 10.1007/s00438-016-1231-y. Epub 2016 Jul 16.

Transcriptome Analysis for Identification of Genes Related to Gonad Differentiation, Growth, Immune Response and Marker Discovery in The Turbot (Scophthalmus maximus).

PLoS One. 2016 Feb 29;11(2):e0149414. doi: 10.1371/journal.pone.0149414. eCollection 2016.

Clam focal and systemic immune responses to QPX infection revealed by RNA-seq technology.

BMC Genomics. 2016 Feb 27;17:146. doi: 10.1186/s12864-016-2493-9.

Selecting Superior De Novo Transcriptome Assemblies: Lessons Learned by Leveraging the Best Plant Genome.

PLoS One. 2016 Jan 5;11(1):e0146062. doi: 10.1371/journal.pone.0146062. eCollection 2016.

De novo RNA-Seq analysis of the venus clam, Cyclina sinensis, and the identification of immune-related genes.

PLoS One. 2015 Apr 8;10(4):e0123296. doi: 10.1371/journal.pone.0123296. eCollection 2015.

EasyCluster2: an improved tool for clustering and assembling long transcriptome reads.

BMC Bioinformatics. 2014;15 Suppl 15(Suppl 15):S7. doi: 10.1186/1471-2105-15-S15-S7. Epub 2014 Dec 3.
