Self-organizing and self-correcting classifications of biological data.

Author information

Garrity George M, Lilburn Timothy G

Affiliation

Department of Microbiology and Molecular Genetics, Michigan State University, East Lansing, MI 48824, USA.

Publication information

Bioinformatics. 2005 May 15;21(10):2309-14. doi: 10.1093/bioinformatics/bti346. Epub 2005 Feb 24.

DOI:10.1093/bioinformatics/bti346

PMID:15731209

Abstract

MOTIVATION

Rapid, automated means of organizing biological data are required if we hope to keep abreast of the flood of data emanating from sequencing, microarray and similar high-throughput analyses. Faced with the need to validate the annotation of thousands of sequences and to generate biologically meaningful classifications based on the sequence data, we turned to statistical methods in order to automate these processes.

RESULTS

An algorithm for automated classification based on evolutionary distance data was written in S. The algorithm was tested on a dataset of 1436 small subunit ribosomal RNA sequences and was able to classify the sequences according to an extant scheme, use statistical measurements of group membership to detect sequences that were misclassified within this scheme and produce a new classification. In this study, the use of the algorithm to address problems in prokaryotic taxonomy is discussed.
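The abstract does not describe the algorithm's internals. As a rough illustration only, the Python sketch below assumes a simple leave-one-out nearest-group rule applied to a precomputed evolutionary distance matrix. Each sequence's mean distance to the members of every group serves as its group-membership statistic. A sequence that lies closer on average to another group than to its own is treated as a candidate misclassification and reassigned, and the pass is repeated until the classification stops changing. The function names `group_mean_distances` and `self_correct`, the `max_iter` parameter and the toy data are hypothetical. This is not the authors' S-Plus implementation.

```python
import numpy as np


def group_mean_distances(D, labels, groups):
    """Mean distance from each item to the members of each group.

    Hypothetical helper. An item's zero distance to itself is excluded
    (leave-one-out). Groups with no other members get +inf.
    """
    n = D.shape[0]
    M = np.full((n, len(groups)), np.inf)
    for j, g in enumerate(groups):
        members = labels == g
        for i in range(n):
            mask = members.copy()
            mask[i] = False  # ignore the item's distance to itself
            if mask.any():
                M[i, j] = D[i, mask].mean()
    return M


def self_correct(D, labels, max_iter=20):
    """Iteratively move each item to the group it is closest to on average.

    Hypothetical sketch, not the published algorithm. Returns the final
    labels and the indices of items whose label differs from the input
    (the candidate misclassifications).
    """
    labels = np.asarray(labels).copy()
    original = labels.copy()
    groups = np.unique(labels)  # group set fixed from the starting scheme
    for _ in range(max_iter):
        M = group_mean_distances(D, labels, groups)
        new_labels = groups[np.argmin(M, axis=1)]
        if np.array_equal(new_labels, labels):
            break  # classification is stable
        labels = new_labels
    changed = np.flatnonzero(labels != original)
    return labels, changed


if __name__ == "__main__":
    # Toy data: six points on a line stand in for sequences;
    # D holds pairwise absolute differences as "evolutionary distances".
    x = np.array([0, 1, 2, 10, 11, 12], dtype=float)
    D = np.abs(x[:, None] - x[None, :])
    labels, changed = self_correct(D, ["A", "A", "B", "B", "B", "B"])
    print(labels.tolist(), changed.tolist())
    # prints: ['A', 'A', 'A', 'B', 'B', 'B'] [2]
```

In the toy run, the point at position 2 starts in group B but lies much closer on average to group A, so it is flagged and moved after one pass. With real data, `D` would be a distance matrix computed from aligned small subunit rRNA sequences.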

AVAILABILITY

S-Plus is available from Insightful, Inc. An S-Plus implementation of the algorithm and the associated data are available at http://taxoweb.mmg.msu.edu/datasets

Similar articles

Self-organizing and self-correcting classifications of biological data.

Bioinformatics. 2005 May 15;21(10):2309-14. doi: 10.1093/bioinformatics/bti346. Epub 2005 Feb 24.

GeneTools--application for functional annotation and statistical hypothesis testing.

BMC Bioinformatics. 2006 Oct 24;7:470. doi: 10.1186/1471-2105-7-470.

Mining frequent stem patterns from unaligned RNA sequences.

Bioinformatics. 2006 Oct 15;22(20):2480-7. doi: 10.1093/bioinformatics/btl431. Epub 2006 Aug 14.

Penalized and weighted K-means for clustering with scattered objects and prior information in high-throughput biological data.

Bioinformatics. 2007 Sep 1;23(17):2247-55. doi: 10.1093/bioinformatics/btm320. Epub 2007 Jun 27.

Classification based upon gene expression data: bias and precision of error rates.

Bioinformatics. 2007 Jun 1;23(11):1363-70. doi: 10.1093/bioinformatics/btm117. Epub 2007 Mar 28.

goCluster integrates statistical analysis and functional interpretation of microarray expression data.

Bioinformatics. 2005 Sep 1;21(17):3575-7. doi: 10.1093/bioinformatics/bti574. Epub 2005 Jul 14.

A Grid-based solution for management and analysis of microarrays in distributed experiments.

BMC Bioinformatics. 2007 Mar 8;8 Suppl 1(Suppl 1):S7. doi: 10.1186/1471-2105-8-S1-S7.

What should be expected from feature selection in small-sample settings.

Bioinformatics. 2006 Oct 1;22(19):2430-6. doi: 10.1093/bioinformatics/btl407. Epub 2006 Jul 26.

FINE: fisher information nonparametric embedding.

IEEE Trans Pattern Anal Mach Intell. 2009 Nov;31(11):2093-8. doi: 10.1109/TPAMI.2009.67.

Enhancing instance-based classification with local density: a new algorithm for classifying unbalanced biomedical data.

Bioinformatics. 2006 Apr 15;22(8):981-8. doi: 10.1093/bioinformatics/btl027. Epub 2006 Jan 27.

Cited by

A New Genomics-Driven Taxonomy of Bacteria and Archaea: Are We There Yet?

J Clin Microbiol. 2016 Aug;54(8):1956-63. doi: 10.1128/JCM.00200-16. Epub 2016 May 18.

Biodiversity of Intestinal Lactic Acid Bacteria in the Healthy Population.

Adv Exp Med Biol. 2016;932:1-64. doi: 10.1007/5584_2016_3.

The Ribosomal Database Project: improved alignments and new tools for rRNA analysis.

Nucleic Acids Res. 2009 Jan;37(Database issue):D141-5. doi: 10.1093/nar/gkn879. Epub 2008 Nov 12.

Identification of gene expression patterns using planned linear contrasts.

BMC Bioinformatics. 2006 May 5;7:245. doi: 10.1186/1471-2105-7-245.