Automating genomic data mining via a sequence-based matrix format and associative rule set.

Authors

Wren Jonathan D, Johnson David, Gruenwald Le

Affiliation

Advanced Center for Genome Technology, Department of Botany and Microbiology, 101 David L. Boren Blvd, Rm 2025.

Publication

BMC Bioinformatics. 2005 Jul 15;6 Suppl 2(Suppl 2):S2. doi: 10.1186/1471-2105-6-S2-S2.

Abstract

There is an enormous amount of information encoded in each genome, enough to create living, responsive and adaptive organisms. Raw sequence data alone is not enough to understand function, mechanisms or interactions. Changes in a single base pair can lead to disease, such as sickle-cell anemia, while some large megabase deletions have no apparent phenotypic effect. Genomic features vary in their data types, and annotation of these features is spread across multiple databases. Herein, we develop a method to automate exploration of genomes by iteratively searching sequence data for correlations and building upon them. First, to integrate and compare different annotation sources, a sequence matrix (SM) is developed to contain position-dependent information. Second, a classification tree is developed for matrix row types, specifying how each data type is to be treated with respect to other data types for analysis purposes. Third, correlative analyses are developed to analyze features of each matrix row in terms of the other rows, guided by the classification tree as to which analyses are appropriate. A prototype was developed and successfully detected coinciding genomic features among genes, exons, repetitive elements and CpG islands.
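
To make the sequence-matrix idea concrete, the following is a minimal sketch and not the authors' implementation. It assumes every annotation can be reduced to a binary row (feature present or absent at each position) built from zero-based, half-open intervals, and it uses a simple observed/expected overlap ratio as a stand-in for the correlative analyses. The function names, the toy coordinates, and the scoring choice are all hypothetical, and the classification tree that would select an analysis per row type is not modeled here.

```python
from itertools import combinations


def build_sequence_matrix(length, annotations):
    """Return {row_name: [0/1 per position]} from half-open (start, end) intervals."""
    matrix = {}
    for name, intervals in annotations.items():
        row = [0] * length
        for start, end in intervals:
            # Clip each interval to the sequence bounds before marking it.
            for pos in range(max(0, start), min(length, end)):
                row[pos] = 1
        matrix[name] = row
    return matrix


def overlap_enrichment(row_a, row_b):
    """Observed co-occurrence divided by the count expected if rows were independent."""
    n = len(row_a)
    both = sum(a and b for a, b in zip(row_a, row_b))
    expected = sum(row_a) * sum(row_b) / n
    return both / expected if expected else 0.0


# Hypothetical toy annotations on a 100-bp sequence (not data from the paper).
annotations = {
    "gene": [(10, 60)],
    "exon": [(10, 25), (40, 60)],
    "cpg_island": [(5, 20)],
    "repeat": [(70, 90)],
}

matrix = build_sequence_matrix(100, annotations)

# Compare every pair of rows, mirroring the "each row in terms of the other rows" step.
scores = {
    (a, b): overlap_enrichment(matrix[a], matrix[b])
    for a, b in combinations(matrix, 2)
}

for pair, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(pair, round(score, 2))
```

A ratio above 1 suggests two features coincide more often than independent placement would predict, and a ratio of 0 means they never overlap in this toy sequence. A real analysis would need a significance test and rows for non-binary data types, which this sketch leaves out.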

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8b9f/1637034/c1aece4e3f27/1471-2105-6-S2-S2-1.jpg