A novel bioinformatics method for efficient knowledge discovery by BLSOM from big genomic sequence data.

Authors

Bai Yu, Iwasaki Yuki, Kanaya Shigehiko, Zhao Yue, Ikemura Toshimichi

Affiliations

Graduate School of Information Science, Nara Institute of Science and Technology, 8916-5 Takayama-cho, Ikoma-shi, Nara 630-0192, Japan.

Department of Bioscience, Nagahama Institute of Bio-Science and Technology, Nagahama-shi, Shiga-ken 526-0829, Japan.

Publication information

Biomed Res Int. 2014;2014:765648. doi: 10.1155/2014/765648. Epub 2014 Apr 3.

Abstract

With the remarkable increase in genomic sequence data from a wide range of species, novel tools are needed for comprehensive analyses of these big sequence data. The Self-Organizing Map (SOM) is an effective tool for clustering and visualizing high-dimensional data, such as oligonucleotide composition, on a single map. By modifying the conventional SOM, we previously developed the Batch-Learning SOM (BLSOM), which classifies sequence fragments according to species solely on the basis of their oligonucleotide composition. In the present study, we introduce the oligonucleotide BLSOM for characterizing vertebrate genome sequences. We first analyzed pentanucleotide compositions in 100 kb sequences derived from a wide range of vertebrate genomes, and then the compositions in the human and mouse genomes, in order to explore an efficient method for detecting differences between closely related genomes. BLSOM can recognize the species-specific key combination of oligonucleotide frequencies in each genome, called the "genome signature," as well as specific regions enriched in transcription-factor-binding sequences. Because its classification and visualization power is very high, BLSOM is an efficient and powerful tool for extracting a wide range of information from massive amounts of genomic sequences (i.e., big sequence data).
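
The abstract combines two steps: turning each 100 kb fragment into a vector of pentanucleotide frequencies (4^5 = 1024 dimensions) and clustering those vectors with a batch-learning SOM. The Python sketch below illustrates both steps under stated assumptions; it is not the authors' implementation. Assumptions: 5-mers containing non-ACGT letters (such as N) are simply skipped; the map is initialized from evenly spaced samples of the sorted input rather than from principal components, which the published BLSOM reportedly uses; the neighbourhood is Gaussian with a linearly shrinking radius. All function names (`fragments`, `composition`, `best_node`, `train_batch_som`, `toy_genome`) are hypothetical.

```python
import math
import random
from itertools import product

K = 5  # pentanucleotides, as analysed in the abstract
KMERS = ["".join(p) for p in product("ACGT", repeat=K)]  # 4**5 = 1024 features
INDEX = {kmer: i for i, kmer in enumerate(KMERS)}


def fragments(seq, size=100_000):
    """Non-overlapping windows of `size` bases; a trailing partial window is dropped."""
    return [seq[i:i + size] for i in range(0, len(seq) - size + 1, size)]


def composition(fragment):
    """Relative pentanucleotide frequencies of one fragment.
    Assumption: 5-mers containing non-ACGT letters (e.g. N) are skipped."""
    counts = [0] * len(KMERS)
    for i in range(len(fragment) - K + 1):
        j = INDEX.get(fragment[i:i + K].upper())
        if j is not None:
            counts[j] += 1
    total = sum(counts) or 1
    return [c / total for c in counts]


def dist2(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def best_node(weights, x):
    """Index of the map node whose weight vector is closest to x."""
    return min(range(len(weights)), key=lambda n: dist2(weights[n], x))


def train_batch_som(data, rows, cols, epochs=20, sigma_end=0.5):
    """Batch-learning SOM: each epoch assigns every vector to its best-matching node,
    then replaces each node by the Gaussian-neighbourhood-weighted mean of all vectors."""
    data = sorted(data)  # fixed processing order, so the result ignores input order
    grid = [(r, c) for r in range(rows) for c in range(cols)]
    m = len(grid)
    # Simplified initialisation (assumption; the published BLSOM reportedly uses PCA).
    weights = [list(data[n * len(data) // m]) for n in range(m)]
    sigma0 = max(rows, cols) / 2
    for t in range(epochs):
        # Neighbourhood radius shrinks linearly from sigma0 to sigma_end.
        sigma = sigma0 + (sigma_end - sigma0) * t / max(epochs - 1, 1)
        members = [[] for _ in range(m)]
        for x in data:
            members[best_node(weights, x)].append(x)
        # Per-node sum of assigned vectors (None if the node won no inputs).
        sums = [[sum(col) for col in zip(*vs)] if vs else None for vs in members]
        new_weights = []
        for n, (rn, cn) in enumerate(grid):
            num = [0.0] * len(data[0])
            den = 0.0
            for k, (rk, ck) in enumerate(grid):
                if sums[k] is None:
                    continue
                h = math.exp(-((rn - rk) ** 2 + (cn - ck) ** 2) / (2 * sigma ** 2))
                den += h * len(members[k])
                num = [a + h * s for a, s in zip(num, sums[k])]
            new_weights.append([a / den for a in num] if den > 0 else weights[n])
        weights = new_weights
    return weights


def toy_genome(length, at_fraction, seed):
    """Synthetic random sequence with a chosen AT content (illustration only)."""
    rng = random.Random(seed)
    at, gc = at_fraction / 2, (1 - at_fraction) / 2
    return "".join(rng.choices("ATGC", weights=[at, at, gc, gc], k=length))
```

For example, composition vectors from `fragments(seq)` for several genomes can be pooled, passed to `train_batch_som`, and each fragment then assigned to a node with `best_node(weights, vector)`; fragments from genomes with sufficiently distinct compositions should occupy different nodes. Updating all nodes once per pass (rather than after every input, as in the conventional SOM), combined with the fixed processing order, makes the trained map independent of the order in which fragments are supplied. Pure Python is used only to keep the sketch self-contained; real 100 kb windows across whole genomes would call for a vectorized implementation.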

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0921/3996302/7284845131c2/BMRI2014-765648.001.jpg
