Lin Feng-Mao, Huang Hsien-Da, Chang Yu-Chung, Horng Jorng-Tzong
Department of Computer Science and Information Engineering, National Central University, Chung-Li 320, Taiwan.
BMC Genomics. 2004 Oct 9;5:78. doi: 10.1186/1471-2164-5-78.
Information on the occurrence of sequence features in genomes is crucial to comparative genomics, evolutionary analysis, the analysis of regulatory sequences, and the quantitative evaluation of sequences. Computing the frequencies and occurrences of a pattern across complete genomes is time-consuming.
The proposed database provides information about sequence features generated by exhaustively scanning the sequences of complete genomes. The repetitive elements in eukaryotic genomes, such as LINEs, SINEs, Alu, and LTR elements, are obtained from Repbase. The database supports various complete genomes, including human, yeast, worm, and 128 microbial genomes.
This investigation presents and implements an efficient computational approach for accumulating the occurrences of oligonucleotides or patterns in complete genomes. A database is established to maintain information on sequence features, including the distributions of oligonucleotides, the gene distribution, the distribution of repetitive elements in genomes, and the occurrences of oligonucleotides. The database provides a more effective and efficient way to access the repetitive features in genomes.
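To make the idea of accumulating oligonucleotide occurrences concrete, the following is a minimal sketch in Python. It is not the authors' implementation, and the abstract does not describe their algorithm. The function names `count_oligonucleotides` and `find_occurrences` are hypothetical, and the sketch assumes a genome is available as a plain string of nucleotides, that overlapping occurrences are counted, and that windows containing ambiguous bases such as N are skipped.

```python
from collections import Counter


def count_oligonucleotides(sequence, k):
    """Count every overlapping oligonucleotide of length k in a sequence.

    Hypothetical helper: windows containing characters other than
    A, C, G, T (for example N) are skipped.
    """
    if k <= 0:
        raise ValueError("k must be a positive integer")
    seq = sequence.upper()
    valid = set("ACGT")
    counts = Counter()
    # Slide a window of width k across the sequence one base at a time.
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= valid:
            counts[kmer] += 1
    return counts


def find_occurrences(sequence, pattern):
    """Return the 0-based start positions of every occurrence of pattern.

    Hypothetical helper: overlapping matches are included.
    """
    seq = sequence.upper()
    pat = pattern.upper()
    positions = []
    start = seq.find(pat)
    while start != -1:
        positions.append(start)
        # Advance by one base so that overlapping matches are found.
        start = seq.find(pat, start + 1)
    return positions
```

Calling `count_oligonucleotides(genome, 6)` on a genome string would give the frequency of each hexamer, and `find_occurrences(genome, "TATAAA")` would list where a given pattern occurs. A direct scan like this repeats the work for every query, which is the cost the abstract describes as time-consuming and the reason for storing the results in a database.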