Liu Zhi Hua, Jiao Dian, Sun Xiao
State Key Laboratory of Bioelectronics, Southeast University, Nanjing 210096, China.
Genomics Proteomics Bioinformatics. 2005 Nov;3(4):201-5. doi: 10.1016/s1672-0229(05)03027-5.
Traditional sequence analysis depends on sequence alignment. In this study, we analyzed functional regions of the human genome using alignment-free sequence features: word frequency, dinucleotide relative abundance, and base-base correlation. Working with human chromosome 22, we classified the upstream, exon, intron, downstream, and intergenic regions by applying principal component analysis and discriminant analysis to these features. The results show that functional regions of the genome can be classified using sequence features and discriminant analysis.
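As a rough illustration of two of the features named above, the following sketch computes word (k-mer) frequencies and a dinucleotide relative abundance for a DNA string. The abstract does not give the exact definitions used in the study, so this assumes the common odds-ratio form rho_xy = f_xy / (f_x * f_y) computed on a single strand. The function names are hypothetical and not taken from the paper.

```python
from collections import Counter
from itertools import product


def word_frequencies(seq, k):
    """Relative frequency of each overlapping k-letter word over A, C, G, T.

    Words containing other symbols (for example N) are ignored.
    """
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    words = ["".join(p) for p in product("ACGT", repeat=k)]
    total = sum(counts[w] for w in words)
    return {w: (counts[w] / total if total else 0.0) for w in words}


def dinucleotide_relative_abundance(seq):
    """Assumed odds-ratio form: rho_xy = f_xy / (f_x * f_y).

    Returns 0.0 when either base is absent from the sequence.
    """
    f1 = word_frequencies(seq, 1)
    f2 = word_frequencies(seq, 2)
    rho = {}
    for xy, f_xy in f2.items():
        denom = f1[xy[0]] * f1[xy[1]]
        rho[xy] = f_xy / denom if denom else 0.0
    return rho
```

Each region (upstream, exon, intron, and so on) could be turned into a fixed-length vector by concatenating such values, which is the kind of input that principal component analysis and discriminant analysis expect.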