Assessment of length distributions between non-coding and coding sequences amongst two model organisms.

Authors

Caldwell Rachel, Lin Yan-Xia, Zhang Ren

Affiliation

School of Biological Sciences, University of Wollongong, Northfields Ave, NSW 2522, Australia.

Publication information

Int J Data Min Bioinform. 2010;4(5):535-52. doi: 10.1504/ijdmb.2010.035899.

DOI:10.1504/ijdmb.2010.035899

PMID:21133040

Abstract

The availability of genomic DNA and cDNA sequence data has accelerated the data mining and genomics era. We investigate the length distributions of the non-coding and coding regions of protein-coding genes in two model organisms, Arabidopsis thaliana and Drosophila melanogaster. A non-linear functional relationship model was applied, and a strong correlation was found between the Coding Sequence (CDS) and non-coding sequence regions, conditional on the 5' UTR data. Significant differences were found between the protein functional classes and each gene region. Examination of the non-coding and coding regions of these organisms has revealed possible correlations.
