Polychronopoulos Dimitris, Weitschek Emanuel, Dimitrieva Slavica, Bucher Philipp, Felici Giovanni, Almirantis Yannis
Institute of Biosciences and Applications, National Center for Scientific Research "Demokritos", 15310 Athens, Greece; Department of Biochemistry and Molecular Biology, Faculty of Biology, National and Kapodistrian University of Athens, 15701 Athens, Greece.
Department of Engineering, Roma Tre University, Via della Vasca Navale 79, 00146 Rome, Italy; Institute of Systems Analysis and Computer Science "Antonio Ruberti", National Research Council, Viale Manzoni 30, 00185 Rome, Italy.
Genomics. 2014 Aug;104(2):79-86. doi: 10.1016/j.ygeno.2014.07.004. Epub 2014 Jul 22.
Little work has been done on the composition of conserved non-coding elements (CNEs), which are identified by comparing two or more genomes and are found in all metazoan genomes. Here we analyze CNEs with a methodology that takes into account word occurrence at various length scales, combining a feature vector representation with rule-based classifiers. We apply our approach to both protein-coding exons and CNEs originating from the human, insect (Drosophila melanogaster) and worm (Caenorhabditis elegans) genomes; these elements are either identified in the present study or obtained from the literature. Alignment-free feature vector representation of sequences, combined with rule-based classification methods, leads to successful classification of the different CNE classes. Biologically meaningful results are derived by comparison with the genomic signatures approach, and classification rates are presented for a variety of functional elements of the genomes along with surrogates.
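As an illustration of what an alignment-free, word-occurrence feature vector can look like, the following minimal Python sketch counts overlapping words (k-mers) of several lengths in a DNA sequence and normalizes them to relative frequencies. It is not the authors' implementation: the function names, the choice of word lengths (1 to 3), the normalization, and the toy threshold rule are all assumptions made for this example.

```python
from itertools import product

ALPHABET = "ACGT"


def kmer_vocabulary(k_values):
    """Return all DNA words of each length in k_values, in a fixed order."""
    return ["".join(p) for k in k_values for p in product(ALPHABET, repeat=k)]


def kmer_feature_vector(seq, k_values=(1, 2, 3)):
    """Relative frequency of every word of each length in k_values.

    Overlapping windows are counted; windows containing symbols outside
    ACGT (for example N) are skipped. The output order matches
    kmer_vocabulary(k_values).
    """
    seq = seq.upper()
    features = []
    for k in k_values:
        counts = {"".join(p): 0 for p in product(ALPHABET, repeat=k)}
        windows = 0
        for i in range(len(seq) - k + 1):
            word = seq[i:i + k]
            if word in counts:
                counts[word] += 1
                windows += 1
        features.extend(c / windows if windows else 0.0 for c in counts.values())
    return features


def word_frequency_exceeds(vector, vocabulary, word, threshold):
    """Hypothetical single-condition rule: is the frequency of `word` above `threshold`?"""
    return vector[vocabulary.index(word)] > threshold


# Toy usage on a made-up sequence (not real CNE or exon data).
k_values = (1, 2)
vocab = kmer_vocabulary(k_values)
vec = kmer_feature_vector("ACGTAC", k_values)
```

A rule-based classifier would then operate on such vectors, for instance by combining several threshold conditions like `word_frequency_exceeds` into rules that assign a sequence to a class. The specific rule learners and word lengths used in the study are not stated in the abstract.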