Identify origin of replication in Saccharomyces cerevisiae using two-step feature selection technique.

Affiliations

Key Laboratory for NeuroInformation of Ministry of Education, School of Life Science and Technology and Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China.

Innovative Institute of Chinese Medicine and Pharmacy, Chengdu University of Traditional Chinese Medicine, Chengdu, China.

Publication information

Bioinformatics. 2019 Jun 1;35(12):2075-2083. doi: 10.1093/bioinformatics/bty943.

DOI:10.1093/bioinformatics/bty943

PMID:30428009

Abstract

MOTIVATION

DNA replication is a key step in maintaining the continuity of genetic information between parents and offspring. The initiation site of DNA replication, called the origin of replication (ORI), plays an extremely important role in this basic biochemical process. Rapidly and effectively locating ORIs in a genome therefore provides key clues for genome analysis. Although biochemical experiments can provide detailed information about ORIs, they are costly and time-consuming. Computational methods complement experimental techniques and can overcome these disadvantages.

RESULTS

In this study, we developed a predictor called iORI-PseKNC2.0 to identify ORIs in the Saccharomyces cerevisiae genome from sequence information. ORI and non-ORI samples were encoded with pseudo k-tuple nucleotide composition (PseKNC) incorporating 90 physicochemical properties. To improve accuracy, a two-step feature selection procedure was used to exclude redundant and noisy information. With a support vector machine, the model achieved an overall success rate of 88.53% in 5-fold cross-validation.
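The encoding and feature-ranking ideas can be illustrated with a minimal Python sketch. It is not the authors' implementation: it computes only the plain k-tuple nucleotide composition (the sequence-order part of PseKNC, without the 90 physicochemical properties), and it ranks features by F-score, which is assumed here purely as an example filter criterion because the abstract does not name the two selection steps. The function names kmer_composition, f_score and rank_features are hypothetical.

```python
from itertools import product


def kmer_composition(seq, k=2):
    """Normalized k-mer frequencies over A, C, G, T in lexicographic order."""
    seq = seq.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if word in counts:  # skip windows containing ambiguous bases such as N
            counts[word] += 1
            total += 1
    return [counts[w] / total if total else 0.0 for w in kmers]


def f_score(pos, neg):
    """F-score of one feature; needs at least two values per class."""
    n_pos, n_neg = len(pos), len(neg)
    mean_pos = sum(pos) / n_pos
    mean_neg = sum(neg) / n_neg
    mean_all = (sum(pos) + sum(neg)) / (n_pos + n_neg)
    # Separation of the class means from the overall mean
    between = (mean_pos - mean_all) ** 2 + (mean_neg - mean_all) ** 2
    # Sum of the sample variances within each class
    within = (
        sum((x - mean_pos) ** 2 for x in pos) / (n_pos - 1)
        + sum((x - mean_neg) ** 2 for x in neg) / (n_neg - 1)
    )
    return between / within if within else 0.0


def rank_features(X_pos, X_neg):
    """Feature indices sorted from most to least discriminative by F-score."""
    n_features = len(X_pos[0])
    scores = [
        f_score([row[j] for row in X_pos], [row[j] for row in X_neg])
        for j in range(n_features)
    ]
    return sorted(range(n_features), key=lambda j: scores[j], reverse=True)
```

In a pipeline of this kind, the top-ranked features would then be passed to a classifier such as a support vector machine and evaluated by 5-fold cross-validation, adding features one at a time until accuracy stops improving.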

AVAILABILITY AND IMPLEMENTATION

Based on the proposed model, a user-friendly webserver was established and is freely accessible at http://lin-group.cn/server/iORI-PseKNC2.0. The webserver is intended to make the predictor convenient for most wet-lab researchers.
