Liu Chenglin, Cui Peng, Huang Tao
School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, 800 Dongchuan Rd., Minhang, Shanghai 200240, China.
Institute of Health Sciences, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200031, China.
Comb Chem High Throughput Screen. 2017;20(7):603-611. doi: 10.2174/1386207320666170417144937.
Cell cycle-regulated genes are expressed periodically across the stages of the cell cycle, and identifying and studying these genes can deepen our understanding of the cell cycle process. High false-positive rates and low overlap among results are major problems in cell cycle-regulated gene detection.
Here, a computational framework called DLGene is proposed for cell cycle-regulated gene detection. It is based on the convolutional neural network, a deep learning algorithm that represents patterns in raw data without assumptions about their distribution. First, the expression data were transformed into categorical state data denoting the changing state of gene expression, which revealed four distinct expression patterns among the reported cell cycle-regulated genes. Then, DLGene was applied to discriminate among non-cell cycle genes and the four subtypes of cell cycle genes, and its performance was compared with that of six traditional machine learning methods. Finally, the biological functions of representative cell cycle genes of each subtype were analyzed.
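As a rough illustration of the first step, the sketch below turns an expression time series into categorical change states. The abstract does not specify the encoding, so the three states ("up", "down", "flat"), the function name `to_state_sequence`, and the `threshold` parameter are all assumptions made for this example and are not taken from DLGene.

```python
def to_state_sequence(expression, threshold=0.1):
    """Map an expression time series to categorical change states.

    Hypothetical encoding (not from the paper): each pair of consecutive
    time points becomes "up", "down", or "flat", depending on whether the
    change exceeds an assumed threshold.
    """
    states = []
    for previous, current in zip(expression, expression[1:]):
        change = current - previous
        if change > threshold:
            states.append("up")
        elif change < -threshold:
            states.append("down")
        else:
            states.append("flat")
    return states


# Made-up expression values for one gene across six time points.
profile = [0.0, 0.5, 1.2, 1.15, 0.3, -0.4]
print(to_state_sequence(profile))
```

A sequence of such states, one per gene, is the kind of categorical input a classifier could then be trained on; the actual state definition used by DLGene may differ.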
Our method showed better and more balanced sensitivity and specificity than the other machine learning algorithms. Cell cycle genes had expression patterns very different from those of non-cell cycle genes, and the cell cycle genes themselves fell into four subtypes. Our method not only detects cell cycle genes but also describes their expression patterns, such as when a gene reaches its highest expression level and how its expression changes over time. For each subtype, we analyzed the biological functions of the representative genes, and the results provide novel insight into cell cycle mechanisms.
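For readers unfamiliar with the two metrics, the sketch below computes sensitivity and specificity for one class in a multi-class setting, treating that class as positive and all others as negative. The class labels and predictions are invented for illustration and are not results from the study.

```python
def sensitivity_specificity(true_labels, predicted_labels, positive):
    """Return (sensitivity, specificity) for one class, one-vs-rest."""
    tp = fn = tn = fp = 0
    for truth, predicted in zip(true_labels, predicted_labels):
        if truth == positive:
            if predicted == positive:
                tp += 1  # positive gene correctly detected
            else:
                fn += 1  # positive gene missed
        else:
            if predicted == positive:
                fp += 1  # negative gene wrongly called positive
            else:
                tn += 1  # negative gene correctly rejected
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Hypothetical labels: "non" for non-cell cycle genes, "type1"/"type2" for subtypes.
truth = ["non", "non", "type1", "type1", "type2", "non"]
predicted = ["non", "type1", "type1", "type1", "type2", "non"]
print(sensitivity_specificity(truth, predicted, positive="type1"))
```

A method is "balanced" in this sense when neither value is bought at the expense of the other, for example when a detector does not reach high sensitivity only by accepting many false positives.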