Wang Yan, Zhang Shuangquan, Yang Lili, Yang Sen, Tian Yuan, Ma Qin
Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun, China.
School of Artificial Intelligence, Jilin University, Changchun, China.
Front Genet. 2019 Oct 22;10:1009. doi: 10.3389/fgene.2019.01009. eCollection 2019.
Measuring conditional relatedness, the degree of relation between a pair of genes under a given condition, is a basic but difficult task in bioinformatics, because traditional co-expression analysis methods rely on co-expression similarities, which are well known to have a high false positive rate. Complementing them with prior-knowledge similarities is a feasible way to tackle the problem. However, classical combination machine learning algorithms fail to detect and apply the complex mapping between similarities and conditional relatedness, so a more powerful predictive model would be of great benefit for capturing this mapping. To meet this need, we propose a novel deep learning model, a convolutional neural network with a fully connected first layer, named the fully convolutional neural network (FCNN), to measure conditional relatedness between genes using both co-expression and prior-knowledge similarities. On validation and test datasets, under grid-search 10-fold cross-validation, the FCNN model achieves on average 3.0% and 2.7% higher accuracy, respectively, in identifying gene-gene interactions collected from the COXPRESdb, KEGG, and TRRUST databases and from the benchmark dataset of Xiao-Yong et al. To evaluate the FCNN model further, we verify it on the GeneFriends and DIP datasets, where it obtains on average 1.8% and 7.6% higher accuracy, respectively. The FCNN model is then applied to construct cancer gene networks, where it also yields more practical results than the other compared models and methods. A website for the FCNN model and the relevant datasets can be accessed at https://bmbl.bmi.osumc.edu/FCNN.
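To make the architecture named in the abstract more concrete, the following is a minimal sketch of a forward pass through a network whose first layer is fully connected and whose next layer is convolutional. It is only an illustration under stated assumptions and not the authors' implementation: the layer sizes, the kernel width, the ReLU and sigmoid activations, the four-element similarity vector, the random untrained weights, and the names `dense`, `conv1d`, `relatedness_score`, and `random_params` are all hypothetical, since the abstract does not specify them.

```python
import math
import random


def dense(x, weights, bias):
    """Fully connected layer: one output per weight row, followed by ReLU."""
    return [max(0.0, sum(w * v for w, v in zip(row, x)) + b)
            for row, b in zip(weights, bias)]


def conv1d(x, kernel):
    """Valid 1-D convolution (cross-correlation form) followed by ReLU."""
    k = len(kernel)
    return [max(0.0, sum(kernel[j] * x[i + j] for j in range(k)))
            for i in range(len(x) - k + 1)]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def relatedness_score(similarities, params):
    """Map a similarity vector for one gene pair to a score in (0, 1)."""
    hidden = dense(similarities, params["W"], params["b"])   # fully connected first layer
    features = conv1d(hidden, params["kernel"])              # convolutional layer
    z = sum(w * f for w, f in zip(params["out_w"], features)) + params["out_b"]
    return sigmoid(z)


def random_params(n_inputs, n_hidden=8, kernel_size=3, seed=0):
    """Untrained placeholder weights; sizes are assumptions, not from the paper."""
    rng = random.Random(seed)

    def u():
        return rng.uniform(-0.5, 0.5)

    return {
        "W": [[u() for _ in range(n_inputs)] for _ in range(n_hidden)],
        "b": [0.0] * n_hidden,
        "kernel": [u() for _ in range(kernel_size)],
        "out_w": [u() for _ in range(n_hidden - kernel_size + 1)],
        "out_b": 0.0,
    }


# Hypothetical input for one gene pair: co-expression and prior-knowledge
# similarities concatenated into a single vector (values are made up).
pair = [0.82, 0.40, 0.10, 0.65]
params = random_params(n_inputs=len(pair))
score = relatedness_score(pair, params)
```

Because the weights here are random, `score` carries no biological meaning; in practice the parameters would be fitted to labelled gene pairs, for example with the grid-search 10-fold cross-validation mentioned in the abstract.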