School of Computer Science and Information Security, Guilin University of Electronic Technology, Guilin, 541004, China.
BMC Bioinformatics. 2023 Jun 22;24(1):261. doi: 10.1186/s12859-023-05378-x.
Autism spectrum disorders (ASD) are a group of neurodevelopmental disorders characterized by difficulties in social communication and interaction, behavioral difficulties, and atypical information processing in the brain. Genetics strongly influences ASD associated with early onset and distinctive signs. All currently known ASD risk genes encode proteins, and some de novo mutations that disrupt protein-coding genes have been shown to cause ASD. Next-generation sequencing enables high-throughput identification of ASD risk RNAs, but such experiments are time-consuming and expensive, so an efficient computational model for predicting ASD risk genes is needed.
In this study, we propose DeepASDPred, a deep learning-based predictor of ASD risk RNA. First, we encode the RNA transcript sequences as k-mer features and fuse them with the corresponding gene expression values to construct a feature matrix. After selecting the best feature subset with a combination of the chi-square test and logistic regression, we feed the selected features into a binary classification model built from a convolutional neural network and a long short-term memory network for training and classification. Tenfold cross-validation showed that our method outperformed state-of-the-art methods. The dataset and source code are freely available at https://github.com/Onebear-X/DeepASDPred.
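The k-mer encoding and feature fusion step described above can be illustrated with a minimal sketch. It is not taken from the DeepASDPred repository: the function names `kmer_frequencies` and `feature_row`, the choice of normalized frequencies over raw counts, and the example expression values are assumptions made for illustration only.

```python
from itertools import product

ALPHABET = "ACGU"  # RNA bases; DNA-style "T" is mapped to "U" below


def kmer_frequencies(seq, k=3):
    """Return normalized k-mer frequencies in fixed lexicographic order.

    Hypothetical helper: the abstract does not say whether counts or
    frequencies are used, so frequencies are an assumption here.
    """
    seq = seq.upper().replace("T", "U")
    kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:  # skip windows with ambiguous bases such as N
            counts[window] += 1
            total += 1
    if total == 0:
        return [0.0] * len(kmers)
    return [counts[km] / total for km in kmers]


def feature_row(seq, expression_values, k=3):
    """Fuse sequence features with expression values by concatenation.

    Concatenation is one plausible reading of "fuse"; it is an assumption.
    """
    return kmer_frequencies(seq, k) + list(expression_values)


# One row of the feature matrix: 4**2 = 16 k-mer features plus 2 made-up
# expression values for a toy transcript.
row = feature_row("AUGGCUAUGGC", [2.5, 0.0], k=2)
```

Stacking one such row per transcript would give a feature matrix of the kind that the feature selection step (chi-square test and logistic regression) then reduces before classification.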
Our experimental results show that DeepASDPred has outstanding performance in identifying ASD risk RNA genes.