Yue Zhenyu, Xiang Ying, Chen Guojun, Wang Xiaosong, Li Ke, Zhang Youhua
IEEE/ACM Trans Comput Biol Bioinform. 2023 Sep-Oct;20(5):3226-3233. doi: 10.1109/TCBB.2023.3266232. Epub 2023 Oct 9.
Inframe insertion/deletion (indel) variants may alter protein sequence and function and are closely associated with a wide variety of diseases. Although recent studies have examined the associations between inframe indels and diseases, modeling indels in silico and interpreting their pathogenicity remain challenging, mainly because of the lack of experimental information and computational methodologies. In this article, we propose a novel computational method named PredinID (Predictor for inframe InDels), based on a graph convolutional network (GCN). PredinID uses the k-nearest neighbor algorithm to construct a feature graph for aggregating more informative representations, and treats pathogenic inframe indel prediction as a node classification task. An edge-based sampling strategy is designed to extract information from both the potential connections in feature space and the topological structure of subgraphs. In 5-fold cross-validation, PredinID achieves satisfactory performance and outperforms four classic machine learning algorithms and two GCN methods. Comprehensive experiments show that PredinID also outperforms state-of-the-art methods on the independent test set. We also provide a web server at http://predinid.bio.aielab.cc/ to facilitate use of the model.
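The general idea described in the abstract, building a k-nearest neighbor feature graph and classifying its nodes with a GCN, can be illustrated with a minimal NumPy sketch. This is not the PredinID implementation: the Euclidean distance, the value k = 3, the two-layer architecture, the random features and weights, and all function names (`knn_graph`, `normalize_adjacency`, `gcn_forward`) are assumptions made for illustration, and the edge-based sampling strategy is not reproduced here.

```python
import numpy as np


def knn_graph(features, k):
    """Build a symmetric k-nearest-neighbor adjacency matrix.

    Assumption: Euclidean distance; the abstract does not name the metric.
    """
    n = features.shape[0]
    # Pairwise squared Euclidean distances between all samples.
    diff = features[:, None, :] - features[None, :, :]
    dist = np.sum(diff ** 2, axis=-1)
    np.fill_diagonal(dist, np.inf)  # a node is never its own neighbor
    # Indices of the k closest samples for each row.
    neighbors = np.argsort(dist, axis=1)[:, :k]
    adj = np.zeros((n, n))
    rows = np.repeat(np.arange(n), k)
    adj[rows, neighbors.ravel()] = 1.0
    # Symmetrize: keep an edge if either endpoint selected the other.
    return np.maximum(adj, adj.T)


def normalize_adjacency(adj):
    """Compute D^{-1/2} (A + I) D^{-1/2}, the standard GCN propagation matrix."""
    a_hat = adj + np.eye(adj.shape[0])  # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def gcn_forward(features, adj_norm, w1, w2):
    """Two-layer GCN forward pass returning per-node class probabilities."""
    hidden = np.maximum(adj_norm @ features @ w1, 0.0)  # ReLU
    logits = adj_norm @ hidden @ w2
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    exp = np.exp(logits)
    return exp / exp.sum(axis=1, keepdims=True)


# Toy data: 8 hypothetical variants with 5 features each (random, not real data).
rng = np.random.default_rng(0)
x = rng.normal(size=(8, 5))
adj = knn_graph(x, k=3)
adj_norm = normalize_adjacency(adj)
w1 = rng.normal(size=(5, 4))  # untrained weights, for shape illustration only
w2 = rng.normal(size=(4, 2))  # two classes, e.g. pathogenic vs. benign
probs = gcn_forward(x, adj_norm, w1, w2)
```

Each row of `probs` holds the class probabilities for one node. In practice the weights would be trained on labeled nodes with a cross-entropy loss, typically through a library such as PyTorch Geometric rather than by hand.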