Shandong Jianzhu University, Jinan, 250101, PR China.
Comput Biol Med. 2024 Mar;170:107980. doi: 10.1016/j.compbiomed.2024.107980. Epub 2024 Jan 13.
Missense mutations alter the function of human proteins and are closely associated with many acute and chronic diseases. Identifying disease-associated missense mutations and classifying their pathogenicity can provide insight into the genetic basis of disease and into protein function. This paper proposes MLAE (Method based on LSTM-Ladder AutoEncoder), a deep learning classification model built on the Variational AutoEncoder (VAE) framework that identifies disease-associated missense mutations and classifies their pathogenicity. MLAE overcomes limitations of the VAE framework by introducing a Ladder structure combined with LSTM networks, which reduces the loss of original information as it is passed through the network and makes learning more effective. In the experiments, MLAE classified all 27,572 possible missense variants of the three input proteins with an average classification AUC of 0.941, providing evidence that it is effective in predicting pathogenicity. MLAE also produces multi-label classification results, with an average Hamming loss of 0.196, which supports the classification of complex variants. The proposed method captures amino acid sequence information effectively and predicts the pathogenicity of mutations accurately, providing an analytical basis for the study and prevention of related diseases.
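For readers unfamiliar with the two metrics reported in the abstract, the following is a minimal sketch of how an AUC and a Hamming loss are conventionally computed. It is not the authors' evaluation code: the function names, the three-label layout, and the toy labels and scores are all assumptions made for illustration, and the abstract does not say how the per-protein averages were formed.

```python
from typing import Sequence


def hamming_loss(y_true: Sequence[Sequence[int]],
                 y_pred: Sequence[Sequence[int]]) -> float:
    """Fraction of (variant, label) slots where the prediction differs from the truth."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same number of rows")
    total = 0
    wrong = 0
    for t_row, p_row in zip(y_true, y_pred):
        if len(t_row) != len(p_row):
            raise ValueError("each row must have the same number of labels")
        total += len(t_row)
        wrong += sum(t != p for t, p in zip(t_row, p_row))
    if total == 0:
        raise ValueError("no labels to evaluate")
    return wrong / total


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 3 variants, 3 illustrative labels per variant.
toy_true = [[1, 0, 0], [0, 1, 1], [1, 1, 0]]
toy_pred = [[1, 0, 1], [0, 1, 1], [0, 1, 0]]
toy_hamming = hamming_loss(toy_true, toy_pred)  # 2 mismatched slots out of 9

# Hypothetical binary pathogenicity labels and model scores for 4 variants.
toy_labels = [1, 1, 0, 0]
toy_scores = [0.9, 0.4, 0.5, 0.1]
toy_auc = roc_auc(toy_labels, toy_scores)  # 3 of 4 positive/negative pairs ranked correctly
```

Under this definition, a Hamming loss of 0.196 means that roughly one label slot in five was predicted incorrectly, while an AUC of 0.941 means that a randomly chosen pathogenic variant receives a higher score than a randomly chosen benign one about 94% of the time.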