ProteoMutaMetrics: machine learning approaches for solute carrier family 6 mutation pathogenicity prediction.

Authors

Huang Jiahui, Osthushenrich Tanja, MacNamara Aidan, Mälarstig Anders, Brocchetti Silvia, Bradberry Samuel, Scarabottolo Lia, Ferrada Evandro, Sosnin Sergey, Digles Daniela, Superti-Furga Giulio, Ecker Gerhard F

Affiliations

University of Vienna, Department of Pharmaceutical Sciences, Vienna, Austria.

Bayer AG, Division Pharmaceuticals, Biomedical Data Science II, Wuppertal, Germany.

Publication information

RSC Adv. 2024 Apr 22;14(19):13083-13094. doi: 10.1039/d4ra00748d.

Abstract

The solute carrier family 6 (SLC6) transporters are of key interest because of their critical role in the transport of small amino acids or amino acid-like molecules. Their dysfunction is strongly associated with human diseases, including schizophrenia, depression, and Parkinson's disease. Linking single point mutations to disease may provide insights into the structure-function relationship of these transporters. This work aimed to develop a computational model for predicting the potential pathogenic effect of single point mutations in the SLC6 family. Missense mutation data were retrieved from UniProt, LitVar, and ClinVar, covering multiple protein-coding transcripts. As the encoding approach, amino acid descriptors were used to calculate the average sequence properties of both the original and the mutated sequences. In addition to the full-sequence calculation, the sequences were cut into twelve domains, defined according to the transmembrane domains of the SLC6 transporters, to analyse each region's contribution to the pathogenicity prediction. Subsequently, several classification models were built, namely Support Vector Machine (SVM), Logistic Regression (LR), Random Forest (RF), and Extreme Gradient Boosting (XGBoost), with hyperparameters optimized through grid search. Model performance was estimated with repeated stratified k-fold cross-validation. The accuracy values of the generated models range from 0.72 to 0.80. Analysis of feature importance indicates that mutations in distinct regions of SLC6 transporters are associated with an increased risk of pathogenicity. When the model was applied to an independent validation set, accuracy dropped to about 0.6 on average, with high precision but low sensitivity.
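The encoding step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the amino acid descriptors, so the Kyte-Doolittle hydropathy scale is used here purely as a stand-in, and the function names, the toy sequence, the two toy domain boundaries (the paper uses twelve), and the choice to concatenate wild-type and mutant averages are all assumptions made for illustration.

```python
# Stand-in descriptor: Kyte-Doolittle hydropathy values (an assumption;
# the abstract does not say which amino acid descriptors were used).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def apply_mutation(seq, pos, ref, alt):
    """Return seq with the residue at 1-based position pos changed from ref to alt."""
    if seq[pos - 1] != ref:
        raise ValueError(f"expected {ref} at position {pos}, found {seq[pos - 1]}")
    return seq[:pos - 1] + alt + seq[pos:]


def mean_descriptor(seq):
    """Average descriptor value over all residues of seq."""
    return sum(KD[aa] for aa in seq) / len(seq)


def encode(seq, pos, ref, alt, domains):
    """Build a feature vector of full-sequence and per-domain averages.

    domains is a list of (start, end) pairs, 1-based and inclusive.
    The vector holds the wild-type averages followed by the mutant averages.
    """
    mutant = apply_mutation(seq, pos, ref, alt)
    features = []
    for s in (seq, mutant):
        features.append(mean_descriptor(s))
        for start, end in domains:
            features.append(mean_descriptor(s[start - 1:end]))
    return features


# Hypothetical toy input: an 8-residue sequence split into two domains.
toy_seq = "MALKVGLI"
toy_domains = [(1, 4), (5, 8)]
features = encode(toy_seq, 3, "L", "D", toy_domains)
```

In this toy case the mutation falls in the first domain, so only the full-sequence average and the first-domain average differ between the wild-type and mutant halves of the vector. Vectors of this kind would then be passed to the classifiers and evaluated with repeated stratified k-fold cross-validation, as the abstract describes.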

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f0d3/11034476/334f3b12790e/d4ra00748d-f1.jpg
