Prediction of Deleterious Single Amino Acid Polymorphisms with a Consensus Holdout Sampler.

Author information

Álvarez-Machancoses Óscar, Faraggi Eshel, deAndrés-Galiana Enrique J, Fernández-Martínez Juan L, Kloczkowski Andrzej

Affiliations

Group of Inverse Problems, Optimization and Machine Learning, Department of Mathematics, University of Oviedo, C. Federico García Lorca, 18, 33007, Oviedo, Spain.

School of Science, Indiana University-Purdue University Indianapolis, IN, USA.

Publication information

Curr Genomics. 2024 May 31;25(3):171-184. doi: 10.2174/0113892029236347240308054538. Epub 2024 Mar 14.

Abstract

BACKGROUND

Single Amino Acid Polymorphisms (SAPs), or nonsynonymous Single Nucleotide Variants (nsSNVs), are the most common genetic variations. They result from missense mutations, in which a single base-pair substitution changes the triplet of bases (codon) at a given position so that it codes for a different amino acid. Because genetic mutations sometimes cause genetic diseases, it is important to understand and predict which variants are harmful and which are neutral (that is, do not change the phenotype). This can be posed as a classification problem.
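The definition above can be made concrete with a short sketch. A single-base substitution is a missense change (a SAP) only when the mutated codon encodes a different amino acid. The sketch below uses the standard genetic code (NCBI translation table 1) to label a substitution. The names `CODON_TABLE` and `classify_substitution` are hypothetical helpers for illustration and are not part of the authors' method. Codon positions are 0-based.

```python
# Standard genetic code (NCBI translation table 1); "*" marks a stop codon.
# Codons are enumerated with first base outermost, in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def classify_substitution(codon: str, position: int, new_base: str) -> str:
    """Label a single-base substitution at `position` (0-2) of a DNA codon."""
    codon, new_base = codon.upper(), new_base.upper()
    if codon not in CODON_TABLE or new_base not in BASES or not 0 <= position <= 2:
        raise ValueError("expected a DNA codon, a base in TCAG and a position 0-2")
    if codon[position] == new_base:
        raise ValueError("the new base must differ from the reference base")
    mutant = codon[:position] + new_base + codon[position + 1:]
    ref_aa, alt_aa = CODON_TABLE[codon], CODON_TABLE[mutant]
    if ref_aa == alt_aa:
        return "synonymous"  # same amino acid: not a SAP
    if alt_aa == "*":
        return "nonsense"    # premature stop codon
    if ref_aa == "*":
        return "stop-loss"   # a stop codon now encodes an amino acid
    return "missense"        # different amino acid: a SAP / nsSNV


print(classify_substitution("GAG", 1, "T"))  # missense: GAG (Glu) -> GTG (Val)
```

Only substitutions labeled "missense" belong to the SAP/nsSNV class that the classifier then scores as deleterious or neutral.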

METHODS

Computational methods based on machine intelligence are gradually replacing repetitive and extremely expensive mutagenesis experiments. In general, the uneven quality, deficiencies, and irregularities of nsSNV datasets reduce the usefulness of artificial intelligence-based methods. Consequently, more robust and accurate approaches are needed to address these problems. In this paper, we present a consensus classifier built on the holdout sampler, which appears robust and accurate and outperforms all other popular methods.

RESULTS

During the training phase, we generated 100 holdouts to test the structures and various classification parameters of different classifiers. The best-performing holdouts were selected to build a consensus classifier, which was then tested using k-fold (1 ≤ k ≤ 5) cross-validation. We also examined which protein properties have the greatest impact on accurate prediction of the effects of nsSNVs.
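The training procedure described above has four steps: draw many random holdouts, keep the best-performing ones, combine them by consensus, and evaluate with k-fold cross-validation. It can be sketched in plain Python under several assumptions. This is a minimal illustration, not the authors' implementation. The following are all hypothetical choices:

- the nearest-centroid base learner (`CentroidModel`);
- the random feature subsets used to diversify models;
- the 75/25 holdout split;
- keeping the top `n_best` holdouts;
- majority voting (`consensus_predict`);
- the synthetic toy data (`make_toy_data`).

Because every fold must leave some training data, this sketch needs k ≥ 2.

```python
import random
from collections import Counter
from statistics import mean, stdev


class CentroidModel:
    """Hypothetical base learner: nearest centroid on a subset of features."""

    def __init__(self, features):
        self.features = features  # indices of the features this model sees
        self.centroids = {}

    def fit(self, X, y):
        for label in set(y):
            rows = [x for x, t in zip(X, y) if t == label]
            self.centroids[label] = [sum(r[f] for r in rows) / len(rows) for f in self.features]
        return self

    def predict_one(self, x):
        def sq_dist(c):
            return sum((x[f] - cf) ** 2 for f, cf in zip(self.features, c))
        return min(self.centroids, key=lambda label: sq_dist(self.centroids[label]))


def accuracy(model, X, y):
    return sum(model.predict_one(x) == t for x, t in zip(X, y)) / len(y)


def holdout_sampler(X, y, n_holdouts=100, train_frac=0.75, n_best=10, seed=0):
    """Train one model per random holdout split; keep the n_best by validation accuracy."""
    rng = random.Random(seed)
    idx = list(range(len(X)))
    n_features = len(X[0])
    scored = []
    for _ in range(n_holdouts):
        rng.shuffle(idx)
        cut = int(train_frac * len(idx))
        train, val = idx[:cut], idx[cut:]
        # Diversify the models by giving each holdout a random feature subset.
        features = sorted(rng.sample(range(n_features), rng.randint(1, n_features)))
        model = CentroidModel(features).fit([X[i] for i in train], [y[i] for i in train])
        score = accuracy(model, [X[i] for i in val], [y[i] for i in val])
        scored.append((score, model))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [model for _, model in scored[:n_best]]


def consensus_predict(models, x):
    """Majority vote over the selected models (ties go to the label voted first)."""
    return Counter(m.predict_one(x) for m in models).most_common(1)[0][0]


def cross_validate(X, y, k=5, seed=0, **sampler_kwargs):
    """k-fold CV of the consensus classifier; this sketch needs k >= 2."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::k] for f in range(k)]
    scores = []
    for f, test in enumerate(folds):
        held_out = set(test)
        train = [i for i in idx if i not in held_out]
        models = holdout_sampler([X[i] for i in train], [y[i] for i in train],
                                 seed=seed + f, **sampler_kwargs)
        correct = sum(consensus_predict(models, X[i]) == y[i] for i in test)
        scores.append(correct / len(test))
    return scores


def make_toy_data(n=200, seed=42):
    """Synthetic stand-in for SAP features: two informative columns, one noise column."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        label = rng.randint(0, 1)  # toy labels: 1 = "deleterious", 0 = "neutral"
        X.append([rng.gauss(2.0 * label, 1.0), rng.gauss(-1.5 * label, 1.0), rng.gauss(0.0, 1.0)])
        y.append(label)
    return X, y


if __name__ == "__main__":
    X, y = make_toy_data()
    scores = cross_validate(X, y, k=5, n_holdouts=100, n_best=11)
    print(f"accuracy {mean(scores):.3f} +/- {stdev(scores):.3f} over {len(scores)} folds")
```

An odd `n_best` avoids tied votes between the two labels. The spread of the per-fold scores is the kind of standard deviation the conclusion refers to. Replacing `CentroidModel` with several different learners would bring the sketch closer to the diverse classifiers the abstract describes.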

CONCLUSION

Our Consensus Holdout Sampler outperforms other popular algorithms and gives excellent results: it is highly accurate, with a low standard deviation. The advantage of our method comes from using a tree of holdouts, in which diverse ML/AI-based programs are sampled in different ways.

Figure 1 (image): https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d5de/11288160/5f296a45c3c6/CG-25-171_F1.jpg