Department of Computer Science, University of Salerno, Fisciano, Italy.
BMC Bioinformatics. 2022 Dec 1;23(1):517. doi: 10.1186/s12859-022-05070-6.
This research aims to increase our knowledge of amyloidoses. These disorders involve incorrect protein folding, which alters protein structure and thereby protein function. Fibrillar deposits underlie several well-known diseases, such as Alzheimer's disease, Creutzfeldt-Jakob disease and type II diabetes. For many amyloid proteins, the corresponding precursor proteins are already known. Discovering new precursor proteins involved in the formation of amyloid fibril deposits would improve our understanding of the pathological processes underlying amyloidoses.
A new classifier, called ENTAIL, was developed using more than 4000 molecular descriptors. ENTAIL is based on a Naive Bayes classifier with unbounded support and a Gaussian kernel type. On a balanced dataset, it achieved a test-set accuracy of 81.80%, a sensitivity (SN) of 100%, a specificity (SP) of 63.63% and a Matthews correlation coefficient (MCC) of 0.683.
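The abstract describes the classifier only by its configuration. As a rough illustration of what a Naive Bayes classifier with Gaussian kernel densities and unbounded support does, here is a minimal pure-Python sketch, assuming numeric features and class labels in a list. For each class, every feature's class-conditional density is a Gaussian kernel density estimate (KDE) over that class's training values, defined over the whole real line (unbounded support); the per-feature log-densities are then summed with the log class prior under the naive independence assumption. The class name `KernelNaiveBayes`, the rule-of-thumb bandwidth 1.06·σ·n^(-1/5) and the toy data are hypothetical choices for illustration; this is not ENTAIL's implementation and does not use its molecular descriptors.

```python
import math


class KernelNaiveBayes:
    """Hypothetical sketch: Naive Bayes with per-feature Gaussian KDE likelihoods.

    Unbounded support: each kernel density is defined over the whole real line.
    """

    @staticmethod
    def _bandwidth(values):
        # Assumed rule-of-thumb bandwidth h = 1.06 * std * n^(-1/5);
        # floored so constant features never get a zero-width kernel.
        n = len(values)
        mean = sum(values) / n
        var = sum((v - mean) ** 2 for v in values) / max(n - 1, 1)
        return max(1.06 * math.sqrt(var) * n ** -0.2, 1e-3)

    @staticmethod
    def _log_kde(x, values, h):
        # log[(1 / (n*h)) * sum_i phi((x - x_i) / h)], phi = standard normal pdf,
        # evaluated with log-sum-exp for numerical stability.
        exponents = [-0.5 * ((x - v) / h) ** 2 for v in values]
        m = max(exponents)
        total = sum(math.exp(e - m) for e in exponents)
        return m + math.log(total) - math.log(len(values) * h * math.sqrt(2 * math.pi))

    def fit(self, X, y):
        self.classes_ = sorted(set(y))
        n = len(y)
        # Class priors estimated from class frequencies in the training labels.
        self.log_priors_ = {c: math.log(y.count(c) / n) for c in self.classes_}
        # Per class, keep each feature's training values as a column plus its bandwidth.
        self.columns_ = {}
        self.bandwidths_ = {}
        for c in self.classes_:
            rows = [x for x, t in zip(X, y) if t == c]
            cols = [list(col) for col in zip(*rows)]
            self.columns_[c] = cols
            self.bandwidths_[c] = [self._bandwidth(col) for col in cols]
        return self

    def _joint_log_likelihood(self, x):
        # log P(c) + sum_j log p(x_j | c): the "naive" feature-independence assumption.
        return {
            c: self.log_priors_[c]
            + sum(
                self._log_kde(xj, col, h)
                for xj, col, h in zip(x, self.columns_[c], self.bandwidths_[c])
            )
            for c in self.classes_
        }

    def predict(self, X):
        # Pick the class with the highest joint log-likelihood for each sample.
        preds = []
        for x in X:
            scores = self._joint_log_likelihood(x)
            preds.append(max(scores, key=scores.get))
        return preds


# Toy, made-up data: two well-separated classes with two features each.
X_train = [[0.1, 1.0], [0.3, 0.8], [0.2, 1.2], [2.9, 3.1], [3.2, 2.8], [3.0, 3.3]]
y_train = [0, 0, 0, 1, 1, 1]
model = KernelNaiveBayes().fit(X_train, y_train)
print(model.predict([[0.2, 1.0], [3.1, 3.0]]))  # → [0, 1]
```

Working in log space keeps the product of many per-feature densities from underflowing, which matters when the number of features runs into the thousands.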
The analysis showed that, across the various test configurations, performance was higher on a balanced dataset than on an unbalanced one.
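The reported figures follow the standard definitions of accuracy, sensitivity, specificity and MCC over a binary confusion matrix. The Python sketch below computes them and also shows a property of balanced test sets: with equal numbers of positives and negatives, accuracy is exactly the mean of SN and SP, so (100% + 63.63%) / 2 ≈ 81.8%. The function name `binary_metrics` and the confusion-matrix counts are hypothetical; the counts are not taken from the paper and were chosen only because they give values close to those reported.

```python
import math


def binary_metrics(tp, fn, tn, fp):
    """Accuracy, sensitivity (SN), specificity (SP) and MCC from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    sensitivity = tp / (tp + fn)  # true positive rate
    specificity = tn / (tn + fp)  # true negative rate
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally set to 0 when any marginal total is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return accuracy, sensitivity, specificity, mcc


# Hypothetical counts for a balanced test set of 11 positives and 11 negatives;
# NOT from the paper, chosen only because they give figures close to those reported.
acc, sn, sp, mcc = binary_metrics(tp=11, fn=0, tn=7, fp=4)
print(f"ACC={acc:.4f} SN={sn:.4f} SP={sp:.4f} MCC={mcc:.3f}")
# → ACC=0.8182 SN=1.0000 SP=0.6364 MCC=0.683
```

MCC uses all four cells of the confusion matrix, which is why it is often quoted alongside accuracy when comparing balanced and unbalanced evaluations.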