Computer Engineering Department, Kadir Has University, Istanbul, Turkey.
PLoS One. 2019 Jan 28;14(1):e0210954. doi: 10.1371/journal.pone.0210954. eCollection 2019.
Understanding the expression levels of proteins and their interactions is key to diagnosing and explaining Down syndrome, which can be considered the most prevalent cause of intellectual disability in humans. In previous studies, the expression levels of 77 proteins obtained from normal-genotype control mice and from trisomic Ts65Dn mice were analyzed after training in contextual fear conditioning, with and without injection of the drug memantine, using statistical methods and machine learning techniques. Recent studies have also pointed out that there may be a link between Down syndrome and the immune system. Thus, the research presented in this paper aims at in silico identification of proteins that are significant to the learning process and the immune system, and at deriving the most accurate model for classifying the mice. The features are selected with a forward feature selection method after the dataset has been preprocessed. Deep neural network, gradient boosting tree, support vector machine and random forest classifiers are then applied, and their classification accuracy is measured. It is observed that the selected feature subsets not only yield higher classification accuracy but are also composed of protein responses that are important for learning and memory processes and for the immune system.
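The forward feature selection step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a greedy search scored by k-fold cross-validated accuracy, swaps in a simple nearest-centroid classifier in place of the four classifiers named in the abstract, and runs on synthetic data rather than the mouse protein dataset. All function names (`forward_select`, `cv_accuracy`, `nearest_centroid_predict`, `make_synthetic`) and parameters are hypothetical.

```python
import random
from statistics import mean


def nearest_centroid_predict(train_X, train_y, x, features):
    """Return the label whose class centroid (over `features`) is closest to x."""
    best_label, best_dist = None, float("inf")
    for label in sorted(set(train_y)):
        rows = [r for r, lab in zip(train_X, train_y) if lab == label]
        # Squared Euclidean distance restricted to the chosen features.
        dist = sum((x[f] - mean(r[f] for r in rows)) ** 2 for f in features)
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def cv_accuracy(X, y, features, k=5):
    """k-fold cross-validated accuracy using only the given feature indices."""
    n = len(X)
    correct = 0
    for fold in range(k):
        test_idx = set(range(fold, n, k))  # every k-th sample forms one fold
        train_X = [X[i] for i in range(n) if i not in test_idx]
        train_y = [y[i] for i in range(n) if i not in test_idx]
        for i in test_idx:
            if nearest_centroid_predict(train_X, train_y, X[i], features) == y[i]:
                correct += 1
    return correct / n


def forward_select(X, y, max_features=3):
    """Greedy forward selection: repeatedly add the feature that most improves
    cross-validated accuracy; stop when no candidate improves it."""
    selected, best_score = [], 0.0
    remaining = list(range(len(X[0])))
    while remaining and len(selected) < max_features:
        scored = [(cv_accuracy(X, y, selected + [f]), f) for f in remaining]
        # Highest score wins; ties go to the lowest feature index.
        score, feat = max(scored, key=lambda t: (t[0], -t[1]))
        if score <= best_score:
            break
        selected.append(feat)
        remaining.remove(feat)
        best_score = score
    return selected, best_score


def make_synthetic(n=60, n_features=6, seed=0):
    """Synthetic two-class data (an assumption for illustration only):
    feature 0 is strongly informative, feature 1 weakly, the rest are noise."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2
        row = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
        row[0] += 4.0 * label
        row[1] += 1.5 * label
        X.append(row)
        y.append(label)
    return X, y


if __name__ == "__main__":
    X, y = make_synthetic()
    selected, score = forward_select(X, y)
    print("selected feature indices:", selected)
    print("cross-validated accuracy:", round(score, 3))
```

In a setting like the one described, each column would correspond to one protein's expression level and the labels to the mouse classes. The selected subset would then be passed to the final classifiers for evaluation.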