School of Information and Computer Engineering, Northeast Forestry University, Harbin 150040, China.
Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu 610054, China.
J Proteome Res. 2019 Mar 1;18(3):1392-1401. doi: 10.1021/acs.jproteome.9b00012. Epub 2019 Feb 18.
The major histocompatibility complex (MHC) is the collective term for the gene groups that encode the major histocompatibility antigens. MHC molecules bind peptides derived from pathogens and display them on the cell surface so that T cells can recognize them and carry out a series of immune functions. They are critical in transplantation, autoimmunity, infection, and tumor immunotherapy. Identifying MHC more accurately by combining machine learning algorithms with bioinformatics analysis is therefore an important task. This paper proposes a new computational method for MHC identification, as an alternative to traditional biological methods, and uses the resulting classifier both to identify MHC and to distinguish MHC I from MHC II. The classifier combines three feature representations, SVMProt 188D, bag-of-ngrams (BonG), and information theory (IT), and is trained with an extreme learning machine (ELM) that uses lin-kernel as the activation function. Its accuracy on both tasks was verified with 10-fold cross-validation and with an independent test set. In 10-fold cross-validation, the proposed algorithm achieved 91.66% accuracy when identifying MHC and 94.442% accuracy when identifying MHC categories. An online identification Web site named ELM-MHC is available at the following URL: http://server.malab.cn/ELM-MHC/
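To make the classification step concrete, the following is a minimal sketch of an ELM with a linear kernel evaluated by 10-fold cross-validation. It is not the authors' implementation. It assumes the common kernel-ELM formulation, in which the output weights solve (K + I/C) beta = T for a kernel matrix K, and it uses synthetic feature vectors in place of the 188D, BonG, and IT features. The names KernelELM, cross_val_accuracy, and the regularization parameter C are hypothetical, chosen only for this illustration.

```python
import numpy as np


def linear_kernel(A, B):
    """Linear kernel: plain inner products between rows of A and rows of B."""
    return A @ B.T


class KernelELM:
    """Kernel extreme learning machine with a linear kernel (illustrative only)."""

    def __init__(self, C=1.0):
        self.C = C  # regularization strength (assumed value, not from the paper)

    def fit(self, X, y):
        self.X_ = np.asarray(X, dtype=float)
        y = np.asarray(y)
        self.classes_ = np.unique(y)
        # Encode targets as +1 for the true class and -1 for every other class.
        T = np.where(y[:, None] == self.classes_[None, :], 1.0, -1.0)
        K = linear_kernel(self.X_, self.X_)
        n = K.shape[0]
        # Solve (K + I/C) beta = T in closed form; no iterative training.
        self.beta_ = np.linalg.solve(K + np.eye(n) / self.C, T)
        return self

    def predict(self, X):
        scores = linear_kernel(np.asarray(X, dtype=float), self.X_) @ self.beta_
        return self.classes_[np.argmax(scores, axis=1)]


def cross_val_accuracy(X, y, k=10, C=1.0, seed=0):
    """Mean accuracy over k shuffled (non-stratified) folds."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    accuracies = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        model = KernelELM(C=C).fit(X[train], y[train])
        accuracies.append(np.mean(model.predict(X[test]) == y[test]))
    return float(np.mean(accuracies))


# Synthetic stand-in for protein feature vectors: two well-separated classes.
demo_rng = np.random.default_rng(0)
X_demo = np.vstack([demo_rng.normal(-1.0, 1.0, (100, 20)),
                    demo_rng.normal(1.0, 1.0, (100, 20))])
y_demo = np.array([0] * 100 + [1] * 100)
print("10-fold CV accuracy:", cross_val_accuracy(X_demo, y_demo, k=10))
```

The same two functions could be applied once to an MHC versus non-MHC label vector and once to an MHC I versus MHC II label vector, mirroring the two tasks described above. The accuracy printed here reflects only the synthetic data and says nothing about the reported results.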