ApoPred: Identification of Apolipoproteins and Their Subfamilies With Multifarious Features.

Authors

Liu Ting, Chen Jia-Mao, Zhang Dan, Zhang Qian, Peng Bowen, Xu Lei, Tang Hua

Affiliations

School of Basic Medical Sciences, Southwest Medical University, Luzhou, China.

Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China.

Publication information

Front Cell Dev Biol. 2021 Jan 8;8:621144. doi: 10.3389/fcell.2020.621144. eCollection 2020.

Abstract

Apolipoproteins are a group of plasma proteins associated with a variety of diseases, such as hyperlipidemia, atherosclerosis, Alzheimer's disease, and diabetes. To investigate the function of apolipoproteins and to develop effective targets for related diseases, it is necessary to identify and classify them accurately. Although apolipoproteins can be identified accurately through biochemical experiments, such experiments are expensive and time-consuming. This work aims to establish an efficient and accurate prediction model for recognizing apolipoproteins and their subfamilies. We first constructed a high-quality benchmark dataset of 270 apolipoproteins and 535 non-apolipoproteins. Based on this dataset, pseudo-amino acid composition (PseAAC) and composition of k-spaced amino acid pairs (CKSAAP) were used as input features. To improve prediction accuracy and eliminate redundant information, analysis of variance (ANOVA) was used to rank the features, and incremental feature selection was used to obtain the best feature subset. A support vector machine (SVM) was used to build the classification model, which achieved an accuracy of 97.27%, a sensitivity of 96.30%, and a specificity of 97.76% in discriminating apolipoproteins from non-apolipoproteins in 10-fold cross-validation. The same process was then repeated to build a second model for predicting apolipoprotein subfamilies, which achieved an overall accuracy of 95.93% in 10-fold cross-validation. Based on the proposed models, a convenient web server called ApoPred was established, which is freely accessible at http://tang-biolab.com/server/ApoPred/service.html. We expect that this work will contribute to research on apolipoprotein function and to drug development for related diseases.
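
The abstract names CKSAAP encoding and ANOVA-based feature ranking without giving their details. The following is a minimal sketch of both steps under the commonly used definitions: pair frequencies for each gap k, and the one-way ANOVA F-score (between-class variance over within-class variance). The function names, the default `k_max=2`, and the choice to skip non-standard residues are illustrative assumptions, not taken from the paper. PseAAC, incremental feature selection, and the SVM itself are not shown.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered residue pairs, in a fixed order: AA, AC, AD, ...
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def cksaap(sequence, k_max=2):
    """Encode a protein sequence as CKSAAP features.

    For each gap k in 0..k_max, compute the frequency of every ordered
    residue pair whose members are separated by exactly k residues.
    The result has 400 * (k_max + 1) entries. k_max=2 is an assumption.
    """
    seq = sequence.upper()
    features = []
    for k in range(k_max + 1):
        counts = dict.fromkeys(PAIRS, 0)
        total = 0
        for i in range(len(seq) - k - 1):
            pair = seq[i] + seq[i + k + 1]
            if pair in counts:  # assumption: ignore non-standard residues
                counts[pair] += 1
                total += 1
        features.extend(counts[p] / total if total else 0.0 for p in PAIRS)
    return features


def anova_f(values, labels):
    """One-way ANOVA F-score of a single feature across class labels."""
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(y, []).append(v)
    n, g = len(values), len(groups)
    if g < 2:
        raise ValueError("at least two classes are required")
    grand_mean = sum(values) / n
    means = {y: sum(xs) / len(xs) for y, xs in groups.items()}
    between = sum(len(xs) * (means[y] - grand_mean) ** 2 for y, xs in groups.items())
    within = sum((x - means[y]) ** 2 for y, xs in groups.items() for x in xs)
    if within == 0:
        # Constant within every class: uninformative if the classes also agree.
        return float("inf") if between > 0 else 0.0
    return (between / (g - 1)) / (within / (n - g))


def rank_features(matrix, labels):
    """Return feature (column) indices sorted by descending F-score."""
    n_features = len(matrix[0])
    scores = [anova_f([row[j] for row in matrix], labels) for j in range(n_features)]
    return sorted(range(n_features), key=lambda j: scores[j], reverse=True)
```

Under these assumptions, incremental feature selection would add features one at a time in the order returned by `rank_features`, retrain the classifier on each growing subset, and keep the subset with the best cross-validated accuracy.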

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8aa3/7820372/c54f4b7b3a59/fcell-08-621144-g001.jpg
