Pretoria: An effective computational approach for accurate and high-throughput identification of CD8 T-cell epitopes of eukaryotic pathogens.

Authors

Charoenkwan Phasit, Schaduangrat Nalini, Pham Nhat Truong, Manavalan Balachandran, Shoombuatong Watshara

Affiliations

Modern Management and Information Technology, College of Arts, Media and Technology, Chiang Mai University, Chiang Mai 50200, Thailand.

Center for Research Innovation and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok 10700, Thailand.

Publication information

Int J Biol Macromol. 2023 May 31;238:124228. doi: 10.1016/j.ijbiomac.2023.124228. Epub 2023 Mar 29.

Abstract

T-cells recognize antigenic epitopes presented on major histocompatibility complex (MHC) molecules, triggering an adaptive immune response in the host. T-cell epitope (TCE) identification is challenging because of the large number of undetermined proteins found in eukaryotic pathogens, as well as MHC polymorphisms. In addition, conventional experimental approaches for TCE identification are time-consuming and expensive. Thus, computational approaches that can accurately and rapidly identify CD8 TCEs of eukaryotic pathogens based solely on sequence information may facilitate the discovery of novel CD8 TCEs in a cost-effective manner. Here, Pretoria (Predictor of CD8 TCEs of eukaryotic pathogens) is proposed as the first stack-based approach for accurate and large-scale identification of CD8 TCEs of eukaryotic pathogens. In particular, Pretoria extracts and explores the crucial information embedded in CD8 TCEs by employing a comprehensive set of 12 well-known feature descriptors drawn from multiple groups, including physicochemical properties, composition-transition-distribution, pseudo-amino acid composition, and amino acid composition. These feature descriptors were then used to construct a pool of 144 different machine learning (ML)-based classifiers based on 12 popular ML algorithms. Finally, a feature selection method was used to determine the important ML classifiers for the construction of the stacked model. The experimental results indicated that Pretoria is an accurate and effective computational approach for CD8 TCE prediction; on the independent test it was superior to several conventional ML classifiers and the existing method, with an accuracy of 0.866, MCC of 0.732, and AUC of 0.921. Additionally, to maximize user convenience for high-throughput identification of CD8 TCEs of eukaryotic pathogens, a user-friendly web server of Pretoria (http://pmlabstack.pythonanywhere.com/Pretoria) was developed and made freely available.
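The abstract names amino acid composition as one of the sequence-based descriptor groups and reports accuracy and MCC on the independent test. As a rough illustration only, the sketch below shows how these two quantities are commonly computed. It is not the authors' implementation: the function names `amino_acid_composition` and `accuracy_and_mcc`, the example peptide, and the example labels are all hypothetical.

```python
from collections import Counter
from math import sqrt

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def amino_acid_composition(peptide):
    """Return a 20-value list: the fraction of each standard residue in `peptide`.

    Hypothetical helper. Non-standard symbols (e.g. 'X') count toward the
    length but have no slot in the vector, so the values may sum to less than 1.
    """
    peptide = peptide.upper()
    if not peptide:
        raise ValueError("peptide must not be empty")
    counts = Counter(peptide)
    return [counts[aa] / len(peptide) for aa in AMINO_ACIDS]


def accuracy_and_mcc(y_true, y_pred):
    """Return (accuracy, Matthews correlation coefficient) for binary 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, MCC is reported as 0 when any marginal count is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return accuracy, mcc


if __name__ == "__main__":
    features = amino_acid_composition("SIINFEKL")  # illustrative peptide only
    print(len(features), round(sum(features), 6))
    print(accuracy_and_mcc([1, 1, 0, 0], [1, 0, 0, 0]))
```

In a stacking setup like the one described, a vector of this kind would be one of several descriptor encodings fed to the baseline classifiers, and the metrics would be computed on held-out predictions. The details of that pipeline are not given in the abstract.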
