Cancer Vaccine Center, Dana-Farber Cancer Institute, Boston, MA 02115, USA.
J Immunol Methods. 2011 Nov 30;374(1-2):18-25. doi: 10.1016/j.jim.2011.07.007. Epub 2011 Jul 18.
The immune system is characterized by high combinatorial complexity, which necessitates specialized computational tools for the analysis of immunological data. Machine learning (ML) algorithms are used together with classical experimentation to select vaccine targets and in computational simulations that reduce the number of necessary experiments. The development of ML algorithms requires standardized data sets, consistent measurement methods, and uniform scales. To bridge the gap between the immunology and ML communities, we designed the Dana-Farber Repository for Machine Learning in Immunology (DFRMLI). This repository provides standardized data sets of HLA-binding peptides, with all binding affinities mapped onto a common scale. It also provides a list of experimentally validated, naturally processed T cell epitopes derived from tumor or virus antigens. The DFRMLI data were preprocessed to ensure consistency and comparability, and the data sets include detailed descriptions and statistically meaningful sample sizes for peptides that bind to various HLA molecules. The repository is accessible at http://bio.dfci.harvard.edu/DFRMLI/.
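As an illustration of what "mapping binding affinities onto a common scale" can mean in practice, the sketch below applies one widely used transformation from the peptide-MHC prediction literature: an IC50 value in nM is mapped to the interval [0, 1] via 1 - log(IC50)/log(50,000). This is an assumption for illustration only, not necessarily the transformation used by DFRMLI. The function name `ic50_to_common_scale` and the 50,000 nM cap `MAX_IC50_NM` are hypothetical.

```python
import math

# Hypothetical upper bound of the assay range in nM. 50,000 nM is the cap used
# by the common log-IC50 transform; it is an assumption, not a DFRMLI parameter.
MAX_IC50_NM = 50_000.0


def ic50_to_common_scale(ic50_nm: float, max_ic50_nm: float = MAX_IC50_NM) -> float:
    """Map an IC50 value (nM) onto [0, 1], where 1 means the strongest binding.

    Applies 1 - log(IC50) / log(max_ic50). Values below 1 nM or above
    max_ic50 are clipped so the result always stays within [0, 1].
    """
    if ic50_nm <= 0:
        raise ValueError("IC50 must be a positive concentration in nM")
    # Clip to the assumed assay range before taking the logarithm.
    clipped = min(max(ic50_nm, 1.0), max_ic50_nm)
    return 1.0 - math.log(clipped) / math.log(max_ic50_nm)


if __name__ == "__main__":
    # Hypothetical measurements from different assays, all expressed in nM.
    measurements = {"pep_a": 5.0, "pep_b": 500.0, "pep_c": 80_000.0}
    for peptide, ic50 in measurements.items():
        print(peptide, round(ic50_to_common_scale(ic50), 3))
```

Because the logarithm compresses the several orders of magnitude spanned by IC50 values, peptides measured in different assays become directly comparable on a bounded scale. That bounded, comparable form is the property the abstract attributes to a standardized data set for ML training.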