Janairo Jose Isagani B, Sy-Janairo Marianne Linley L
Biology Department, De La Salle University, 2401 Taft Avenue, Manila, Philippines.
Institute of Digestive and Liver Diseases, St. Luke's Medical Center - Global City, Taguig, Philippines.
Data Brief. 2020 Feb 29;29:105351. doi: 10.1016/j.dib.2020.105351. eCollection 2020 Apr.
The article presents a dataset containing nine classes of calculated sequence-derived descriptors for 78 peptide sequences, 21 of which demonstrate the ability to bind with gastric cancer cells. The dataset was used in the paper "A screening algorithm for gastric cancer binding peptides" [1] to create a classification model that predicts whether a given peptide sequence can bind with gastric cancer cells. The 78 peptide sequences were extracted through a systematic literature search, and the peptide descriptors were calculated using the R package "Peptides". The nine calculated sequence-derived descriptor classes are the Blosum indices, Cruciani properties, FASGAI vectors, Kidera factors, ProtFP, ST-scales, T-scales, VHSE scales, and Z-scales. The resulting dataset, composed of over 4000 data points, offers a rich resource for further proteochemometric analyses of the curated peptide sequences relevant to cancer diagnostics and therapeutics.
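The dataset size quoted in the abstract can be checked with a short calculation, and the general idea behind these sequence-derived descriptors (averaging a per-residue property vector over the whole peptide) can be sketched in Python. This is a minimal sketch under stated assumptions: the number of values per descriptor class is taken from our reading of the "Peptides" package documentation, not from the article, and should be verified against the package version used; `TOY_SCALE` holds made-up placeholder numbers, not any published scale; and `total_data_points` and `sequence_descriptor` are hypothetical helper names, not functions of the Peptides package.

```python
from statistics import mean

# Values contributed per peptide by each descriptor class.
# ASSUMPTION: dimensions as documented for the R package "Peptides";
# verify against the package version actually used.
DESCRIPTOR_DIMS = {
    "Blosum indices": 10,
    "Cruciani properties": 3,
    "FASGAI vectors": 6,
    "Kidera factors": 10,
    "ProtFP": 8,
    "ST-scales": 8,
    "T-scales": 5,
    "VHSE scales": 8,
    "Z-scales": 5,
}

N_PEPTIDES = 78  # number of curated sequences stated in the abstract


def total_data_points(dims, n_peptides):
    """Hypothetical helper: descriptor columns per peptide times peptide count."""
    return sum(dims.values()) * n_peptides


# HYPOTHETICAL two-component scale for three residues. These numbers are
# placeholders for illustration only, NOT published descriptor values.
TOY_SCALE = {
    "A": (0.5, -1.0),
    "G": (0.0, 0.5),
    "K": (1.5, 2.0),
}


def sequence_descriptor(seq, scale):
    """Hypothetical helper: average each per-residue component over the sequence."""
    vectors = [scale[residue] for residue in seq.upper()]
    return tuple(mean(component) for component in zip(*vectors))


if __name__ == "__main__":
    print(sum(DESCRIPTOR_DIMS.values()))                   # 63
    print(total_data_points(DESCRIPTOR_DIMS, N_PEPTIDES))  # 4914
    print(sequence_descriptor("GAK", TOY_SCALE))
```

Under these assumptions each peptide is described by 63 values, and 63 × 78 = 4914, which is consistent with the "over 4000 data points" reported. Averaging over residues yields a fixed-length vector regardless of peptide length, which is what allows sequences of different lengths to be fed to a single classification model.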