Gerasimov Ekaterina, Zelikovsky Alex, Măndoiu Ion, Ionov Yurij
Department of Computer Science, Georgia State University, 25 Park Place, Atlanta, 30303, GA, USA.
Department of Computer Science and Engineering, University of Connecticut, Storrs, 06269, CT, USA.
BMC Bioinformatics. 2017 Jun 7;18(Suppl 8):244. doi: 10.1186/s12859-017-1661-5.
Early detection is crucial in fighting cancer. Circulating autoantibodies, produced by the patient's own immune system after exposure to cancer proteins, are promising biomarkers for early cancer detection. Because an antibody recognizes not the whole antigen but 4-7 critical amino acids within the antigenic determinant (epitope), the whole proteome can be represented by a random peptide phage display library. This makes it possible to develop an early cancer detection test based on a set of peptide sequences identified by comparing the global peptide profiles of antibody specificities of cancer patients and healthy donors.
Because the global peptide profiles generated by next-generation sequencing contain an enormous number of peptide sequences, a large number of cancer and control sera is required to identify cancer-specific peptides with a high degree of statistical significance. To reduce the number of peptides in these profiles without losing cancer-specific sequences, we generated the profiles from a phage library enriched by panning on a pool of cancer sera. To further reduce the complexity of the profiles, we used computational methods to transform the list of peptides constituting each mimotope profile into a list of motifs formed by similar peptide sequences.
We show that amino acid order is meaningful in mimotope motifs: they contain significantly more peptides than motifs found among peptides whose amino acids have been randomly permuted. Motifs from a single sample also differ significantly from motifs in peptides drawn from multiple samples. Finally, we identified multiple cancer-specific motifs.
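The permutation control described above can be illustrated with a minimal sketch. The abstract does not state how motifs are defined or which statistical test is used, so everything here is an assumption made for illustration: a "motif" is taken to be a contiguous 4-mer shared by at least two peptides, and the function names (`kmer_support`, `peptides_in_motifs`, `shuffle_control`) are hypothetical. The idea is to count how many peptides fall into motifs, then repeat the count after shuffling the residues within each peptide, which preserves amino acid composition but destroys order.

```python
import random
from collections import Counter


def kmer_support(peptides, k=4):
    """For each contiguous k-mer, count how many peptides contain it."""
    support = Counter()
    for pep in peptides:
        # A set ensures a k-mer repeated inside one peptide is counted once.
        support.update({pep[i:i + k] for i in range(len(pep) - k + 1)})
    return support


def peptides_in_motifs(peptides, k=4, min_support=2):
    """Count peptides containing a k-mer shared by >= min_support peptides.

    Assumption: a 'motif' here is simply a shared contiguous k-mer; the
    paper's actual motif definition may differ.
    """
    support = kmer_support(peptides, k)
    return sum(
        any(support[pep[i:i + k]] >= min_support
            for i in range(len(pep) - k + 1))
        for pep in peptides
    )


def shuffle_control(peptides, k=4, min_support=2, n_rounds=100, seed=0):
    """Compare the observed count with counts from residue-shuffled peptides."""
    rng = random.Random(seed)
    observed = peptides_in_motifs(peptides, k, min_support)
    null_counts = []
    for _ in range(n_rounds):
        shuffled = []
        for pep in peptides:
            residues = list(pep)
            rng.shuffle(residues)  # same composition, random order
            shuffled.append("".join(residues))
        null_counts.append(peptides_in_motifs(shuffled, k, min_support))
    # Empirical one-sided p-value with the usual +1 correction.
    p_value = (1 + sum(c >= observed for c in null_counts)) / (n_rounds + 1)
    return observed, null_counts, p_value


# Toy peptides (made up for illustration, not data from the study).
toy = ["ACDEFGH", "KLACDEW", "ACDEPQR", "MNPQRST"]
observed, null_counts, p_value = shuffle_control(toy)
print(observed, p_value)
```

In the toy input, three of the four peptides share the 4-mer `ACDE`, so the observed count is 3. A real analysis would run on full mimotope profiles and would likely need a more permissive similarity measure than exact contiguous matches.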