Hyung Daejin, Baek Min-Jeong, Lee Jongkeun, Cho Juyeon, Kim Hyoun Sook, Park Charny, Cho Soo Young
National Cancer Center, 323 Ilsan-ro, Goyang-si, Gyeonggi-do 10408, Republic of Korea.
Department of Molecular and Life Science, Hanyang University, Ansan 15588, Republic of Korea.
Comput Struct Biotechnol J. 2021 Aug 17;19:4759-4769. doi: 10.1016/j.csbj.2021.08.022. eCollection 2021.
Multi-omics approaches that profile the DNA, RNA, and proteins of comprehensively characterized human cancer cell lines have yielded new therapeutic insights. To improve our understanding of the molecular features associated with oncogenic modulation in cancer, we present a proteogenomic database for human cancer cell lines, called Protein-gene Expression Nexus (PEN). PEN extends the characterization of cancer cell lines by integrating genetic, mRNA, and protein data for 145 cancer cell lines from various public studies. It contains proteomic and phosphoproteomic data covering 4,129,728 peptides, 13,862 proteins, and 7,138 phosphorylation site-associated genomic variations, drawn from 117 studies and 12 cancer types. Using the integrated datasets, we performed functional characterization, including cis/trans association analysis of copy number alteration (CNA), single amino acid variation in coding genes, post-translational modification site variation associated with single amino acid variation, and novel peptide expression from noncoding regions and fusion genes. PEN provides a user-friendly interface for searching, browsing, and downloading data, and supports visualization of genome-wide associations between CNA and expression, the novel peptide landscape, mRNA-protein abundance, and functional annotation. Together, the dataset and the PEN data portal provide a resource to accelerate cancer research using model cancer cell lines. PEN is freely accessible at http://combio.snu.ac.kr/pen.