Oleksiewicz Urszula, Tomczak Katarzyna, Woropaj Jakub, Markowska Monika, Stępniak Piotr, Shah Parantu K
Laboratory of Gene Therapy, Department of Cancer Immunology, The Greater Poland Cancer Centre, Poznan, Poland ; Department of Cancer Immunology and Diagnostics, Chair of Medical Biotechnology, Poznan University of Medical Sciences, Poznan, Poland ; These authors contributed equally to this paper.
Laboratory of Gene Therapy, Department of Cancer Immunology, The Greater Poland Cancer Centre, Poznan, Poland ; Department of Cancer Immunology and Diagnostics, Chair of Medical Biotechnology, Poznan University of Medical Sciences, Poznan, Poland ; Postgraduate School of Molecular Medicine, Medical University of Warsaw, Warsaw ; These authors contributed equally to this paper.
Contemp Oncol (Pozn). 2015;19(1A):A78-91. doi: 10.5114/wo.2014.47137.
Our current understanding of cancer genetics is grounded in the principle that cancer arises from a clone that has accumulated the requisite somatically acquired genetic aberrations, leading to malignant transformation. These aberrations also result in aberrant gene and protein expression. Next generation sequencing (NGS), or deep sequencing, platforms are being used to create large catalogues of copy number changes, mutations, structural variations, gene fusions, gene expression, and other types of information for cancer patients. However, inferring different types of biological changes from the raw reads generated in sequencing experiments is algorithmically and computationally challenging. In this article, we outline common steps for the quality control and processing of NGS data. We highlight the importance of accurate and application-specific alignment of these reads, and the methodological steps and challenges in obtaining different types of information. We comment on the importance of integrating these data and building infrastructure to analyse them. We also provide exhaustive lists of available software for obtaining this information and point readers to articles comparing software for deeper insight into specialised areas. We hope that the article will guide readers in choosing the right tools for analysing oncogenomic datasets.