Mazzocchetti G, Poletti A, Solli V, Borsi E, Martello M, Vigliotta I, Armuzzi S, Taurisano B, Zamagni E, Cavo M, Terragna C
IRCCS Azienda Ospedaliero-Universitaria di Bologna, Istituto di Ematologia "Seràgnoli", Bologna, Italy.
Department of Specialized, Diagnostic and Experimental Medicine, University of Bologna, Italy.
Comput Struct Biotechnol J. 2022 Jul 3;20:3718-3728. doi: 10.1016/j.csbj.2022.06.062. eCollection 2022.
Human cancer arises from a population of cells that have acquired a wide range of genetic alterations, most of which are therapeutic targets or serve as prognostic factors for patient risk stratification. Among these, copy number alterations (CNAs) are quite frequent. Several molecular biology technologies, such as microarrays, NGS and single-cell approaches, are currently used to define the genomic profile of tumor samples. Their output must be analyzed with bioinformatic approaches, in particular with computational algorithms. Molecular biology tools estimate the baseline region by comparing either the mean probe signals or the number of reads to the reference genome. However, when tumors display complex karyotypes, this approach can misestimate the baseline region and consequently cause errors in CNA calls. To overcome this issue, we designed an R package that checks and, if necessary, adjusts the baseline region according to both the tumor-specific context of alterations and the sample-specific clustered genomic lesions. Several databases were used to set up and validate the package, demonstrating its potential to adjust copy number (CN) data from different tumors and analysis techniques. Notably, the analysis showed that up to 25% of samples need a baseline region adjustment and a redefinition of CNA calls, which changes the prognostic risk classification of the patients. We support the implementation of the package within CN analysis bioinformatics pipelines to ensure correct stratification of patients into risk categories, regardless of tumor type.
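The baseline problem described in the abstract can be illustrated with a small Python sketch. It is not the algorithm of the R package presented here; it is a deliberately naive stand-in that assumes the baseline is the copy number state covering the most base pairs and shifts every segment so that this state becomes diploid. The function name `recenter_baseline`, the `(length_bp, copy_number)` segment format and the example values are all hypothetical.

```python
from collections import defaultdict


def recenter_baseline(segments, expected_cn=2.0):
    """Shift copy numbers so the most widespread state equals expected_cn.

    segments: list of (length_bp, copy_number) tuples (hypothetical format).
    Returns (adjusted_segments, shift).
    Assumption: the baseline is the integer CN state covering the most bases.
    """
    covered = defaultdict(int)
    for length, cn in segments:
        # Bin each segment to its nearest integer copy number state.
        covered[round(cn)] += length

    # Pick the state with the largest total length; ties go to the lower state.
    baseline = min(covered, key=lambda state: (-covered[state], state))

    shift = expected_cn - baseline
    adjusted = [(length, cn + shift) for length, cn in segments]
    return adjusted, shift


# Invented profile in which most of the genome sits near CN 3, as might
# happen when the original baseline was set one copy too low.
segments = [
    (50_000_000, 3.1),
    (30_000_000, 2.0),
    (80_000_000, 2.9),
    (10_000_000, 4.0),
]
adjusted, shift = recenter_baseline(segments)
print(shift)
print(adjusted)
```

In this invented profile the state near CN 3 covers 130 Mb of 170 Mb, so the sketch shifts every segment down by one copy: the 30 Mb segment originally called diploid becomes a one-copy loss, and the 10 Mb segment drops from CN 4 to CN 3. This shows how a baseline correction can redefine CNA calls. Choosing the most widespread state is only one possible heuristic, and it is exactly the kind of assumption that can fail in complex karyotypes, which is why the abstract describes an approach that also uses tumor-specific context.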