Khalil Ahmed Ibrahim Samir, Chattopadhyay Anupam, Sanyal Amartya
School of Computer Science and Engineering, Nanyang Technological University, Singapore, Singapore.
School of Biological Sciences, Nanyang Technological University, Singapore, Singapore.
Cancer Inform. 2021 Oct 16;20:11769351211049236. doi: 10.1177/11769351211049236. eCollection 2021.
The revolution in next-generation sequencing (NGS) technology has made high-throughput sequencing datasets of cancer cell lines easy to access, share, and analyze integratively. However, long-term passaging and culture conditions introduce high levels of genomic and phenotypic diversity in established cell lines, resulting in strain differences. Clonal variation in cultured cell lines relative to the reference standard is thus a major barrier in systems biology data analyses, and there is a pressing need for a fast, entry-level assessment of clonal variation within cell lines from their high-throughput sequencing data.
We developed a Python-based software tool, AStra, that estimates genome-wide segmental aneuploidy in order to measure and visually interpret strain-level similarities and differences between cancer cell lines from whole-genome sequencing (WGS) data. We demonstrated that the aneuploidy spectrum can capture the genetic variation in 27 strains of the MCF7 breast cancer cell line collected from different laboratories. Performance evaluation of AStra on several cancer sequencing datasets revealed that cancer cell lines exhibit distinct aneuploidy spectra that reflect their previously reported karyotypic observations. Similarly, AStra successfully identified large-scale DNA copy number variations (CNVs) artificially introduced into simulated WGS datasets.
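To make the idea of an aneuploidy spectrum concrete, the following is a minimal sketch and not AStra's actual algorithm. It assumes that read depth has already been counted in fixed-size genomic bins, and that the median bin sits at a baseline ploidy of 2; both are illustrative assumptions. The function name `aneuploidy_spectrum` and the parameter `baseline_ploidy` are hypothetical and do not come from the AStra code base.

```python
from collections import Counter
from statistics import median


def aneuploidy_spectrum(bin_depths, baseline_ploidy=2):
    """Return {copy_number: fraction_of_bins} from per-bin read depths.

    Illustrative only: assumes the median bin is at `baseline_ploidy`.
    """
    # Ignore bins with no coverage (e.g. unmappable regions).
    usable = [d for d in bin_depths if d > 0]
    if not usable:
        raise ValueError("no bins with coverage")

    # Assumption: median depth corresponds to the baseline ploidy,
    # so this is the expected read depth contributed by one copy.
    depth_per_copy = median(usable) / baseline_ploidy

    # Round each bin to its nearest integer copy number.
    copy_numbers = [round(d / depth_per_copy) for d in usable]

    counts = Counter(copy_numbers)
    total = len(copy_numbers)
    return {cn: counts[cn] / total for cn in sorted(counts)}


# Toy read depths for ten bins (made-up numbers, not real data).
depths = [100, 102, 98, 150, 149, 101, 52, 99, 200, 100]
spectrum = aneuploidy_spectrum(depths)
print(spectrum)
```

In this toy input most bins fall at two copies, with a few bins at one, three, and four copies; the resulting dictionary is a crude stand-in for the spectrum that the abstract describes.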
AStra provides an analytical and visualization platform for rapid and easy comparison between strains, or between cell lines, based on their aneuploidy spectra, using only the BAM files of mapped reads as input. We recommend AStra for rapid first-pass quality assessment of cancer cell lines before integrating scientific datasets that employ deep sequencing. AStra is open-source software and is available at https://github.com/AISKhalil/AStra.
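One simple way to compare two strains by their spectra can be sketched as follows. This is an assumption-laden illustration rather than the comparison AStra performs: each spectrum is represented as a mapping from copy number to the fraction of the genome at that copy number, and the two are compared with the total variation distance. The function name `spectrum_distance` and the example values are hypothetical.

```python
def spectrum_distance(spec_a, spec_b):
    """Total variation distance between two copy-number spectra.

    Each spectrum maps copy number -> genome fraction (summing to 1).
    Returns 0.0 for identical spectra and 1.0 for disjoint ones.
    """
    copy_numbers = set(spec_a) | set(spec_b)
    return 0.5 * sum(
        abs(spec_a.get(cn, 0.0) - spec_b.get(cn, 0.0)) for cn in copy_numbers
    )


# Made-up spectra for two hypothetical strains of the same cell line.
strain_a = {2: 0.6, 3: 0.3, 4: 0.1}
strain_b = {2: 0.4, 3: 0.4, 4: 0.1, 5: 0.1}

distance = spectrum_distance(strain_a, strain_b)
print(f"distance between strains: {distance:.2f}")
```

A distance near zero would suggest that two strains share a similar aneuploidy profile, while larger values would flag strains that deserve closer inspection before their datasets are integrated.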