Zhang Jinghui, Finney Richard P, Rowe William, Edmonson Michael, Yang Sei Hoon, Dracheva Tatiana, Jen Jin, Struewing Jeffery P, Buetow Kenneth H
Laboratory of Population Genetics, National Cancer Institute, National Institutes of Health, Bethesda, Maryland 20892, USA.
Genome Res. 2007 Jul;17(7):1111-7. doi: 10.1101/gr.5963407. Epub 2007 May 24.
Systematic investigations of genetic changes in tumors are expected to lead to greatly improved understanding of cancer etiology. To meet the analytical challenges presented by such studies, we developed the Cancer Genome WorkBench (http://cgwb.nci.nih.gov), the first computational platform to integrate clinical tumor mutation profiles with the reference human genome. A novel heuristic algorithm, IndelDetector, was developed to automatically identify insertion/deletion (indel) polymorphisms as well as indel somatic mutations with high sensitivity and accuracy. It was incorporated into an automated pipeline that detects genetic alterations and annotates their effects on protein coding and 3D structure. The ability of the system to facilitate identifying genetic alterations is illustrated in three projects with publicly accessible data. Mutagenesis in tumor DNA replication leading to complex genetic changes in the EGFR kinase domain is suggested by a novel deletion-insertion combination observed in paired tumor-normal lung cancer resequencing data. Automated analysis of 152 genes resequenced by the SeattleSNPs group was able to identify 91% of the 1251 indel polymorphisms discovered by SeattleSNPs. In addition, our system discovered 518 novel indels in this data set, 451 of which were found to be valid by manual inspection of sequence traces. Our experience demonstrates that CGWB not only greatly improves the productivity and the accuracy of mutation identification, but also, through its data integration and visualization capabilities, facilitates identification of underlying genetic etiology.
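The SeattleSNPs figures quoted above can be turned into approximate counts and rates with a few lines of arithmetic. This is only a sketch: the variable names are illustrative, and because the abstract reports the 91% figure rounded, the recovered count is an estimate rather than a number stated by the authors.

```python
# Figures quoted in the abstract (SeattleSNPs data set, 152 genes).
known_indels = 1251         # indel polymorphisms discovered by SeattleSNPs
recovered_fraction = 0.91   # share identified by the automated analysis (rounded in the abstract)
novel_calls = 518           # additional indels reported by the system
novel_validated = 451       # novel indels judged valid by manual trace inspection

# Approximate number of known indels recovered (estimate, since 91% is rounded).
approx_recovered = round(known_indels * recovered_fraction)

# Share of novel calls confirmed by manual inspection, and the number not confirmed.
validation_rate = novel_validated / novel_calls
unconfirmed_novel = novel_calls - novel_validated

print(f"Known indels recovered (approx.): {approx_recovered} of {known_indels}")
print(f"Novel calls confirmed: {novel_validated} of {novel_calls} ({validation_rate:.1%})")
print(f"Novel calls not confirmed: {unconfirmed_novel}")
```

Under these assumptions, roughly 1138 of the 1251 known indels were recovered, and about 87% of the 518 novel calls held up on manual inspection.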