Khandelwal Garima, Girotti María Romina, Smowton Christopher, Taylor Sam, Wirth Christopher, Dynowski Marek, Frese Kristopher K, Brady Ged, Dive Caroline, Marais Richard, Miller Crispin
RNA Biology Group, Cancer Research UK Manchester Institute, The University of Manchester, Manchester, United Kingdom.
Molecular Oncology Group, Cancer Research UK Manchester Institute, The University of Manchester, Manchester, United Kingdom.
Mol Cancer Res. 2017 Aug;15(8):1012-1016. doi: 10.1158/1541-7786.MCR-16-0431. Epub 2017 Apr 25.
Patient-derived xenograft (PDX) and circulating tumor cell-derived explant (CDX) models are powerful methods for the study of human disease. In cancer research, these methods have been applied to multiple questions, including the study of metastatic progression, genetic evolution, and therapeutic drug responses. As PDX and CDX models can recapitulate the highly heterogeneous characteristics of a patient tumor, as well as its response to chemotherapy, there is considerable interest in combining them with next-generation sequencing to monitor the genomic, transcriptional, and epigenetic changes that accompany oncogenesis. When used for this purpose, their reliability depends heavily on accurately distinguishing sequencing reads that originate from the host from those that arise from the xenograft itself. Here, we demonstrate that failure to correctly identify contaminating host reads when analyzing DNA- and RNA-sequencing (DNA-Seq and RNA-Seq) data from PDX and CDX models is a major confounding factor that can lead to incorrect mutation calls and a failure to identify canonical mutation signatures associated with tumorigenicity. In addition, we describe a highly sensitive algorithm and open source software tool for identifying and removing contaminating host sequences. Importantly, when applied to PDX and CDX models of melanoma, these data demonstrate the tool's utility as a sensitive and selective means of correcting PDX- and CDX-derived whole-exome and RNA-Seq data. This study describes a sensitive method to identify contaminating host reads in xenograft and explant DNA- and RNA-Seq data and is applicable to other forms of deep sequencing.
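The abstract does not spell out how host and graft reads are told apart, so the following is only a minimal sketch of one common general approach, not the published algorithm: align each read to both the graft (human) and host (e.g., mouse) references and assign it to whichever genome gives the better alignment score. The function names, the `margin` parameter, and the use of a single integer score per alignment are all assumptions made for illustration.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical illustration only: NOT the algorithm from the paper.
# Each read is assumed to carry one alignment score per reference
# (higher is better), or None if it did not align to that reference.


def classify_read(graft_score: Optional[int],
                  host_score: Optional[int],
                  margin: int = 0) -> str:
    """Assign a read to 'graft', 'host', 'ambiguous', or 'unmapped'."""
    if graft_score is None and host_score is None:
        return "unmapped"
    if host_score is None:
        return "graft"
    if graft_score is None:
        return "host"
    # Both references produced an alignment: require a clear winner.
    if graft_score - host_score > margin:
        return "graft"
    if host_score - graft_score > margin:
        return "host"
    return "ambiguous"


def split_reads(scores: Dict[str, Tuple[Optional[int], Optional[int]]],
                margin: int = 0) -> Dict[str, List[str]]:
    """Bin read IDs by class; `scores` maps read ID -> (graft, host) score."""
    bins: Dict[str, List[str]] = {
        "graft": [], "host": [], "ambiguous": [], "unmapped": []
    }
    for read_id, (graft_score, host_score) in scores.items():
        bins[classify_read(graft_score, host_score, margin)].append(read_id)
    return bins


if __name__ == "__main__":
    # Made-up scores, purely to show the call shape.
    example = {
        "read1": (60, 20),     # aligns much better to the graft genome
        "read2": (15, 58),     # aligns much better to the host genome
        "read3": (40, 40),     # equally good in both: ambiguous
        "read4": (None, None)  # aligned to neither reference
    }
    for label, ids in split_reads(example, margin=5).items():
        print(label, ids)
```

In a sketch like this, only reads binned as `graft` would be passed on to mutation calling or expression analysis; how ambiguous reads are handled is a design choice that trades sensitivity against the risk of host-derived false-positive calls.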