Guffanti Alessandro, Iacono Michele, Pelucchi Paride, Kim Namshin, Soldà Giulia, Croft Larry J, Taft Ryan J, Rizzi Ermanno, Askarian-Amiri Marjan, Bonnal Raoul J, Callari Maurizio, Mignone Flavio, Pesole Graziano, Bertalot Giovanni, Bernardi Luigi Rossi, Albertini Alberto, Lee Christopher, Mattick John S, Zucchi Ileana, De Bellis Gianluca
Institute of Biomedical Technologies, National Research Council, Milan, Italy.
BMC Genomics. 2009 Apr 20;10:163. doi: 10.1186/1471-2164-10-163.
The cancer transcriptome is difficult to explore due to the heterogeneity of quantitative and qualitative changes in gene expression linked to the disease status. An increasing number of "unconventional" transcripts, such as novel isoforms, non-coding RNAs, somatic gene fusions and deletions, have been associated with the tumoral state. Massively parallel sequencing techniques provide a framework for exploring the transcriptional complexity inherent to cancer with limited laboratory and financial effort. We developed a deep sequencing and bioinformatic analysis protocol to investigate the molecular composition of a breast cancer poly(A)+ transcriptome. This method uses a cDNA library normalization step to reduce the representation of highly expressed transcripts, together with biology-oriented bioinformatic analyses, to facilitate the detection of rare and novel transcripts.
We analyzed over 132,000 high-confidence Roche 454 deep sequencing reads from a primary human lobular breast cancer tissue specimen and detected a range of unusual transcriptional events, which were subsequently validated by RT-PCR in eight additional primary human breast cancer samples. We identified and validated one deletion, two novel ncRNAs (one intergenic and one intragenic), ten previously unknown or rare transcript isoforms, and a novel gene fusion specific to a single primary tissue sample. We also explored the non-protein-coding portion of the breast cancer transcriptome, identifying thousands of novel non-coding transcripts and more than three hundred reads corresponding to the non-coding RNA MALAT1, which is highly expressed in many human carcinomas.
Our results demonstrate that combining 454 deep sequencing with a normalization step and careful bioinformatic analysis facilitates the discovery and quantification of rare transcripts or ncRNAs, and can be used as a qualitative tool to characterize transcriptome complexity, revealing many hitherto unknown transcripts, splice isoforms, gene fusion events and ncRNAs, even at a relatively low sequence sampling.