Distinguishing potential bacteria-tumor associations from contamination in a secondary data analysis of public cancer genome sequence data.

Affiliations

Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD, USA.

Department of Microbiology and Immunology, University of Maryland School of Medicine, Baltimore, MD, USA.

Publication information

Microbiome. 2017 Jan 25;5(1):9. doi: 10.1186/s40168-016-0224-8.

Abstract

BACKGROUND

A variety of bacteria are known to influence carcinogenesis. We therefore sought to investigate whether publicly available whole genome and whole transcriptome sequencing data generated by large public cancer genome efforts, such as The Cancer Genome Atlas (TCGA), could be used to identify bacteria associated with cancer. The Burrows-Wheeler aligner (BWA) was used to align a subset of Illumina paired-end sequencing data from TCGA to the human reference genome and to all complete bacterial genomes in the RefSeq database, in an effort to identify bacterial read pairs from the microbiome.
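
The abstract does not state the criteria used to call a read pair bacterial, so the following Python sketch is only an illustration of the general idea. It assumes the alignments have already been parsed into one record per read pair, and it assumes a conservative rule: a pair counts toward a taxon only if neither mate aligns to a human sequence and both mates align to references of the same taxon. The names `PairHit`, `classify_pair`, `count_bacterial_pairs`, and all reference and taxon labels in the example are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, NamedTuple, Optional, Set


class PairHit(NamedTuple):
    """Hypothetical record: where each mate of one read pair aligned."""
    pair_id: str
    mate1_ref: Optional[str]  # reference sequence name, or None if unmapped
    mate2_ref: Optional[str]


def classify_pair(hit: PairHit, human_refs: Set[str], taxon_of: Dict[str, str]) -> str:
    """Label one read pair as 'human', a taxon name, or 'unassigned'."""
    refs = (hit.mate1_ref, hit.mate2_ref)
    # Assumed rule: any mate on a human sequence makes the whole pair human.
    if any(r in human_refs for r in refs):
        return "human"
    # Unmapped mates and unknown references both look up as None.
    taxa = {taxon_of.get(r) for r in refs}
    # Assumed rule: both mates must map to references of the same taxon.
    if len(taxa) == 1 and None not in taxa:
        return next(iter(taxa))
    return "unassigned"


def count_bacterial_pairs(
    hits: Iterable[PairHit], human_refs: Set[str], taxon_of: Dict[str, str]
) -> Counter:
    """Count read pairs per taxon, skipping human and unassigned pairs."""
    counts: Counter = Counter()
    for hit in hits:
        label = classify_pair(hit, human_refs, taxon_of)
        if label not in ("human", "unassigned"):
            counts[label] += 1
    return counts


# Made-up example data; none of these values come from the study.
example_human_refs = {"chr1", "chr2"}
example_taxon_of = {"ref_A": "Pseudomonas", "ref_C": "Pseudomonas", "ref_B": "Ralstonia"}
example_hits = [
    PairHit("p1", "chr1", "chr1"),    # both mates human
    PairHit("p2", "ref_A", "ref_A"),  # both mates on the same bacterial reference
    PairHit("p3", "ref_A", "chr2"),   # one mate human, so treated as human
    PairHit("p4", "ref_B", None),     # one mate unmapped, so unassigned
    PairHit("p5", "ref_A", "ref_C"),  # different references, same taxon
]
example_counts = count_bacterial_pairs(example_hits, example_human_refs, example_taxon_of)
```

Requiring both mates to agree is stricter than counting single-mate hits; it trades sensitivity for fewer spurious assignments, which matters when bacterial reads are a small fraction of a human sample.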

RESULTS

Through careful consideration of all of the bacterial taxa present in the cancer types investigated, their relative abundance, and batch effects, we were able to identify read pairs from certain taxa as likely resulting from contamination. In particular, the presence of Mycobacterium tuberculosis complex in the ovarian serous cystadenocarcinoma (OV) and glioblastoma multiforme (GBM) samples was correlated with the sequencing center of the samples. Additionally, the presence of Ralstonia spp. was correlated with two specific plates of acute myeloid leukemia (AML) samples. In the end, associations remained for Pseudomonas-like and Acinetobacter-like read pairs in AML, and for Pseudomonas-like read pairs in stomach adenocarcinoma (STAD), that could not be explained by batch effects or by the systematic contamination seen in other samples.
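
One way to check whether a taxon tracks a batch variable such as sequencing center or plate is a test of association on a 2x2 table (in batch or not, taxon present or absent). The abstract does not say which statistic was used, so the sketch below swaps in a two-sided Fisher's exact test as an assumption. The function names and the sample labels in the example are hypothetical, and the example data are made up.

```python
from math import comb
from typing import Iterable, Tuple


def batch_table(samples: Iterable[Tuple[str, bool]], batch: str) -> Tuple[int, int, int, int]:
    """Build 2x2 counts (a, b, c, d) from (batch_label, taxon_present) pairs.

    a: in batch, taxon present      b: in batch, taxon absent
    c: other batches, present       d: other batches, absent
    """
    a = b = c = d = 0
    for label, present in samples:
        if label == batch:
            if present:
                a += 1
            else:
                b += 1
        elif present:
            c += 1
        else:
            d += 1
    return a, b, c, d


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact p-value for the table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of x present-samples falling in the batch.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum all tables that are no more likely than the observed one.
    return sum(p for p in (prob(x) for x in range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))


# Made-up example: the taxon appears only in samples from one center.
example_samples = [
    ("center_X", True), ("center_X", True), ("center_X", True),
    ("center_Y", False), ("center_Y", False), ("center_Y", False),
]
example_p = fisher_exact_two_sided(*batch_table(example_samples, "center_X"))
```

A small p-value here would point toward a batch effect rather than a tumor association, which is the logic the paragraph above applies to sequencing center and plate. With only six samples, as in the toy data, even a perfect split cannot reach conventional significance, so real use needs many more samples per batch.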

CONCLUSIONS

These results suggest that public genome sequencing data can be used to identify bacteria that may be present in human tumor samples, and that these candidates can then be examined further experimentally. More weight should be given to this approach in the future when bacterial associations with diseases are suspected.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/602a/5264480/a2654f74b021/40168_2016_224_Fig1_HTML.jpg
