CaPSID: a bioinformatics platform for computational pathogen sequence identification in human genomes and transcriptomes.

Affiliation

Ontario Institute for Cancer Research, MaRS Centre, Toronto, Ontario, Canada.

Publication information

BMC Bioinformatics. 2012 Aug 17;13:206. doi: 10.1186/1471-2105-13-206.

DOI: 10.1186/1471-2105-13-206
PMID: 22901030
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3464663/
Abstract

BACKGROUND

It is now well established that nearly 20% of human cancers are caused by infectious agents, and the list of human oncogenic pathogens will grow in the future for a variety of cancer types. Whole tumor transcriptome and genome sequencing by next-generation sequencing technologies presents an unparalleled opportunity for pathogen detection and discovery in human tissues but requires development of new genome-wide bioinformatics tools.

RESULTS

Here we present CaPSID (Computational Pathogen Sequence IDentification), a comprehensive bioinformatics platform for identifying, querying and visualizing both exogenous and endogenous pathogen nucleotide sequences in tumor genomes and transcriptomes. CaPSID includes a scalable, high performance database for data storage and a web application that integrates the genome browser JBrowse. CaPSID also provides useful metrics for sequence analysis of pre-aligned BAM files, such as gene and genome coverage, and is optimized to run efficiently on multiprocessor computers with low memory usage.
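The coverage metrics mentioned above can be illustrated with a minimal sketch. It assumes reads have already been aligned and reduced to half-open, 0-based (start, end) intervals on a single reference sequence, and it uses one common definition of coverage (the fraction of bases covered by at least one read). The function name `covered_fraction` is hypothetical; this is not CaPSID's own code, which works directly on pre-aligned BAM files. Genome coverage is the measure over the whole reference; gene coverage is the same measure restricted to a gene's interval.

```python
from typing import List, Tuple


def covered_fraction(alignments: List[Tuple[int, int]],
                     region_start: int, region_end: int) -> float:
    """Fraction of bases in [region_start, region_end) covered by >= 1 alignment.

    Hypothetical helper: alignments are half-open (start, end) intervals in
    0-based coordinates on the same reference (e.g. one pathogen genome).
    """
    length = region_end - region_start
    if length <= 0:
        raise ValueError("region must have positive length")

    # Clip each alignment to the region and drop those that fall outside it.
    clipped = sorted(
        (max(s, region_start), min(e, region_end))
        for s, e in alignments
        if e > region_start and s < region_end
    )

    covered = 0
    cur_start, cur_end = None, None
    for s, e in clipped:
        if cur_end is None or s > cur_end:
            # Disjoint block: count the previous merged block, start a new one.
            if cur_end is not None:
                covered += cur_end - cur_start
            cur_start, cur_end = s, e
        else:
            # Overlapping or adjacent: extend the current merged block.
            cur_end = max(cur_end, e)
    if cur_end is not None:
        covered += cur_end - cur_start
    return covered / length


if __name__ == "__main__":
    reads = [(0, 100), (50, 150), (400, 500)]
    print(covered_fraction(reads, 0, 1000))   # genome coverage: 0.25
    print(covered_fraction(reads, 100, 200))  # coverage of a gene at [100, 200): 0.5
```

Merging overlapping intervals before summing ensures that bases covered by several reads are counted once, which is what distinguishes breadth of coverage from read depth.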

CONCLUSIONS

To demonstrate the usefulness and efficiency of CaPSID, we carried out a comprehensive analysis of both a simulated dataset and transcriptome samples from ovarian cancer. CaPSID correctly identified all of the human and pathogen sequences in the simulated dataset, while in the ovarian dataset CaPSID's predictions were successfully validated in vitro.
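Evaluating a tool on a simulated dataset amounts to comparing each read's predicted origin with the label recorded when the reads were simulated. The sketch below assumes a simplified, hypothetical decision rule (a read is called pathogen-derived when its best pathogen alignment score is strictly higher than its best human alignment score, or when it has no human hit); CaPSID's actual classification criteria are not described in this abstract. The names `ReadHits`, `classify` and `accuracy` are illustrative only.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class ReadHits:
    """Best alignment scores for one read (None means the read did not align)."""
    read_id: str
    human_score: Optional[int]
    pathogen_score: Optional[int]


def classify(hit: ReadHits) -> str:
    """Hypothetical rule: call a read 'pathogen' only when its best pathogen
    score beats its best human score, or when it has no human alignment."""
    if hit.pathogen_score is None:
        return "human" if hit.human_score is not None else "unmapped"
    if hit.human_score is None or hit.pathogen_score > hit.human_score:
        return "pathogen"
    return "human"  # ties are assigned to the human reference


def accuracy(hits: List[ReadHits], truth: Dict[str, str]) -> float:
    """Fraction of reads whose predicted label matches the simulated ground truth."""
    if not hits:
        raise ValueError("no reads to evaluate")
    correct = sum(classify(h) == truth[h.read_id] for h in hits)
    return correct / len(hits)


if __name__ == "__main__":
    hits = [
        ReadHits("r1", 100, None),
        ReadHits("r2", 40, 95),
        ReadHits("r3", None, 80),
        ReadHits("r4", 90, 90),
    ]
    truth = {"r1": "human", "r2": "pathogen", "r3": "pathogen", "r4": "human"}
    print(accuracy(hits, truth))  # 1.0
```

Breaking ties in favour of the human reference is a conservative choice that avoids reporting a pathogen when a read is equally well explained by the host genome.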
