Extracting allelic read counts from 250,000 human sequencing runs in Sequence Read Archive.

Authors

Tsui Brian, Dow Michelle, Skola Dylan, Carter Hannah

Affiliation

Department of Medicine, University of California San Diego, 9500 Gilman, San Diego, California 92093, USA.

Publication

Pac Symp Biocomput. 2019;24:196-207.

PMID: 30864322
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6415672/
Abstract

The Sequence Read Archive (SRA) contains over one million publicly available sequencing runs from various studies using a variety of sequencing library strategies. These data inherently contain information about underlying genomic sequence variants, which we exploit to extract allelic read counts on an unprecedented scale. We reprocessed over 250,000 human sequencing runs (>1000 TB of raw sequence data) into a single unified dataset of allelic read counts for nearly 300,000 variants of biomedical relevance curated by NCBI dbSNP. Germline variants were detected in a median of 912 sequencing runs and somatic variants in a median of 4,876 sequencing runs, suggesting that this dataset facilitates identification of sequencing runs that harbor variants of interest. Allelic read counts obtained using a targeted alignment were very similar to read counts obtained from whole-genome alignment. Analyzing allelic read count data for matched DNA and RNA samples from tumors, we find that RNA-seq can also recover variants identified by whole-exome sequencing (WXS), suggesting that reprocessed allelic read counts can support variant detection across different library strategies in SRA. This study provides a rich database of known human variants across SRA samples that can support future meta-analyses of human sequence variation.
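
An allelic read count at a known single-nucleotide variant is simply the number of reads that carry the reference base and the number that carry the alternate base at that position. The Python sketch below illustrates that quantity; it is not the authors' pipeline, which aligned raw SRA reads in a targeted fashion. It assumes reads are already aligned without indels or soft clipping and represents them with a hypothetical `AlignedRead` type (0-based start plus bases). The detection rule `variant_detected` and its thresholds (`min_alt`, `min_vaf`) are illustrative assumptions, not criteria taken from the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AlignedRead:
    """Hypothetical simplified read: 0-based leftmost reference position and
    read bases, assumed to align gaplessly (no indels, no soft clipping)."""
    start: int
    seq: str


def allelic_counts(reads, pos, ref, alt):
    """Return (ref_count, alt_count) at a 0-based SNV position."""
    ref, alt = ref.upper(), alt.upper()
    ref_count = alt_count = 0
    for read in reads:
        offset = pos - read.start
        if 0 <= offset < len(read.seq):  # read overlaps the variant site
            base = read.seq[offset].upper()
            if base == ref:
                ref_count += 1
            elif base == alt:
                alt_count += 1
            # any other base (e.g. 'N' or a sequencing error) is not counted
    return ref_count, alt_count


def variant_detected(ref_count, alt_count, min_alt=2, min_vaf=0.05):
    """Illustrative detection rule (thresholds are assumptions): require at
    least `min_alt` alt-supporting reads and an alt allele fraction >= min_vaf."""
    depth = ref_count + alt_count
    if depth == 0:
        return False
    return alt_count >= min_alt and alt_count / depth >= min_vaf


# Toy example: SNV at position 105 with reference base C and alternate base T.
reads = [
    AlignedRead(100, "ACGTACGTAC"),  # base at 105 is C (ref)
    AlignedRead(102, "GTATGTAC"),    # base at 105 is T (alt)
    AlignedRead(103, "TACN"),        # base at 105 is C (ref)
    AlignedRead(104, "ATGT"),        # base at 105 is T (alt)
    AlignedRead(105, "NAAA"),        # base at 105 is N (ignored)
    AlignedRead(110, "ACGT"),        # does not overlap 105
]
ref_n, alt_n = allelic_counts(reads, pos=105, ref="C", alt="T")
print(ref_n, alt_n, variant_detected(ref_n, alt_n))  # prints: 2 2 True
```

In practice the reads would come from a BAM/CRAM alignment and a pileup over each dbSNP site rather than hand-built objects; counting only reads that match the ref or alt base, and ignoring other bases, keeps the two counts directly comparable across runs and library strategies.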

Figures

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/831a/6415672/effa8ba3dc2d/nihms-999793-f0001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/831a/6415672/895958dba6a6/nihms-999793-f0002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/831a/6415672/a6337c196e28/nihms-999793-f0003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/831a/6415672/7a49b4cf0903/nihms-999793-f0004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/831a/6415672/cc9340c92070/nihms-999793-f0005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/831a/6415672/3445fb47c360/nihms-999793-f0006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/831a/6415672/be3200f925c0/nihms-999793-f0007.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/831a/6415672/475d5b62f030/nihms-999793-f0008.jpg
