Ultrafast and scalable variant annotation and prioritization with big functional genomics data.

Affiliations

The Province and Ministry Co-sponsored Collaborative Innovation Center for Medical Epigenetics, National Clinical Research Center for Cancer, Tianjin Medical University Cancer Institute and Hospital, Tianjin Medical University, Tianjin 300070, China.

Department of Pharmacology, Tianjin Key Laboratory of Inflammation Biology, School of Basic Medical Sciences, Tianjin Medical University, Tianjin 300070, China.

Publication information

Genome Res. 2020 Dec;30(12):1789-1801. doi: 10.1101/gr.267997.120. Epub 2020 Oct 15.

DOI:10.1101/gr.267997.120
PMID:33060171
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7706736/
Abstract

The advances of large-scale genomics studies have enabled compilation of cell type-specific, genome-wide DNA functional elements at high resolution. With the growing volume of functional annotation data and sequencing variants, existing variant annotation algorithms lack the efficiency and scalability to process big genomic data, particularly when annotating whole-genome sequencing variants against a huge database with billions of genomic features. Here, we develop VarNote to rapidly annotate genome-scale variants in large and complex functional annotation resources. Equipped with a novel index system and a parallel random-sweep searching algorithm, VarNote shows substantial performance improvements (two to three orders of magnitude) over existing algorithms at different scales. It supports both region-based and allele-specific annotations and introduces advanced functions for the flexible extraction of annotations. By integrating massive base-wise and context-dependent annotations in the VarNote framework, we introduce three efficient and accurate pipelines to prioritize the causal regulatory variants for common diseases, Mendelian disorders, and cancers.
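
The abstract distinguishes region-based annotation (any database feature overlapping the variant position) from allele-specific annotation (a feature that also matches the variant's reference and alternate alleles). The following is a minimal sketch of that distinction only. It does not reproduce VarNote's index system or its parallel random-sweep search; it swaps in a plain sorted list with binary search, and every name in it (`Feature`, `Variant`, `build_index`, `region_hits`, `allele_hits`) is hypothetical and chosen for illustration.

```python
from bisect import bisect_right
from typing import Dict, List, NamedTuple, Tuple


class Feature(NamedTuple):
    """One record of a toy annotation database (hypothetical layout)."""
    chrom: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive
    ref: str    # empty string for region-only features
    alt: str    # empty string for region-only features
    label: str


class Variant(NamedTuple):
    chrom: str
    pos: int
    ref: str
    alt: str


Index = Dict[str, Tuple[List[int], List[Feature]]]


def build_index(features: List[Feature]) -> Index:
    """Group features by chromosome and sort each group by start position."""
    by_chrom: Dict[str, List[Feature]] = {}
    for feat in features:
        by_chrom.setdefault(feat.chrom, []).append(feat)
    index: Index = {}
    for chrom, feats in by_chrom.items():
        feats.sort(key=lambda f: f.start)
        index[chrom] = ([f.start for f in feats], feats)
    return index


def region_hits(index: Index, variant: Variant) -> List[Feature]:
    """Region-based annotation: features whose interval covers the position."""
    starts, feats = index.get(variant.chrom, ([], []))
    # Binary search discards every feature starting after the variant.
    hi = bisect_right(starts, variant.pos)
    # The remaining candidates are scanned linearly; a real tool would use
    # a proper interval index here instead.
    return [f for f in feats[:hi] if f.end >= variant.pos]


def allele_hits(index: Index, variant: Variant) -> List[Feature]:
    """Allele-specific annotation: position, REF and ALT must all match."""
    return [
        f for f in region_hits(index, variant)
        if f.start == variant.pos and f.ref == variant.ref and f.alt == variant.alt
    ]


if __name__ == "__main__":
    # Made-up records, not taken from any real annotation resource.
    database = build_index([
        Feature("chr1", 100, 200, "", "", "enhancer"),
        Feature("chr1", 150, 150, "A", "G", "score=0.9"),
        Feature("chr1", 150, 150, "A", "T", "score=0.1"),
    ])
    query = Variant("chr1", 150, "A", "G")
    print([f.label for f in region_hits(database, query)])
    print([f.label for f in allele_hits(database, query)])
```

In this toy setup the region-based query returns all three records at position 150, while the allele-specific query keeps only the record whose alleles are A>G. The sketch runs serially and holds everything in memory, so it says nothing about the performance figures reported in the abstract.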

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4c42/7706736/c574d74e282c/1789f01.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4c42/7706736/3d97a9efc391/1789f02.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4c42/7706736/ed8f359d41e4/1789f03.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4c42/7706736/9364b88ef092/1789f04.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4c42/7706736/5411abd2ae55/1789f05.jpg
