Robust and rapid algorithms facilitate large-scale whole genome sequencing downstream analysis in an integrative framework.

Authors

Li Miaoxin, Li Jiang, Li Mulin Jun, Pan Zhicheng, Hsu Jacob Shujui, Liu Dajiang J, Zhan Xiaowei, Wang Junwen, Song Youqiang, Sham Pak Chung

Affiliations

Department of Medical Genetics, Center for Genome Research, Center for Precision Medicine, Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou, 510080, China.

The Centre for Genomic Sciences, the University of Hong Kong, Pokfulam, Hong Kong.

Publication information

Nucleic Acids Res. 2017 May 19;45(9):e75. doi: 10.1093/nar/gkx019.

DOI: 10.1093/nar/gkx019
PMID: 28115622
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5435951/
Abstract

Whole genome sequencing (WGS) is a promising strategy to unravel variants or genes responsible for human diseases and traits. However, there is a lack of robust platforms for a comprehensive downstream analysis. In the present study, we first proposed three novel algorithms, sequence gap-filled gene feature annotation, bit-block encoded genotypes and sectional fast access to text lines to address three fundamental problems. The three algorithms then formed the infrastructure of a robust parallel computing framework, KGGSeq, for integrating downstream analysis functions for whole genome sequencing data. KGGSeq has been equipped with a comprehensive set of analysis functions for quality control, filtration, annotation, pathogenic prediction and statistical tests. In the tests with whole genome sequencing data from 1000 Genomes Project, KGGSeq annotated several thousand more reliable non-synonymous variants than other widely used tools (e.g. ANNOVAR and SNPEff). It took only around half an hour on a small server with 10 CPUs to access genotypes of ∼60 million variants of 2504 subjects, while a popular alternative tool required around one day. KGGSeq's bit-block genotype format used 1.5% or less space to flexibly represent phased or unphased genotypes with multiple alleles and achieved a speed of over 1000 times faster to calculate genotypic correlation.
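
The abstract does not describe the bit-block layout itself, so the following is only a minimal sketch of the general idea behind bit-encoded genotypes, not KGGSeq's actual format. It assumes complete, unphased, biallelic genotypes coded as alternative-allele counts (0, 1 or 2), packs each variant into two Python integers used as bit planes, and derives the Pearson genotypic correlation from population counts alone. The names `encode`, `popcount` and `correlation` are hypothetical and chosen for illustration.

```python
from math import sqrt


def encode(genotypes):
    """Pack alt-allele counts (0, 1 or 2) into two integer bit planes.

    Illustrative assumption: one bit per subject in each plane.
    """
    low, high = 0, 0
    for i, g in enumerate(genotypes):
        if g not in (0, 1, 2):
            raise ValueError("this sketch handles complete biallelic genotypes only")
        if g >= 1:
            low |= 1 << i   # subject i carries at least one alt allele
        if g == 2:
            high |= 1 << i  # subject i is homozygous for the alt allele
    return low, high


def popcount(x):
    """Number of set bits in a non-negative integer."""
    return bin(x).count("1")


def correlation(a, b, n):
    """Pearson correlation of two encoded variants over n subjects.

    Each genotype equals low_bit + high_bit, so every sum needed for the
    correlation reduces to population counts of (ANDed) bit planes.
    """
    a_low, a_high = a
    b_low, b_high = b
    sx = popcount(a_low) + popcount(a_high)        # sum of g_x
    sy = popcount(b_low) + popcount(b_high)        # sum of g_y
    sxx = popcount(a_low) + 3 * popcount(a_high)   # sum of g_x^2 (1 for het, 4 for hom-alt)
    syy = popcount(b_low) + 3 * popcount(b_high)   # sum of g_y^2
    sxy = (popcount(a_low & b_low) + popcount(a_low & b_high)
           + popcount(a_high & b_low) + popcount(a_high & b_high))  # sum of g_x * g_y
    var_x = n * sxx - sx * sx
    var_y = n * syy - sy * sy
    if var_x == 0 or var_y == 0:
        return float("nan")  # a monomorphic variant has no defined correlation
    return (n * sxy - sx * sy) / sqrt(var_x * var_y)


x = [0, 1, 2, 1, 0, 2]
y = [0, 1, 2, 0, 1, 2]
print(correlation(encode(x), encode(y), len(x)))  # prints 0.75
```

Because the sums come from bitwise AND plus bit counting, a single machine word can cover many subjects at once, which is one plausible reason a bit-level encoding speeds up correlation calculations. Missing genotypes, phasing and multi-allelic sites, which the abstract says the real format supports, would need additional bit planes and are outside this sketch.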

Figures

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6518/5435951/c1a85650f6e5/gkx019fig1.jpg
Figure 2: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6518/5435951/2e9a4aa3985c/gkx019fig2.jpg
Figure 3: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6518/5435951/e6794b50d404/gkx019fig3.jpg
Figure 4: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6518/5435951/4a16e1e029ff/gkx019fig4.jpg