

DisVar: an R library for identifying variants associated with diseases using large-scale personal genetic information.

Author information

Chanasongkhram Khunanon, Damkliang Kasikrit, Sangket Unitsa

Affiliations

Division of Biological Science, Faculty of Science, Prince of Songkla University, Hat Yai, Songkhla, Thailand.

Division of Computational Science, Faculty of Science, Prince of Songkla University, Hat Yai, Songkhla, Thailand.

Publication information

PeerJ. 2023 Sep 28;11:e16086. doi: 10.7717/peerj.16086. eCollection 2023.

Abstract

BACKGROUND

Genetic variants may contribute to the development of diseases. Several genetic disease databases are used in medical research and diagnosis, but the web applications used to search them for disease-associated variants have limitations: they may be unable to search large-scale sets of genetic variants, their search results may be difficult to interpret, and they may not support variants mapped to the latest reference genome (GRCh38/hg38).

METHODS

In this study, we developed DisVar, a novel R library that identifies disease-associated genetic variants in large-scale individual genomic data. DisVar is compatible with variants mapped to the latest reference genome version (GRCh38/hg38) and draws on five databases of disease-associated variants. More than 100 million variants can be searched simultaneously for associations with specific diseases.
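
The search step described above amounts to looking up each variant of an input VCF file in a table of known disease-associated variants. The Python sketch below illustrates one common way to do such a lookup: hashing an exact (chromosome, position, reference allele, alternate allele) key, so that the cost grows roughly linearly with the number of input variants. It is a hypothetical illustration, not DisVar's R code or API. The function names, the toy database, and the exact-key matching rule are all assumptions, and real disease databases usually need further normalization (for example, left-aligning indels) before keys can be compared.

```python
# Hypothetical sketch, NOT DisVar's implementation or API (DisVar is an R library).
# Assumptions: variants are matched on an exact (chrom, pos, ref, alt) key,
# coordinates are already on GRCh38, and the "disease database" is a toy dict.
from typing import Dict, Iterable, List, Tuple

VariantKey = Tuple[str, int, str, str]


def parse_vcf_lines(lines: Iterable[str]) -> List[VariantKey]:
    """Turn VCF data lines into (chrom, pos, ref, alt) keys, one per ALT allele."""
    keys: List[VariantKey] = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip meta-information, header, and blank lines
        fields = line.rstrip("\n").split("\t")
        chrom, pos, ref, alts = fields[0], int(fields[1]), fields[3], fields[4]
        if chrom.startswith("chr"):
            chrom = chrom[3:]  # normalize "chr1" to "1" so naming styles match
        for alt in alts.split(","):  # split multi-allelic sites into separate keys
            keys.append((chrom, pos, ref, alt))
    return keys


def find_disease_hits(
    keys: Iterable[VariantKey], disease_db: Dict[VariantKey, str]
) -> List[Tuple[VariantKey, str]]:
    """Return (variant, disease) pairs; each lookup is a single dict (hash) probe."""
    return [(key, disease_db[key]) for key in keys if key in disease_db]


# Toy data, invented for illustration only.
disease_db = {
    ("1", 1000, "A", "G"): "Hypothetical disease A",
    ("2", 5000, "C", "T"): "Hypothetical disease B",
}
vcf = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t1000\t.\tA\tG,T\t.\tPASS\t.",
    "2\t5000\t.\tC\tG\t.\tPASS\t.",
    "3\t7000\t.\tG\tA\t.\tPASS\t.",
]

for variant, disease in find_disease_hits(parse_vcf_lines(vcf), disease_db):
    print(variant, disease)
```

Only chr1:1000 A>G is reported here. Position 2:5000 appears in the toy database, but with a different alternate allele (T rather than G), so exact allele matching rejects it.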

RESULTS

The package was evaluated using 24 Variant Call Format (VCF) files (215,054 to 11,346,899 sites each) from the 1000 Genomes Project. Across all the VCF files, 298,227 disease-associated variant hits were detected in a total of 63.58 min. The package was also tested on ClinVar's VCF file (2,120,558 variants), where 20,657 disease-associated hits were identified in an estimated elapsed time of 45.98 s.
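
To put the ClinVar timing in perspective, the reported numbers imply a rough processing rate. The snippet below only divides the abstract's own figures. It is not a benchmark from the paper, and it says nothing about the hardware used or about how run time scales with file size.

```python
# Back-of-the-envelope rates derived from figures reported in the abstract.
# These are simple divisions, not benchmarks reported by the authors.
clinvar_variants = 2_120_558  # variants in ClinVar's VCF file
clinvar_hits = 20_657         # disease-associated hits reported for that file
clinvar_seconds = 45.98       # reported estimated elapsed time, in seconds

throughput = clinvar_variants / clinvar_seconds  # variants processed per second
hit_fraction = clinvar_hits / clinvar_variants   # share of input variants reported as hits

print(f"~{throughput:,.0f} variants/s; {hit_fraction:.2%} of ClinVar records were hits")
```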

CONCLUSIONS

DisVar can overcome the limitations of existing tools and is a fast, effective diagnostic and preventive tool for identifying disease-associated variants in large-scale genetic variant data mapped to the latest reference genome.


Graphical abstract: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/766c/10542659/84454fde5250/peerj-11-16086-g001.jpg

Similar articles

VCF-Miner: GUI-based application for mining variants and annotations stored in VCF files.
Brief Bioinform. 2016 Mar;17(2):346-51. doi: 10.1093/bib/bbv051. Epub 2015 Jul 25.

VCF-Server: A web-based visualization tool for high-throughput variant data mining and management.
Mol Genet Genomic Med. 2019 Jul;7(7):e00641. doi: 10.1002/mgg3.641. Epub 2019 May 24.

Improved VCF normalization for accurate VCF comparison.
Bioinformatics. 2017 Apr 1;33(7):964-970. doi: 10.1093/bioinformatics/btw748.

SEQMINER: An R-Package to Facilitate the Functional Interpretation of Sequence-Based Associations.
Genet Epidemiol. 2015 Dec;39(8):619-23. doi: 10.1002/gepi.21918. Epub 2015 Sep 23.

Variant Tool Chest: an improved tool to analyze and manipulate variant call format (VCF) files.
BMC Bioinformatics. 2014;15 Suppl 7(Suppl 7):S12. doi: 10.1186/1471-2105-15-S7-S12. Epub 2014 May 28.

gSearch: a fast and flexible general search tool for whole-genome sequencing.
Bioinformatics. 2012 Aug 15;28(16):2176-7. doi: 10.1093/bioinformatics/bts358. Epub 2012 Jun 23.

WhopGenome: high-speed access to whole-genome variation and sequence data in R.
Bioinformatics. 2015 Feb 1;31(3):413-5. doi: 10.1093/bioinformatics/btu636. Epub 2014 Oct 1.

