PGsim: A Comprehensive and Highly Customizable Personal Genome Simulator.

Author information

Juan Liran, Wang Yongtian, Jiang Jingyi, Yang Qi, Jiang Qinghua, Wang Yadong

Affiliations

School of Life Science and Technology, Harbin Institute of Technology, Harbin, China.

School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China.

Publication information

Front Bioeng Biotechnol. 2020 Jan 28;8:28. doi: 10.3389/fbioe.2020.00028. eCollection 2020.

Abstract

Although genome sequencing has become increasingly common, simulating individual genomes remains important: sequencing large numbers of individual genomes is costly, and genome data representing extreme or boundary conditions, such as fatal genetic defects, are difficult to obtain. Privacy and legal barriers also prevent many applications of real data. Large sequencing projects in recent years have provided a deeper understanding of the human genome, yet there is a lack of tools that leverage this known data to simulate personal genomes as realistically as possible. Here, we designed and developed PGsim, a comprehensive and highly customizable personal genome simulator that makes full use of existing knowledge, such as variant allele frequencies in the global population or in major world populations, differences in mutation probability between protein-coding and non-coding regions, transition/transversion (Ti/Tv) ratios, Indel incidence, Indel length distributions, structural variation sites, and pathogenic mutation sites. Through detailed parameter settings, users can flexibly control the proportion and number of known variants, common variants, novel variants in both coding and non-coding regions, and special variants. While ensuring that the simulated personal genome has sufficient randomness, PGsim makes the generated variants more realistic and reliable in terms of variant distribution, proportion, and population characteristics. PGsim can use very large databases as background data to simulate personal genomes and does not require SQL database support. Users can easily swap the variant databases used as needed. Because PGsim is a Perl script, it runs without obstacles on any version of Mac OS or Linux, and no libraries, packages, interpreters, compilers, or other dependencies need to be installed in advance. The PGsim tool is publicly available at https://github.com/lrjuan/PGsim.
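PGsim itself is a Perl script, and the abstract does not describe its internals. The Python sketch that follows only illustrates two of the ideas listed above, under stated assumptions: choosing the alternate base of a novel SNV so that the transition/transversion mix matches a target Ti/Tv ratio, and drawing a genotype at a known site from its population allele frequency, assuming Hardy-Weinberg equilibrium. The function names and the illustrative ratio of 2.1 are hypothetical and are not taken from PGsim.

```python
import random

# Transitions swap a purine for a purine (A<->G) or a pyrimidine for a
# pyrimidine (C<->T); every base has exactly one transition partner.
TRANSITION = {"A": "G", "G": "A", "C": "T", "T": "C"}

# Transversions swap a purine for a pyrimidine or vice versa; every base has
# two transversion partners.
TRANSVERSIONS = {"A": ("C", "T"), "G": ("C", "T"), "C": ("A", "G"), "T": ("A", "G")}


def is_transition(ref: str, alt: str) -> bool:
    """Return True if ref -> alt is a transition, False if it is a transversion."""
    return TRANSITION[ref] == alt


def novel_snv_alt(ref: str, titv_ratio: float, rng: random.Random) -> str:
    """Hypothetical helper: pick the alternate base for a novel SNV.

    A Ti/Tv ratio of R means transitions : transversions = R : 1, so each SNV
    is made a transition with probability R / (R + 1).
    """
    p_transition = titv_ratio / (titv_ratio + 1.0)
    if rng.random() < p_transition:
        return TRANSITION[ref]
    return rng.choice(TRANSVERSIONS[ref])


def known_site_genotype(alt_freq: float, rng: random.Random) -> int:
    """Hypothetical helper: number of alternate alleles (0, 1 or 2) at a known
    site, drawing both chromosomes independently from the population allele
    frequency (i.e. assuming Hardy-Weinberg equilibrium)."""
    return sum(1 for _ in range(2) if rng.random() < alt_freq)


def observed_titv(snvs: list[tuple[str, str]]) -> float:
    """Ti/Tv ratio of a list of (ref, alt) pairs."""
    ti = sum(1 for ref, alt in snvs if is_transition(ref, alt))
    tv = len(snvs) - ti
    return ti / tv


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so runs are reproducible
    refs = [rng.choice("ACGT") for _ in range(100_000)]
    snvs = [(ref, novel_snv_alt(ref, 2.1, rng)) for ref in refs]
    # Should print a value close to the 2.1 target.
    print(f"observed Ti/Tv: {observed_titv(snvs):.2f}")
    genotypes = [known_site_genotype(0.3, rng) for _ in range(10_000)]
    # Expected mean is 2 * 0.3 = 0.6 alternate alleles per site.
    print(f"mean alt alleles at AF=0.3: {sum(genotypes) / len(genotypes):.2f}")
```

Because the transition probability is derived directly from the target ratio, the observed Ti/Tv converges to the parameter as the number of simulated SNVs grows. A simulator that distinguishes coding from non-coding regions could, under this scheme, simply pass a different ratio for each region.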
