Syam Aditya, Adonizio Chris, Wei Xinzhu
Department of Computational Biology, Cornell University, Ithaca, NY.
Department of Mathematics, Cornell University, Ithaca, NY.
bioRxiv. 2025 Aug 20:2025.08.15.670378. doi: 10.1101/2025.08.15.670378.
The Genotype Representation Graph (GRG) [DeHaas et al., 2025] is a graph representation of whole-genome polymorphisms, designed to encode the variant hard-call information in phased whole genomes. It encodes the genotypes as an extremely compact graph that can be traversed efficiently, enabling dynamic programming-style algorithms for applications such as genome-wide association studies that run faster on biobank-scale data than existing alternatives. To facilitate scalable statistical genetics, we present GrgPhenoSim, an extremely fast phenotype simulator for GRGs, suitable for simulating phenotypes on biobank-scale datasets.
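The dynamic programming-style traversal mentioned above can be illustrated with a toy example. The following is a minimal sketch, not the GRG library's API: the function `genetic_values`, the dictionary-based graph layout, and the effect sizes are all hypothetical. It assumes that edges point from ancestral nodes toward sample nodes, that a sample carries every mutation attached to any node that can reach it, and that there is at most one path from any node to any sample, so a single top-down pass sums each mutation effect exactly once per carrier.

```python
from collections import deque


def genetic_values(children, node_effect, samples):
    """Sum mutation effects per sample with one top-down pass over a DAG.

    children:    dict mapping a node to the list of its child nodes
                 (hypothetical layout; edges point toward the samples).
    node_effect: dict mapping a node to the summed effect size of the
                 mutations attached to it (missing nodes count as 0).
    samples:     list of sample (leaf) node ids.
    """
    nodes = set(children) | set(node_effect) | set(samples)
    for kids in children.values():
        nodes.update(kids)

    # Count incoming edges so nodes can be visited in topological order.
    indegree = {n: 0 for n in nodes}
    for kids in children.values():
        for child in kids:
            indegree[child] += 1

    # total[n] = effects attached to n plus everything inherited from above.
    total = {n: node_effect.get(n, 0.0) for n in nodes}
    queue = deque(n for n in nodes if indegree[n] == 0)
    while queue:
        node = queue.popleft()
        for child in children.get(node, []):
            total[child] += total[node]  # push the accumulated value down
            indegree[child] -= 1
            if indegree[child] == 0:     # all parents have been processed
                queue.append(child)
    return {s: total[s] for s in samples}


# Toy graph: node 4 is above node 3 and sample 2; node 3 is above samples 0 and 1.
toy_children = {4: [3, 2], 3: [0, 1]}
toy_effects = {4: 0.5, 3: -0.25, 1: 1.0}
print(genetic_values(toy_children, toy_effects, [0, 1, 2]))
# → {0: 0.25, 1: 1.25, 2: 0.5}
```

Each edge is visited once, so the cost scales with the size of the graph rather than with the number of samples times the number of variants, which is the intuition behind running such computations on a compact graph instead of a genotype matrix.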
GrgPhenoSim contains all the primary functionalities of a phenotype simulator, uses a standardized output, and supports customized simulations. GrgPhenoSim is dozens to hundreds of times faster than the tool of Tagami et al. [2024], a fast ancestral recombination graph-based phenotype simulator, for sample sizes ranging from thousands to hundreds of thousands.
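For readers unfamiliar with what a phenotype simulator computes, the sketch below shows one common formulation, an additive model with a target narrow-sense heritability. This is an assumption for illustration only: the abstract does not state which models GrgPhenoSim implements, and the function `simulate_phenotypes` and its parameters are hypothetical, not part of the GrgPhenoSim API. Given per-sample genetic values, it adds Gaussian environmental noise whose variance is chosen so that genetic variance divided by total expected variance equals `h2`.

```python
import random
from statistics import pvariance


def simulate_phenotypes(genetic_values, h2, seed=None):
    """Add environmental noise to genetic values for a target heritability.

    genetic_values: list of per-sample genetic values (floats).
    h2:             target narrow-sense heritability, with 0 < h2 <= 1.
    seed:           optional seed for reproducible noise.
    Returns (phenotypes, noise) as two lists.
    """
    if not 0.0 < h2 <= 1.0:
        raise ValueError("h2 must be in the interval (0, 1]")

    rng = random.Random(seed)
    var_g = pvariance(genetic_values)
    # Solve var_g / (var_g + var_e) = h2 for the noise variance var_e.
    var_e = var_g * (1.0 - h2) / h2
    sd_e = var_e ** 0.5

    noise = [rng.gauss(0.0, sd_e) for _ in genetic_values]
    phenotypes = [g + e for g, e in zip(genetic_values, noise)]
    return phenotypes, noise


# Usage: three samples, half of the expected phenotypic variance is genetic.
phenotypes, noise = simulate_phenotypes([0.25, 1.25, 0.5], h2=0.5, seed=1)
```

With `h2=1.0` the noise variance is zero and the phenotypes equal the genetic values; smaller values of `h2` give noisier phenotypes. The realized heritability in any single draw differs from the target because the noise is sampled.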
The GrgPhenoSim library and use-case demonstrations are available at https://github.com/aprilweilab/grg_pheno_sim. The documentation for GrgPhenoSim is hosted at https://grgl.readthedocs.io/en/latest/index.html.