Guardado Miguel, Perez Cynthia, Campana Sthen, Chavez Rojas Berenice, Magaña Joaquín, Jackson Shalom, Samperio Emily, Hernandez Selena, Syas Kaela, Hernandez Ryan D, Zavala Elena I, Rohlfs Rori V
Department of Mathematics, San Francisco State University, San Francisco, CA, 94132, USA.
Biological and Medical Informatics Graduate Program, University of California San Francisco, San Francisco, CA, 94158, USA.
BMC Bioinformatics. 2025 May 7;26(1):122. doi: 10.1186/s12859-025-06142-z.
Large-scale family pedigrees are commonly used across medical, evolutionary, and forensic genetics. These pedigrees are tools for identifying genetic disorders, tracking evolutionary patterns, and establishing familial relationships via forensic genetic identification. However, software that accurately simulates different pedigree structures together with the genomes of the individuals in those pedigrees is lacking. This limits simulation-based evaluations of methods that use pedigrees.
We have developed a Python command-line tool called py_ped_sim that facilitates the simulation of pedigree structures and of the genomes of individuals in a pedigree. py_ped_sim represents pedigrees as directed acyclic graphs, enabling conversion between standard pedigree formats and integration with the forward population genetic simulator SLiM. Notably, py_ped_sim allows the simulation of varying numbers of offspring for a set of parents, with the capacity to shift the distribution of sibship sizes over generations. We also provide simulations of misattributed paternity events, which offer a way to simulate half-sibling relationships, and simulations that extend the breadth of a family pedigree. We validated the accuracy of both our genome simulator and our pedigree simulator, and we show that genomes simulated onto family pedigrees have the expected levels of kinship.
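To make the directed-acyclic-graph representation and the notion of expected kinship concrete, here is a minimal sketch in plain Python. It does not use py_ped_sim or reproduce its API: the toy pedigree, the individual names, and the functions `depth` and `kinship` are all hypothetical, chosen only for illustration. The pedigree is stored as child-to-parent edges, and the expected kinship coefficient of a pair is computed with the standard recursive definition. One individual (`h1`) is given a different father to mimic a misattributed paternity event, which produces a half-sibling relationship.

```python
from functools import lru_cache

# Hypothetical toy pedigree as a DAG: each individual maps to (father, mother).
# Founders have no recorded parents.
PEDIGREE = {
    "gf": (None, None),
    "gm": (None, None),
    "alt": (None, None),   # alternative father (misattributed paternity)
    "s1": (None, None),    # unrelated spouse of p1
    "s2": (None, None),    # unrelated spouse of p2
    "p1": ("gf", "gm"),
    "p2": ("gf", "gm"),    # full sibling of p1
    "h1": ("alt", "gm"),   # half sibling of p1 and p2
    "c1": ("p1", "s1"),
    "c2": ("p2", "s2"),    # first cousin of c1
}


@lru_cache(maxsize=None)
def depth(ind):
    """Generation depth: 0 for founders, else 1 + the deepest parent."""
    parents = [p for p in PEDIGREE[ind] if p is not None]
    return 0 if not parents else 1 + max(depth(p) for p in parents)


@lru_cache(maxsize=None)
def kinship(a, b):
    """Expected kinship coefficient of a and b implied by the pedigree."""
    if a is None or b is None:
        return 0.0  # an unknown parent is treated as unrelated
    if a == b:
        father, mother = PEDIGREE[a]
        return 0.5 * (1.0 + kinship(father, mother))
    # Recurse through the deeper individual, who cannot be an ancestor
    # of the shallower one.
    if depth(a) < depth(b):
        a, b = b, a
    father, mother = PEDIGREE[a]
    if father is None and mother is None:
        return 0.0  # two distinct founders
    return 0.5 * (kinship(father, b) + kinship(mother, b))


if __name__ == "__main__":
    for pair in [("p1", "p2"), ("p1", "h1"), ("c1", "c2"), ("gf", "p1")]:
        print(pair, kinship(*pair))
```

Under these assumptions, full siblings and parent-offspring pairs have kinship 1/4, half siblings 1/8, and first cousins 1/16. Values of this kind are what kinship estimated from simulated genomes can be checked against.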
py_ped_sim is a user-friendly and open-source solution for simulating pedigree structures and conducting pedigree genome simulations. It empowers medical, forensic, and evolutionary genetics researchers to gain deeper insights into the dynamics of genetic inheritance and relatedness within families.