Leal Thiago Peixoto, Furlan Vinicius C, Gouveia Mateus Henrique, Saraiva Duarte Julia Maria, Fonseca Pablo As, Tou Rafael, Scliar Marilia de Oliveira, Araujo Gilderlanio Santana de, Costa Lucas F, Zolini Camila, Peixoto Maria Gabriela Campolina Diniz, Carvalho Maria Raquel Santos, Lima-Costa Maria Fernanda, Gilman Robert H, Tarazona-Santos Eduardo, Rodrigues Maíra Ribeiro
Departamento de Genética, Ecologia e Evolução, Instituto de Ciências Biológicas, Universidade Federal de Minas Gerais, Belo Horizonte, MG, Brazil.
Lerner Research Institute, Genomic Medicine, Cleveland Clinic, Cleveland, OH, United States.
Comput Struct Biotechnol J. 2022 Apr 9;20:1821-1828. doi: 10.1016/j.csbj.2022.04.009. eCollection 2022.
Genetic and omics analyses frequently require independent observations, which are not guaranteed in real datasets. When relatedness cannot be accounted for in the analysis, the usual solution is to remove related individuals (or observations), which reduces the available data. We developed NAToRA, a network-based relatedness-pruning method that minimizes dataset reduction while removing unwanted relationships from a dataset. It uses the node degree centrality metric to identify highly connected nodes (or individuals) and implements heuristics that approximate the minimal reduction of a dataset, which allows its application to complex datasets. When compared with two other popular population genetics tools (PLINK and KING), NAToRA shows the best combination of removing all relatives while keeping the largest possible number of individuals in all datasets tested, with effects on the allele frequency spectrum and Principal Component Analysis similar to those of PLINK and KING. NAToRA is freely available both as a standalone tool that can be easily incorporated into a pipeline and as a graphical web tool that allows visualization of the relatedness networks. NAToRA also accepts a variety of relationship metrics as input, which facilitates its use. We also release the genealogy simulator software used for the different tests performed in this study.
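As an illustration of the degree-centrality idea described in the abstract, the following is a minimal sketch of a greedy pruning heuristic. It is not NAToRA's actual implementation: the function name `prune_related`, the input format of `(id_a, id_b, kinship)` tuples, the alphabetical tie-breaking rule, and the 0.0884 cut-off used in the example are all assumptions made for this sketch.

```python
from collections import defaultdict


def prune_related(pairs, threshold):
    """Greedy degree-based pruning (illustrative sketch, not NAToRA's code).

    pairs: iterable of (id_a, id_b, kinship) tuples (hypothetical format).
    threshold: pairs with kinship >= threshold count as related.
    Returns the set of individual IDs to remove.
    """
    # Build the relatedness network: one edge per related pair.
    adjacency = defaultdict(set)
    for a, b, kinship in pairs:
        if a != b and kinship >= threshold:
            adjacency[a].add(b)
            adjacency[b].add(a)

    removed = set()
    while True:
        # Node degree = number of relatives still present in the network.
        degrees = {node: len(nbrs) for node, nbrs in adjacency.items() if nbrs}
        if not degrees:
            break  # no related pairs remain
        # Remove the most connected individual; ties go to the smallest ID
        # (an arbitrary rule chosen here for determinism).
        worst = max(sorted(degrees), key=degrees.get)
        for neighbour in adjacency.pop(worst):
            adjacency[neighbour].discard(worst)
        removed.add(worst)
    return removed


# Toy example: A is related to B, C and D; E is related to F; G-H is unrelated.
example_pairs = [
    ("A", "B", 0.25), ("A", "C", 0.25), ("A", "D", 0.25),
    ("E", "F", 0.25), ("G", "H", 0.01),
]
# 0.0884 is an illustrative cut-off chosen for this example.
print(sorted(prune_related(example_pairs, 0.0884)))  # ['A', 'E']
```

In the toy example, removing the hub A resolves three relationships at once, so only two individuals are dropped instead of the four that removing B, C, D and one of E or F would cost. A greedy rule of this kind only approximates the minimal reduction and can be suboptimal on complex networks.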