Department of Genome Sciences, The University of Washington, Seattle, WA 98195, USA.
Genet Epidemiol. 2013 Feb;37(2):136-41. doi: 10.1002/gepi.21684. Epub 2012 Sep 19.
Many statistical analyses of genetic data assume that samples are independent. Consequently, relatedness is either modeled in the analysis or samples are removed to "clean" the data of any pairwise relatedness above a tolerated threshold. Current methods do not maximize the number of unrelated individuals retained for further analysis, which is a needless loss of resources. We report a novel application of graph theory that, given a user-defined threshold of relatedness, identifies the maximum set of unrelated samples in any dataset as well as all networks of related samples. We have implemented this method in an open source program called Pedigree Reconstruction and Identification of a Maximum Unrelated Set (PRIMUS). We show that PRIMUS outperforms the three existing methods, allowing researchers to retain up to 50% more unrelated samples. A unique strength of PRIMUS is its ability to weight the maximum clique selection using additional criteria (e.g., affected status and data missingness). PRIMUS is a permanent solution to identifying the maximum number of unrelated samples for a genetic analysis.
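The graph formulation described in the abstract can be illustrated with a small sketch. This is not the PRIMUS implementation; it is a minimal example under stated assumptions. Samples are treated as nodes, an edge joins two samples whose pairwise relatedness estimate exceeds the threshold, the connected components are the networks of related samples, and within each network the heaviest subset with no edges is selected (a maximum independent set, which is equivalent to a maximum clique in the complement graph). The function names, the kinship values, and the weighting scheme are hypothetical, and the exhaustive search is only practical for small networks.

```python
from itertools import combinations


def related_networks(samples, kinship, threshold):
    """Return the adjacency map and the connected components (family networks)."""
    # Assumption: two samples are "related" when their estimate exceeds the threshold.
    neighbors = {s: set() for s in samples}
    for (a, b), value in kinship.items():
        if value > threshold:
            neighbors[a].add(b)
            neighbors[b].add(a)
    seen, networks = set(), []
    for start in samples:
        if start in seen:
            continue
        # Depth-first traversal collects everyone reachable from `start`.
        stack, component = [start], []
        seen.add(start)
        while stack:
            node = stack.pop()
            component.append(node)
            for other in neighbors[node]:
                if other not in seen:
                    seen.add(other)
                    stack.append(other)
        networks.append(sorted(component))
    return neighbors, networks


def best_unrelated_subset(component, neighbors, weights):
    """Exhaustively find the heaviest subset containing no related pair."""
    best, best_weight = [], -1.0
    for size in range(len(component), 0, -1):
        for subset in combinations(component, size):
            if any(b in neighbors[a] for a, b in combinations(subset, 2)):
                continue  # at least one related pair, so not an unrelated set
            weight = sum(weights.get(s, 1.0) for s in subset)
            if weight > best_weight:
                best, best_weight = list(subset), weight
    return best


def maximum_unrelated_set(samples, kinship, threshold, weights=None):
    """Combine the best unrelated subset of every network of related samples."""
    weights = weights or {}  # unweighted by default: every sample counts as 1.0
    neighbors, networks = related_networks(samples, kinship, threshold)
    unrelated = []
    for component in networks:
        unrelated.extend(best_unrelated_subset(component, neighbors, weights))
    return sorted(unrelated), networks


# Hypothetical data: A-B and B-C are related pairs; D and E are unrelated to everyone.
samples = ["A", "B", "C", "D", "E"]
kinship = {("A", "B"): 0.25, ("B", "C"): 0.25, ("C", "D"): 0.01}
unrelated, networks = maximum_unrelated_set(samples, kinship, threshold=0.1)
print(unrelated, networks)
```

In this toy dataset, removing B keeps both A and C, whereas removing A or C first would leave only one sample from that network. Passing a `weights` mapping (for example, a higher weight for affected individuals or for samples with less missing data) changes which subset is preferred, in the spirit of the weighted selection the abstract mentions.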