Estimating Sampling Selection Bias in Human Genetics: A Phenomenological Approach.

Author information

Risso Davide, Taglioli Luca, De Iasio Sergio, Gueresi Paola, Alfani Guido, Nelli Sergio, Rossi Paolo, Paoli Giorgio, Tofanelli Sergio

Affiliations

National Institute on Deafness and Other Communication Disorders, NIH, Bethesda, MD 20854, United States of America; Laboratory of Molecular Anthropology and Centre for Genome Biology, Department of BiGeA, University of Bologna, via Selmi 3, 40126 Bologna, Italy.

Dipartimento di Biologia, University of Pisa, Via Ghini 13, 56126 Pisa, Italy.

Publication information

PLoS One. 2015 Oct 9;10(10):e0140146. doi: 10.1371/journal.pone.0140146. eCollection 2015.

Abstract

This research is the first empirical attempt to calculate the various components of the hidden bias associated with the sampling strategies routinely used in human genetics, with special reference to surname-based strategies. We reconstructed the surname distributions of 26 Italian communities with different demographic features across the last six centuries (1447–2001). We quantified the degree of overlap between the "reference founding core" distributions and the distributions obtained by sampling the present-day communities with probabilistic and selective methods, under different conditions and models. When only one individual per surname was taken into account (low kinship model), the average discrepancy was 59.5%, with a peak of 84% under random sampling. When multiple individuals per surname were considered (high kinship model), the discrepancy decreased by 8–30%, at the cost of a larger variance. Criteria aimed at maximizing locally spread patrilineages and long-term residency appeared to be affected by recent gene flow much more than expected. Selecting the more frequent family names under low kinship criteria proved to be a suitable approach only for historically stable communities. In all other cases, true random sampling, despite its high variance, did not return more biased estimates than the other selective methods. Our results indicate that sampling individuals who bear historically documented surnames (the founders' method) should be applied, especially when studying the male-specific genome. This prevents an over-stratification of ancient and recent genetic components that heavily biases inferences and statistics.
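The abstract does not state which overlap statistic was used. The sketch below is therefore illustrative only. It assumes that "discrepancy" can be read as one minus the overlap coefficient, sum_i min(p_i, q_i), between two surname frequency distributions (equivalently, the total variation distance). It contrasts the two sampling models described above: at most one individual per surname (low kinship) versus individuals drawn at random with surnames allowed to repeat (high kinship). The function names and the toy surname lists are hypothetical and are not taken from the study.

```python
import random
from collections import Counter


def to_frequencies(surnames):
    """Convert a list of surnames (one entry per individual) into relative frequencies."""
    counts = Counter(surnames)
    total = sum(counts.values())
    return {name: c / total for name, c in counts.items()}


def discrepancy(reference, sample):
    """Hypothetical metric: 1 - sum_i min(p_i, q_i) over the union of surnames.

    Returns 0.0 for identical distributions and 1.0 for disjoint ones.
    """
    p = to_frequencies(reference)
    q = to_frequencies(sample)
    overlap = sum(min(p.get(n, 0.0), q.get(n, 0.0)) for n in set(p) | set(q))
    return 1.0 - overlap


def sample_low_kinship(population, n, rng):
    """Low kinship model (assumed reading): at most one individual per surname."""
    distinct = sorted(set(population))
    return rng.sample(distinct, min(n, len(distinct)))


def sample_high_kinship(population, n, rng):
    """High kinship model (assumed reading): random individuals, surnames may repeat."""
    return rng.sample(population, min(n, len(population)))


if __name__ == "__main__":
    rng = random.Random(1)
    # Invented toy data: a founding core plus surnames that arrived later.
    founding_core = ["Rossi"] * 5 + ["Russo"] * 3 + ["Colombo"] * 2
    present_day = founding_core + ["Bianchi"] * 4 + ["Ferrari"] * 6
    for label, sampler in [("low kinship", sample_low_kinship),
                           ("high kinship", sample_high_kinship)]:
        drawn = sampler(present_day, 5, rng)
        print(label, round(discrepancy(founding_core, drawn), 3))
```

The total variation distance was chosen here only because it is a standard, easily interpreted overlap measure bounded between 0 and 1. The published analysis may rely on a different statistic.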

Figure 1 (image): https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6aad/4599962/87d6713a361a/pone.0140146.g001.jpg