Characterizing the genetic structure of a forensic DNA database using a latent variable approach.

Department of Mathematics, VU University, De Boelelaan 1081a, 1081 HV Amsterdam, The Netherlands.

Forensic Sci Int Genet. 2016 Jul;23:130-149. doi: 10.1016/j.fsigen.2016.03.007. Epub 2016 Apr 1.

Several problems in forensic genetics require a representative model of a forensic DNA database. Obtaining an accurate representation of the offender database can be difficult, since databases typically contain groups of persons with unregistered ethnic origins in unknown proportions. We propose to estimate the allele frequencies of the subpopulations that make up the offender database, and their proportions, from the database itself using a latent variable approach. We present a model whose parameters can be estimated with the expectation maximization (EM) algorithm. This approach does not rely on relatively small and possibly unrepresentative population surveys; it is driven only by the actual genetic composition of the database. We fit the model to a snapshot of the Dutch offender database (2014), which contains close to 180,000 profiles, and find that three subpopulations suffice to describe a large fraction of the heterogeneity in the database. We demonstrate the utility and reliability of the approach with three applications. First, we use the model to predict the number of false leads obtained in database searches. We assess how well the model predicts the number of false leads obtained in mock searches of the Dutch offender database, both for familial searching for first-degree relatives of a donor and for searching for contributors to three-person mixtures. Second, we study the degree of partial matching between all pairs of profiles in the Dutch database and compare it with what the latent variable approach predicts. Third, we use the model to provide evidence that the Dutch practice of estimating match probabilities with the Balding-Nichols formula, a native Dutch reference database and θ = 0.03 is conservative.
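The EM fit described above can be illustrated with a minimal sketch. This is not the authors' model: it assumes that within each latent subpopulation every locus is in Hardy-Weinberg equilibrium and loci are independent (no θ correction, no dropout, no relatives), and that profiles are encoded as an allele-count array `counts[i, l, a]`. The function names `fit_latent_subpopulations` and `simulate_profiles` are hypothetical, and the data in the usage lines are simulated, not taken from the Dutch database.

```python
import numpy as np


def fit_latent_subpopulations(counts, n_sub, n_iter=500, tol=1e-8, seed=0):
    """EM for a latent-class model of STR profiles (hypothetical sketch).

    counts: (N, L, A) array; counts[i, l, a] is the number of copies (0, 1 or 2)
    of allele a at locus l in profile i. Assumes HWE within each subpopulation
    and independent loci. Returns (weights, freqs, loglik_trace).
    """
    rng = np.random.default_rng(seed)
    n_profiles = counts.shape[0]
    # Start from a random soft assignment of profiles to subpopulations.
    resp = rng.dirichlet(np.ones(n_sub), size=n_profiles)  # (N, K)
    trace = []
    for _ in range(n_iter):
        # M-step: mixing proportions and responsibility-weighted allele counts.
        weights = resp.mean(axis=0)
        freqs = np.einsum("ik,ila->kla", resp, counts)
        freqs /= 2.0 * resp.sum(axis=0)[:, None, None]
        # E-step: log P(profile | k) up to the heterozygote factor 2, which is
        # the same for every k and therefore cancels in the posterior.
        log_freqs = np.log(np.clip(freqs, 1e-12, None))
        log_joint = np.log(weights) + np.einsum("ila,kla->ik", counts, log_freqs)
        log_norm = np.logaddexp.reduce(log_joint, axis=1, keepdims=True)
        resp = np.exp(log_joint - log_norm)
        # Log-likelihood of the current parameters (same constant dropped).
        trace.append(float(log_norm.sum()))
        if len(trace) > 1 and trace[-1] - trace[-2] < tol:
            break
    return weights, freqs, np.array(trace)


def simulate_profiles(freqs, weights, n, rng):
    """Draw n profiles from a mixture of HWE subpopulations (for testing)."""
    n_sub, n_loci, n_alleles = freqs.shape
    labels = rng.choice(n_sub, size=n, p=weights)
    counts = np.zeros((n, n_loci, n_alleles))
    for k in range(n_sub):
        members = labels == k
        for l in range(n_loci):
            alleles = rng.choice(n_alleles, size=(int(members.sum()), 2), p=freqs[k, l])
            counts[members, l] = np.eye(n_alleles)[alleles].sum(axis=1)
    return counts, labels


rng = np.random.default_rng(1)
true_freqs = rng.dirichlet(np.ones(6), size=(2, 10))  # 2 subpops, 10 loci, 6 alleles
counts, _ = simulate_profiles(true_freqs, np.array([0.7, 0.3]), 2000, rng)
weights, freqs, trace = fit_latent_subpopulations(counts, n_sub=2)
print(np.sort(weights))  # labels are arbitrary, so sort before comparing
```

Under these assumptions the M-step is plain weighted allele counting, because the HWE genotype probability is proportional to the product of the two allele frequencies. Subpopulation labels are only identified up to permutation.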

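Comparing observed partial matching with the model's prediction requires the expected number of profile pairs that share genotypes at exactly m loci. The sketch below computes this under the same simplifying assumptions as a plain mixture model (HWE within subpopulations, independent loci, unrelated profiles, no θ correction). For each pair of subpopulation labels, drawn with probability w_k w_j, the per-locus genotype-sharing probabilities give a Poisson-binomial count of matching loci, which is then scaled by the number of pairs. All function names and the frequencies in the usage lines are hypothetical.

```python
import numpy as np


def genotype_probs(p):
    """HWE probabilities of unordered genotypes for allele frequencies p.

    Returns an (A, A) upper-triangular matrix: p_a^2 on the diagonal,
    2 p_a p_b above it, zeros below, so each genotype appears once.
    """
    outer = np.outer(p, p)
    return np.triu(2 * outer, k=1) + np.diag(np.diag(outer))


def locus_match_prob(p, q):
    """P(two unrelated profiles have the same genotype at one locus),
    with the profiles drawn from subpopulations with frequencies p and q."""
    return float(np.sum(genotype_probs(p) * genotype_probs(q)))


def match_count_distribution(match_probs):
    """Poisson-binomial P(exactly m loci match), m = 0..L, for independent loci."""
    dist = np.array([1.0])
    for s in match_probs:
        dist = np.convolve(dist, [1 - s, s])
    return dist


def expected_partial_matches(weights, freqs, n_profiles):
    """Expected number of unordered pairs matching at exactly m loci.

    weights: (K,) mixing proportions; freqs: (K, L, A) allele frequencies.
    """
    n_sub, n_loci, _ = freqs.shape
    dist = np.zeros(n_loci + 1)
    for k in range(n_sub):
        for j in range(n_sub):
            s = [locus_match_prob(freqs[k, l], freqs[j, l]) for l in range(n_loci)]
            dist += weights[k] * weights[j] * match_count_distribution(s)
    return n_profiles * (n_profiles - 1) / 2 * dist


rng = np.random.default_rng(0)
weights = np.array([0.6, 0.3, 0.1])              # hypothetical proportions
freqs = rng.dirichlet(np.ones(8), size=(3, 10))  # hypothetical frequencies
expected = expected_partial_matches(weights, freqs, 180_000)
for m, count in enumerate(expected):
    print(f"{m:2d} matching loci: {count:.3g} pairs")
```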

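For reference, the standard Balding-Nichols conditional match probabilities at a single locus are [2θ + (1-θ)p_A][3θ + (1-θ)p_A] / [(1+θ)(1+2θ)] for a homozygote AA and 2[θ + (1-θ)p_A][θ + (1-θ)p_B] / [(1+θ)(1+2θ)] for a heterozygote AB, with the profile probability taken as the product over loci. The sketch below evaluates them. The allele frequencies are made-up illustrative values, not the native Dutch reference data, and the function names are hypothetical.

```python
from math import prod


def bn_locus(p_a, p_b=None, theta=0.03):
    """Balding-Nichols conditional match probability at one locus.

    p_b=None means the profile is homozygous for the allele with frequency p_a.
    """
    denom = (1 + theta) * (1 + 2 * theta)
    if p_b is None:
        return (2 * theta + (1 - theta) * p_a) * (3 * theta + (1 - theta) * p_a) / denom
    return 2 * (theta + (1 - theta) * p_a) * (theta + (1 - theta) * p_b) / denom


def bn_profile(loci, theta=0.03):
    """Profile match probability as the product of per-locus values."""
    return prod(bn_locus(p_a, p_b, theta) for p_a, p_b in loci)


# Hypothetical allele frequencies for a three-locus profile.
loci = [(0.10, 0.20), (0.15, None), (0.05, 0.30)]
print(bn_profile(loci, theta=0.0))   # Hardy-Weinberg product
print(bn_profile(loci, theta=0.03))  # Balding-Nichols with theta = 0.03
```

Setting θ = 0 recovers the Hardy-Weinberg values p_A² and 2p_Ap_B (here a product of about 2.7e-5). For these relatively rare illustrative alleles, θ = 0.03 raises every per-locus value, giving a larger, more defendant-favourable match probability.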
Similar articles

Characterizing the genetic structure of a forensic DNA database using a latent variable approach.

Forensic Sci Int Genet. 2016 Jul;23:130-149. doi: 10.1016/j.fsigen.2016.03.007. Epub 2016 Apr 1.

Familial searching: a specialist forensic DNA profiling service utilising the National DNA Database to identify unknown offenders via their relatives--the UK experience.

Forensic Sci Int Genet. 2014 Jan;8(1):1-9. doi: 10.1016/j.fsigen.2013.07.004. Epub 2013 Sep 7.

Use of sibling pairs to determine the familial searching efficiency of forensic databases.

Forensic Sci Int Genet. 2008 Sep;2(4):340-2. doi: 10.1016/j.fsigen.2008.04.008. Epub 2008 Jun 12.

Familial searching on DNA mixtures with dropout.

Forensic Sci Int Genet. 2016 May;22:128-138. doi: 10.1016/j.fsigen.2016.02.002. Epub 2016 Feb 22.

Validation of SmartRank: A likelihood ratio software for searching national DNA databases with complex DNA profiles.

Forensic Sci Int Genet. 2017 Jul;29:145-153. doi: 10.1016/j.fsigen.2017.04.008. Epub 2017 Apr 15.

Evaluating DNA evidence in a genetically complex population.

Forensic Sci Int Genet. 2018 Sep;36:141-147. doi: 10.1016/j.fsigen.2018.06.019. Epub 2018 Jun 28.

Partial matches in heterogeneous offender databases do not call into question the validity of random match probability calculations.

Int J Legal Med. 2009 Jan;123(1):59-63. doi: 10.1007/s00414-008-0239-1. Epub 2008 May 6.

The successful use of familial searching in six Hungarian high profile cases by applying a new module in Familias 3.

Forensic Sci Int Genet. 2016 Sep;24:24-32. doi: 10.1016/j.fsigen.2016.05.012. Epub 2016 May 19.

Fitting the Balding-Nichols model to forensic databases.

Forensic Sci Int Genet. 2015 Nov;19:86-91. doi: 10.1016/j.fsigen.2015.05.005. Epub 2015 Jun 23.

Familial identification: population structure and relationship distinguishability.

PLoS Genet. 2012 Feb;8(2):e1002469. doi: 10.1371/journal.pgen.1002469. Epub 2012 Feb 9.

Cited by

An Upper Bound on the Power of DNA to Distinguish Pedigree Relationships.

Genes (Basel). 2025 Apr 26;16(5):492. doi: 10.3390/genes16050492.
