FamLink2 - A comprehensive tool for likelihood computations in pedigrees analyses involving linked DNA markers accounting for genotype uncertainties.

Author information

Kling Daniel, Mostad Petter, Tillmar Andreas

Affiliations

Department of Forensic Sciences, Oslo University Hospital, Pb. 4950 Nydalen, Oslo NO-0424, Norway; Department of Forensic Genetics and Forensic Toxicology, National Board of Forensic Medicine, Linköping, Sweden; Biostatistics (BIAS), Norwegian University of Life Sciences, Aas, Norway.

Mathematical Sciences, Chalmers University of Technology and the University of Gothenburg, Göteborg, Sweden.

Publication information

Forensic Sci Int Genet. 2025 Jan;74:103150. doi: 10.1016/j.fsigen.2024.103150. Epub 2024 Sep 24.

Abstract

There is an increasing demand in forensic genetics for software that can handle an arbitrary number of linked markers, primarily for relationship inference and direct matching but also for applications such as ancestry inference and mixture interpretation. With the emergence of sequencing technologies, denser sets of SNP markers are generated and analyzed. Additionally, sequence data from DNA of low quality and quantity generate uncertainty about the underlying true genotype. We provide an efficient implementation of a general model for pedigree likelihood computations with genetic marker data using a three-layered approach. The first (top) layer is the population model, which accounts for allele frequencies and population substructure. The second layer is the inheritance model, which efficiently handles linked markers using an IBD model. The third (bottom) layer is the observational level, where we model the likelihood of the true genotype given the underlying reads as well as parameters for errors. We exemplify the utility of our implementation and provide validation according to guidelines established by the ISFG, using a combination of two published SNP panels. We demonstrate that computations are feasible for panels encompassing 10,000 markers and we argue that, due to the properties of the underlying algorithm, extending the number of markers will result in a linear increase in computation time. In addition, we study the impact of the parameters used in our model and suggest some guidelines pertaining to their values. The results demonstrate that a probabilistic model for low-coverage sequence read data is needed instead of relying on a threshold-based genotype. They also show that applying our general model for inference of relationships to a real case can be superior, i.e. yield higher information content, compared with other methods relying on either fixed genotypes with low-quality sequence data or simple pairwise relationship tests.
In summary, the implementation, FamLink2 (freely available at https://famlink.se), can jointly handle genetic linkage, genotype uncertainty and population substructure for an arbitrary pedigree with data for any number of individuals. Whereas the current study will focus on calculations disregarding mutations, FamLink2 has the ability to model mutations for certain built-in pedigrees.
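
To illustrate why the observational layer matters for low-coverage data, the following is a minimal sketch of genotype likelihoods computed from read counts at a single biallelic SNP. It is not FamLink2's actual model: the binomial read model, the single symmetric `error` parameter, the Hardy-Weinberg prior without substructure correction, and the function names are all assumptions made for illustration only.

```python
from math import comb


def genotype_likelihoods(n_ref, n_alt, error=0.01):
    """Return P(reads | genotype) for genotypes carrying 0, 1 or 2 alt alleles.

    Assumes (hypothetically) independent reads and a symmetric per-read
    error rate; this is an illustrative model, not the one in FamLink2.
    """
    n = n_ref + n_alt
    likelihoods = {}
    for alt_copies in (0, 1, 2):
        frac = alt_copies / 2  # fraction of alt alleles in the true genotype
        # Probability that one read shows the alt allele, allowing for errors
        p_alt = frac * (1 - error) + (1 - frac) * error
        likelihoods[alt_copies] = (
            comb(n, n_alt) * p_alt**n_alt * (1 - p_alt) ** n_ref
        )
    return likelihoods


def genotype_posterior(likelihoods, alt_freq):
    """Combine read likelihoods with a Hardy-Weinberg prior (assumed)."""
    q = alt_freq
    p = 1 - q
    prior = {0: p * p, 1: 2 * p * q, 2: q * q}
    joint = {g: likelihoods[g] * prior[g] for g in likelihoods}
    total = sum(joint.values())
    return {g: value / total for g, value in joint.items()}


if __name__ == "__main__":
    # Two reads, both showing the reference allele
    lik = genotype_likelihoods(n_ref=2, n_alt=0, error=0.01)
    post = genotype_posterior(lik, alt_freq=0.5)
    for g in (0, 1, 2):
        print(f"alt copies={g}: likelihood={lik[g]:.4f}, posterior={post[g]:.4f}")
```

Under these assumptions, two reference reads leave roughly one third of the posterior probability on the heterozygous genotype when both alleles are equally frequent. A threshold-based call of "homozygous reference" would discard that uncertainty, whereas carrying the three likelihoods forward lets the inheritance and population layers weigh them.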
