Maximum likelihood pedigree reconstruction using integer linear programming.

Author information

Department of Computer Science, University of York, York, North Yorkshire, United Kingdom.

Publication information

Genet Epidemiol. 2013 Jan;37(1):69-83. doi: 10.1002/gepi.21686. Epub 2012 Oct 3.

Abstract

Large population biobanks of unrelated individuals have been highly successful in detecting common genetic variants affecting diseases of public health concern. However, they lack the statistical power to detect more modest gene-gene and gene-environment interaction effects or the effects of rare variants for which related individuals are ideally required. In reality, most large population studies will undoubtedly contain sets of undeclared relatives, or pedigrees. Although a crude measure of relatedness might sometimes suffice, having a good estimate of the true pedigree would be much more informative if this could be obtained efficiently. Relatives are more likely to share longer haplotypes around disease susceptibility loci and are hence biologically more informative for rare variants than unrelated cases and controls. Distant relatives are arguably more useful for detecting variants with small effects because they are less likely to share masking environmental effects. Moreover, the identification of relatives enables appropriate adjustments of statistical analyses that typically assume unrelatedness. We propose to exploit an integer linear programming optimisation approach to pedigree learning, which is adapted to find valid pedigrees by imposing appropriate constraints. Our method is not restricted to small pedigrees and is guaranteed to return a maximum likelihood pedigree. With additional constraints, we can also search for multiple high-probability pedigrees and thus account for the inherent uncertainty in any particular pedigree reconstruction. The true pedigree is found very quickly by comparison with other methods when all individuals are observed. Extensions to more complex problems seem feasible.
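To make the optimisation problem concrete, the sketch below scores every assignment of parent sets to individuals and keeps only those that form a valid pedigree. It is not the integer linear programming method itself: it swaps in brute-force enumeration over the same kind of feasible set, which is practical only for a handful of individuals. The individuals, their sexes, the candidate parent sets, and the local log-likelihood scores are all hypothetical toy values, and the validity rules used here (at most two parents, parents of opposite sex, no individual being its own ancestor) are an assumed reading of "appropriate constraints", not taken from the paper.

```python
from itertools import product

# Hypothetical toy data: three individuals with known sex.
SEX = {"a": "F", "b": "M", "c": "F"}

# Hypothetical local log-likelihood scores, SCORES[child][parent_set].
# In a real analysis these would be computed from genotype data.
SCORES = {
    "a": {(): -2.0, ("c",): -3.0},
    "b": {(): -2.0, ("c",): -3.5},
    "c": {(): -6.0, ("a",): -4.0, ("b",): -4.5, ("a", "b"): -1.0},
}


def is_valid_pedigree(choice):
    """Check the assumed pedigree constraints on a child -> parents mapping."""
    for parents in choice.values():
        if len(parents) > 2:
            return False  # at most two parents
        if len(parents) == 2 and SEX[parents[0]] == SEX[parents[1]]:
            return False  # two parents must differ in sex
    # Acyclicity: repeatedly peel off individuals with no parents remaining.
    remaining = set(choice)
    while remaining:
        removable = {v for v in remaining if not (set(choice[v]) & remaining)}
        if not removable:
            return False  # everyone left has a parent left: a cycle
        remaining -= removable
    return True


def ranked_pedigrees(scores):
    """Return all valid pedigrees as (total score, mapping), best first."""
    children = sorted(scores)
    results = []
    for combo in product(*(scores[c].items() for c in children)):
        choice = {c: parents for c, (parents, _) in zip(children, combo)}
        if is_valid_pedigree(choice):
            results.append((sum(s for _, s in combo), choice))
    results.sort(key=lambda r: r[0], reverse=True)
    return results


ranking = ranked_pedigrees(SCORES)
best_score, best_pedigree = ranking[0]
```

Because the ranking is sorted, its leading entries also give the next-best pedigrees, which loosely mirrors the abstract's idea of reporting several high-probability reconstructions. An integer linear programming solver would aim for the same optimum without enumerating every combination.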
