Molecular and statistical approaches to the detection and correction of errors in genotype databases.

Authors

Brzustowicz L M, Mérette C, Xie X, Townsend L, Gilliam T C, Ott J

Affiliation

Department of Psychiatry, Columbia University, College of Physicians and Surgeons, NY 10032.

Publication

Am J Hum Genet. 1993 Nov;53(5):1137-45.

Abstract

Errors in genotyping data have been shown to have a significant effect on the estimation of recombination fractions in high-resolution genetic maps. Previous estimates of errors in existing databases have been limited to the analysis of relatively few markers and have suggested rates in the range 0.5%-1.5%. The present study capitalizes on the fact that within the Centre d'Etude du Polymorphisme Humain (CEPH) collection of reference families, 21 individuals are members of more than one family, with separate DNA samples provided by CEPH for each appearance of these individuals. By comparing the genotypes of these individuals in each of the families in which they occur, an estimated error rate of 1.4% was calculated for all loci in the version 4.0 CEPH database. Removing those individuals who were clearly identified by CEPH as appearing in more than one family resulted in a 3.0% error rate for the remaining samples, suggesting that some error checking of the identified repeated individuals may occur prior to data submission. An error rate of 3.0% for version 4.0 data was also obtained for four chromosome 5 markers that were retyped through the entire CEPH collection. The effects of these errors on a multipoint map were significant, with a total sex-averaged length of 36.09 cM with the errors, and 19.47 cM with the errors corrected. Several statistical approaches to detect and allow for errors during linkage analysis are presented. One method, which identified families containing possible errors on the basis of the impact on the maximum lod score, showed particular promise, especially when combined with the limited retyping of the identified families. The impact of the demonstrated error rate in an established genotype database on high-resolution mapping is significant, raising the question of the overall value of incorporating such existing data into new genetic maps.
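The duplicate-sample comparison described in the abstract can be illustrated with a short sketch. This is not the authors' procedure, which the abstract does not spell out; it is a minimal illustration under stated assumptions. The function names, the genotype encoding (a pair of allele labels, or None for a missing typing), and the toy data are all hypothetical. The conversion from a pairwise discordance rate to a per-genotype error rate assumes that the two typings err independently and that any pair containing at least one error shows up as a mismatch.

```python
from math import sqrt


def discordance_rate(pairs):
    """Return (fraction of mismatching pairs, number of pairs compared).

    pairs: iterable of (genotype_a, genotype_b) for the same individual typed
    in two families. A genotype is a pair of allele labels, or None when the
    typing is missing. Allele order within a genotype is ignored.
    """
    compared = 0
    mismatched = 0
    for a, b in pairs:
        if a is None or b is None:
            continue  # a missing typing cannot be compared
        compared += 1
        if sorted(a) != sorted(b):
            mismatched += 1
    if compared == 0:
        raise ValueError("no comparable genotype pairs")
    return mismatched / compared, compared


def per_genotype_error(d):
    """Convert a pairwise discordance rate d into a per-genotype error rate e.

    Assumption (not from the source): errors in the two typings are
    independent and never produce the same wrong genotype, so a pair is
    concordant only when both typings are correct: 1 - d = (1 - e)**2.
    """
    if not 0.0 <= d <= 1.0:
        raise ValueError("discordance rate must lie in [0, 1]")
    return 1.0 - sqrt(1.0 - d)


def map_inflation(length_with_errors_cm, length_corrected_cm):
    """Return (ratio, excess in cM) between an error-laden and a corrected map."""
    ratio = length_with_errors_cm / length_corrected_cm
    excess = length_with_errors_cm - length_corrected_cm
    return ratio, excess


# Toy data only: four comparable pairs, one of which disagrees.
toy_pairs = [
    ((1, 2), (2, 1)),   # same genotype, allele order differs
    ((1, 1), (1, 2)),   # mismatch
    (None, (1, 2)),     # missing typing, skipped
    ((3, 4), (3, 4)),
    ((2, 2), (2, 2)),
]
d, n = discordance_rate(toy_pairs)
print(f"discordance {d:.3f} over {n} pairs, per-genotype error {per_genotype_error(d):.3f}")

# Map lengths quoted in the abstract (sex-averaged, cM).
ratio, excess = map_inflation(36.09, 19.47)
print(f"map inflation {ratio:.2f}-fold, excess {excess:.2f} cM")
```

Applied to the map lengths quoted in the abstract, the last helper gives an inflation of about 1.85-fold, or 16.62 cM of excess length attributable to the errors. Whether the reported 1.4% and 3.0% figures are pairwise discordance rates or per-genotype rates is not stated in the abstract, so the conversion function should be read as one possible interpretation rather than the published calculation.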
