Molecular and statistical approaches to the detection and correction of errors in genotype databases.

Authors

Brzustowicz L M, Mérette C, Xie X, Townsend L, Gilliam T C, Ott J

Affiliation

Department of Psychiatry, Columbia University, College of Physicians and Surgeons, NY 10032.

Publication information

Am J Hum Genet. 1993 Nov;53(5):1137-45.

PMID: 8213837

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC1682304/

Abstract

Errors in genotyping data have been shown to have a significant effect on the estimation of recombination fractions in high-resolution genetic maps. Previous estimates of errors in existing databases have been limited to the analysis of relatively few markers and have suggested rates in the range 0.5%-1.5%. The present study capitalizes on the fact that within the Centre d'Etude du Polymorphisme Humain (CEPH) collection of reference families, 21 individuals are members of more than one family, with separate DNA samples provided by CEPH for each appearance of these individuals. By comparing the genotypes of these individuals in each of the families in which they occur, an estimated error rate of 1.4% was calculated for all loci in the version 4.0 CEPH database. Removing those individuals who were clearly identified by CEPH as appearing in more than one family resulted in a 3.0% error rate for the remaining samples, suggesting that some error checking of the identified repeated individuals may occur prior to data submission. An error rate of 3.0% for version 4.0 data was also obtained for four chromosome 5 markers that were retyped through the entire CEPH collection. The effects of these errors on a multipoint map were significant, with a total sex-averaged length of 36.09 cM with the errors, and 19.47 cM with the errors corrected. Several statistical approaches to detect and allow for errors during linkage analysis are presented. One method, which identified families containing possible errors on the basis of the impact on the maximum lod score, showed particular promise, especially when combined with the limited retyping of the identified families. The impact of the demonstrated error rate in an established genotype database on high-resolution mapping is significant, raising the question of the overall value of incorporating such existing data into new genetic maps.
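The two ideas in the abstract that lend themselves to a worked example are the comparison of duplicate typings and the screening of families by their impact on the maximum lod score. The Python sketch below illustrates both under stated assumptions and is not the authors' procedure. The abstract does not say how discordances were converted into an error rate, so the conversion here assumes independent errors at rate e per typing and that two erroneous typings never agree, giving d = 1 - (1 - e)^2. The lod calculation assumes phase-known, fully informative meioses summarized as (recombinants, meioses) per family. All function names, locus labels, and family counts are hypothetical.

```python
import math


def discordance_rate(typing_a, typing_b):
    """Fraction of loci typed in both samples whose genotypes disagree.

    Each typing maps a locus label to an unordered pair of alleles.
    """
    shared = [locus for locus in typing_a if locus in typing_b]
    if not shared:
        raise ValueError("no loci typed in both samples")
    mismatches = sum(
        1 for locus in shared
        if sorted(typing_a[locus]) != sorted(typing_b[locus])
    )
    return mismatches / len(shared)


def per_genotype_error(discordance):
    """Solve d = 1 - (1 - e)^2 for e (assumed error model, see lead-in)."""
    return 1.0 - math.sqrt(1.0 - discordance)


def max_lod(recombinants, meioses):
    """Maximum two-point lod score for phase-known meioses (assumption)."""
    if meioses == 0:
        return 0.0
    theta = min(recombinants / meioses, 0.5)  # MLE, capped at 0.5
    score = meioses * math.log10(2)
    if recombinants > 0:
        score += recombinants * math.log10(theta)
    if meioses - recombinants > 0:
        score += (meioses - recombinants) * math.log10(1.0 - theta)
    return score


def lod_change_when_removed(families):
    """Change in the maximum lod score when each family is left out.

    `families` maps a family label to (recombinants, meioses).
    """
    total_r = sum(r for r, _ in families.values())
    total_n = sum(n for _, n in families.values())
    baseline = max_lod(total_r, total_n)
    return {
        name: max_lod(total_r - r, total_n - n) - baseline
        for name, (r, n) in families.items()
    }


# Hypothetical data: the same person typed twice, in two families.
sample_1 = {"locus1": (1, 2), "locus2": (3, 3)}
sample_2 = {"locus1": (2, 1), "locus2": (3, 4)}
d = discordance_rate(sample_1, sample_2)

# Hypothetical counts for two tightly linked markers.
counts = {"A": (0, 10), "B": (0, 12), "C": (2, 8), "D": (0, 10)}
changes = lod_change_when_removed(counts)
suspect = max(changes, key=changes.get)
print(d, per_genotype_error(d), suspect)
```

In this toy data set, family C carries the only apparent recombinants between two otherwise tightly linked markers, so leaving it out raises the maximum lod score while leaving out any other family lowers it. A family flagged this way is only a candidate for retyping; a true recombinant would produce the same pattern.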
