Quantifying uncertainty in genotype calls.

Affiliation

Department of Biostatistics, Johns Hopkins University, Baltimore, MD 21205, USA.

Publication information

Bioinformatics. 2010 Jan 15;26(2):242-9. doi: 10.1093/bioinformatics/btp624. Epub 2009 Nov 11.

DOI:10.1093/bioinformatics/btp624
PMID:19906825
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2804295/
Abstract

MOTIVATION

Genome-wide association studies (GWAS) are used to discover genes underlying complex, heritable disorders for which less powerful study designs have failed in the past. The number of GWAS has skyrocketed recently, with findings reported in top journals and the mainstream media. Microarrays are the genotype calling technology of choice in GWAS as they permit exploration of more than a million single nucleotide polymorphisms (SNPs) simultaneously. The starting point for the statistical analyses used by GWAS to determine association between loci and disease is making genotype calls (AA, AB or BB). However, the raw data, microarray probe intensities, are heavily processed before arriving at these calls. Various sophisticated statistical procedures have been proposed for transforming raw data into genotype calls. We find that variability in microarray output quality across different SNPs, different arrays and different sample batches has a substantial influence on the accuracy of genotype calls made by existing algorithms. Failure to account for these sources of variability can adversely affect the quality of findings reported by GWAS.

RESULTS

We developed a method based on an enhanced version of the multi-level model used by CRLMM version 1. Two key differences are that we now account for variability across batches and improve the call-specific assessment of each call. The new model permits the development of quality metrics for SNPs, samples and batches of samples. Using three independent datasets, we demonstrate that the CRLMM version 2 outperforms CRLMM version 1 and the algorithm provided by Affymetrix, Birdseed. The main advantage of the new approach is that it enables the identification of low-quality SNPs, samples and batches.
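To make the idea of a call-specific confidence concrete, here is a minimal sketch in Python. It is not the CRLMM model: it assumes a simple three-component Gaussian mixture on the log-ratio of allele A to allele B intensities, with cluster centers and standard deviations estimated separately for each batch. The function names, the input values and the equal priors are all hypothetical and chosen only for illustration.

```python
import math

GENOTYPES = ("AA", "AB", "BB")


def normal_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and sd at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def call_genotype(log_ratio, centers, sds, priors=(1 / 3, 1 / 3, 1 / 3)):
    """Return (genotype, confidence) for one SNP in one sample.

    log_ratio: log2(allele A intensity / allele B intensity).
    centers, sds: per-genotype cluster means and standard deviations for
        the sample's batch (hypothetical inputs, in AA, AB, BB order).
    priors: prior genotype probabilities (assumed equal by default).
    The confidence is the posterior probability of the called genotype.
    """
    weighted = [
        p * normal_pdf(log_ratio, m, s)
        for p, m, s in zip(priors, centers, sds)
    ]
    total = sum(weighted)
    if total == 0.0:
        # The observation is too far from every cluster to score: no call.
        return None, 0.0
    posteriors = [w / total for w in weighted]
    best = max(range(len(GENOTYPES)), key=lambda i: posteriors[i])
    return GENOTYPES[best], posteriors[best]


# The same observed log-ratio scored against two hypothetical batches.
batch_1 = {"centers": (2.0, 0.0, -2.0), "sds": (0.4, 0.4, 0.4)}
batch_2 = {"centers": (2.5, 0.6, -1.5), "sds": (0.4, 0.4, 0.4)}

for name, batch in (("batch 1", batch_1), ("batch 2", batch_2)):
    genotype, confidence = call_genotype(1.0, **batch)
    print(f"{name}: call={genotype}, confidence={confidence:.4f}")
```

In this toy setup the same log-ratio of 1.0 is ambiguous in the first batch (it lies midway between the AA and AB centers, so the confidence is about 0.5) but is a confident AB call in the second batch, whose clusters are shifted. This is the kind of batch dependence that motivates estimating cluster parameters per batch and reporting a confidence alongside every call.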

AVAILABILITY

Software implementing the method described in this article is available as free and open source code in the crlmm R/Bioconductor package.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.

Similar articles

1
Quantifying uncertainty in genotype calls.
Bioinformatics. 2010 Jan 15;26(2):242-9. doi: 10.1093/bioinformatics/btp624. Epub 2009 Nov 11.
2
Assessing batch effects of genotype calling algorithm BRLMM for the Affymetrix GeneChip Human Mapping 500 K array set using 270 HapMap samples.
BMC Bioinformatics. 2008 Aug 12;9 Suppl 9(Suppl 9):S17. doi: 10.1186/1471-2105-9-S9-S17.
3
A multi-array multi-SNP genotyping algorithm for Affymetrix SNP microarrays.
Bioinformatics. 2007 Jun 15;23(12):1459-67. doi: 10.1093/bioinformatics/btm131. Epub 2007 Apr 25.
4
Evaluating the influence of quality control decisions and software algorithms on SNP calling for the affymetrix 6.0 SNP array platform.
Hum Hered. 2011;71(4):221-33. doi: 10.1159/000328843. Epub 2011 Jul 2.
5
R/Bioconductor software for Illumina's Infinium whole-genome genotyping BeadChips.
Bioinformatics. 2009 Oct 1;25(19):2621-3. doi: 10.1093/bioinformatics/btp470. Epub 2009 Aug 6.
6
Variability in GWAS analysis: the impact of genotype calling algorithm inconsistencies.
Pharmacogenomics J. 2010 Aug;10(4):324-35. doi: 10.1038/tpj.2010.46.
7
SNiPer-HD: improved genotype calling accuracy by an expectation-maximization algorithm for high-density SNP arrays.
Bioinformatics. 2007 Jan 1;23(1):57-63. doi: 10.1093/bioinformatics/btl536. Epub 2006 Oct 24.
8
Software comparison for evaluating genomic copy number variation for Affymetrix 6.0 SNP array platform.
BMC Bioinformatics. 2011 May 31;12:220. doi: 10.1186/1471-2105-12-220.
9
Comparing genotyping algorithms for Illumina's Infinium whole-genome SNP BeadChips.
BMC Bioinformatics. 2011 Mar 8;12:68. doi: 10.1186/1471-2105-12-68.
10
M(3): an improved SNP calling algorithm for Illumina BeadArray data.
Bioinformatics. 2012 Feb 1;28(3):358-65. doi: 10.1093/bioinformatics/btr673. Epub 2011 Dec 8.

Cited by

1
Genotype prediction of 336,463 samples from public expression data.
bioRxiv. 2024 Mar 13:2023.10.21.562237. doi: 10.1101/2023.10.21.562237.
2
Gene essentiality in cancer cell lines is modified by the sex chromosomes.
Genome Res. 2022 Nov-Dec;32(11-12):1993-2002. doi: 10.1101/gr.276488.121. Epub 2022 Nov 23.
3
Analysis of the caudate nucleus transcriptome in individuals with schizophrenia highlights effects of antipsychotics and new risk genes.
Nat Neurosci. 2022 Nov;25(11):1559-1568. doi: 10.1038/s41593-022-01182-7. Epub 2022 Nov 1.
4
Allelic expression imbalance of PIK3CA mutations is frequent in breast cancer and prognostically significant.
NPJ Breast Cancer. 2022 Jun 8;8(1):71. doi: 10.1038/s41523-022-00435-9.
5
Heritability and Genomic Architecture of Episodic Exercise-Induced Collapse in Border Collies.
Genes (Basel). 2021 Nov 29;12(12):1927. doi: 10.3390/genes12121927.
6
Role of a genetic variation in the microRNA-4421 binding site of ERP29 regarding risk of oropharynx cancer and prognosis.
Sci Rep. 2020 Oct 12;10(1):17039. doi: 10.1038/s41598-020-73675-z.
7
Inherited variations in human pigmentation-related genes modulate cutaneous melanoma risk and clinicopathological features in Brazilian population.
Sci Rep. 2020 Jul 22;10(1):12129. doi: 10.1038/s41598-020-68945-9.
8
Combination of PI3K and MEK inhibitors yields durable remission in PDX models of PIK3CA-mutated metaplastic breast cancers.
J Hematol Oncol. 2020 Feb 22;13(1):13. doi: 10.1186/s13045-020-0846-y.
9
Response to mTOR and PI3K inhibitors in enzalutamide-resistant luminal androgen receptor triple-negative breast cancer patient-derived xenografts.
Theranostics. 2020 Jan 1;10(4):1531-1543. doi: 10.7150/thno.36182. eCollection 2020.
10
Genome-epigenome interactions associated with Myalgic Encephalomyelitis/Chronic Fatigue Syndrome.
Epigenetics. 2018;13(12):1174-1190. doi: 10.1080/15592294.2018.1549769. Epub 2018 Dec 5.
