MRC Integrative Epidemiology Unit (IEU), University of Bristol, Bristol, UK.
Population Health Sciences, Bristol Medical School, University of Bristol, Bristol, UK.
Int J Epidemiol. 2023 Apr 12;52(5):1498-1521. doi: 10.1093/ije/dyad018. eCollection 2023 Oct.
Mendelian randomization (MR) studies are susceptible to metadata errors (e.g. incorrect specification of the effect allele column) and other analytical issues that can introduce substantial bias into analyses. We developed a quality control (QC) pipeline for the Fatty Acids in Cancer Mendelian Randomization Collaboration (FAMRC) to identify and correct such errors.
We collated summary association statistics from fatty acid and cancer genome-wide association studies (GWAS) and subjected the collated data to a comprehensive QC pipeline. We identified metadata errors by comparing study-specific statistics with external reference data sets (the National Human Genome Research Institute-European Bioinformatics Institute GWAS Catalog and the 1000 Genomes super-populations), and other analytical issues by comparing reported with expected genetic effect sizes. Comparisons were based on three sets of genetic variants: (i) GWAS hits for fatty acids, (ii) GWAS hits for cancer and (iii) a 1000 Genomes reference set.
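The two kinds of comparison described above can be illustrated with a minimal Python sketch. It is not the CheckSumStats implementation (which is an R package); the names `Variant`, `allele_frequency_conflict`, `expected_lnor` and `slope_through_origin` are hypothetical, and the expected effect size uses a common large-sample approximation for case-control studies (additive model, small effects, Hardy-Weinberg equilibrium) that may differ from the formula used in the pipeline.

```python
import math
from dataclasses import dataclass


@dataclass
class Variant:
    """One row of GWAS summary statistics (hypothetical layout)."""
    rsid: str
    effect_allele: str
    other_allele: str
    eaf: float   # reported effect allele frequency
    lnor: float  # reported log odds ratio
    se: float    # reported standard error of the log odds ratio


def allele_frequency_conflict(variant, ref_allele, ref_freq, min_gap=0.1):
    """Return True when the reported EAF fits the non-effect allele better.

    ref_allele and ref_freq come from an external reference panel.
    Many conflicts within one study suggest that the effect allele
    column has been misspecified.
    """
    if variant.effect_allele == ref_allele:
        expected = ref_freq
    elif variant.other_allele == ref_allele:
        expected = 1.0 - ref_freq
    else:
        raise ValueError(f"{variant.rsid}: alleles do not match reference")
    flipped = 1.0 - expected
    # Near a frequency of 0.5 the two hypotheses cannot be told apart.
    if abs(expected - flipped) < min_gap:
        return False
    return abs(variant.eaf - flipped) < abs(variant.eaf - expected)


def expected_lnor(z, eaf, n_cases, n_controls):
    """Approximate the log odds ratio implied by a Z score.

    Assumption: SE^2 ~ (1/n_cases + 1/n_controls) / (2 * eaf * (1 - eaf)).
    """
    se = math.sqrt((1.0 / n_cases + 1.0 / n_controls)
                   / (2.0 * eaf * (1.0 - eaf)))
    return z * se


def slope_through_origin(x, y):
    """Least-squares slope of y on x with the intercept fixed at zero."""
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)


if __name__ == "__main__":
    v = Variant("rs1", "A", "G", eaf=0.80, lnor=0.12, se=0.06)
    # Reference panel says allele A has frequency 0.20.
    print(allele_frequency_conflict(v, ref_allele="A", ref_freq=0.20))
    # Compare the reported effect with the one implied by its Z score.
    print(v.lnor, expected_lnor(v.lnor / v.se, v.eaf, 1000, 1000))
```

Under these assumptions, a study in which most variants are flagged by `allele_frequency_conflict`, or in which the slope of reported on expected effect sizes is far from 1, would be a candidate for closer inspection, for example for a swapped effect allele column or effects reported on a scale other than log odds.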
We collated summary data from 6 fatty acid and 54 cancer GWAS. Metadata errors and analytical issues with the potential to introduce substantial bias were identified in seven studies (11.6%). After resolving metadata errors and analytical issues, we created a data set of 219 842 genetic associations with 90 cancer types, generated in analyses of 566 665 cancer cases and 1 622 374 controls.
In this large MR collaboration, 11.6% of included studies were affected by a substantial metadata error or analytical issue. By increasing the integrity of collated summary data prior to their analysis, our protocol can be used to increase the reliability of downstream MR analyses. Our pipeline is available to other researchers via the CheckSumStats package (https://github.com/MRCIEU/CheckSumStats).