Legacy Data Confound Genomics Studies.

Author information

Department of Human Genetics, McGill University, Montreal, QC, Canada.

McGill University and Genome Quebec Innovation Centre, Montreal, QC, Canada.

Publication information

Mol Biol Evol. 2020 Jan 1;37(1):2-10. doi: 10.1093/molbev/msz201.

Abstract

Recent reports have identified differences in the mutational spectra across human populations. Although some of these reports have been replicated in other cohorts, most have been reported only in the 1000 Genomes Project (1kGP) data. While investigating an intriguing putative population stratification within the Japanese population, we identified a previously unreported batch effect leading to spurious mutation calls in the 1kGP data and to the apparent population stratification. Because the 1kGP data are used extensively, we find that the batch effects also lead to incorrect imputation by leading imputation servers and a small number of suspicious GWAS associations. Lower quality data from the early phases of the 1kGP thus continue to contaminate modern studies in hidden ways. It may be time to retire or upgrade such legacy sequencing data.
