Legacy Data Confound Genomics Studies.

Affiliations

Department of Human Genetics, McGill University, Montreal, QC, Canada.

McGill University and Genome Quebec Innovation Centre, Montreal, QC, Canada.

Publication information

Mol Biol Evol. 2020 Jan 1;37(1):2-10. doi: 10.1093/molbev/msz201.

Abstract

Recent reports have identified differences in the mutational spectra across human populations. Although some of these reports have been replicated in other cohorts, most have been reported only in the 1000 Genomes Project (1kGP) data. While investigating an intriguing putative population stratification within the Japanese population, we identified a previously unreported batch effect leading to spurious mutation calls in the 1kGP data and to the apparent population stratification. Because the 1kGP data are used extensively, we find that the batch effects also lead to incorrect imputation by leading imputation servers and a small number of suspicious GWAS associations. Lower quality data from the early phases of the 1kGP thus continue to contaminate modern studies in hidden ways. It may be time to retire or upgrade such legacy sequencing data.
