A Distributed Whole Genome Sequencing Benchmark Study.

Author information

Corbett Richard D, Eveleigh Robert, Whitney Joe, Barai Namrata, Bourgey Mathieu, Chuah Eric, Johnson Joanne, Moore Richard A, Moradin Neda, Mungall Karen L, Pereira Sergio, Reuter Miriam S, Thiruvahindrapuram Bhooma, Wintle Richard F, Ragoussis Jiannis, Strug Lisa J, Herbrick Jo-Anne, Aziz Naveed, Jones Steven J M, Lathrop Mark, Scherer Stephen W, Staffa Alfredo, Mungall Andrew J

Affiliations

Canada's Michael Smith Genome Sciences Centre, BC Cancer Research Institute, Provincial Health Services Authority, Vancouver, BC, Canada.

McGill Genome Centre, McGill University, Montreal, QC, Canada.

Publication information

Front Genet. 2020 Dec 1;11:612515. doi: 10.3389/fgene.2020.612515. eCollection 2020.

Abstract

Population sequencing often requires collaboration across a distributed network of sequencing centers for the timely processing of thousands of samples. In such massive efforts, it is important that participating scientists can be confident that the accuracy of the sequence data produced is not affected by which center generates the data. A study was conducted across three established sequencing centers, located in Montreal, Toronto, and Vancouver, constituting Canada's Genomics Enterprise (www.cgen.ca). Whole genome sequencing was performed at each center, on three genomic DNA replicates from three well-characterized cell lines. Secondary analysis pipelines employed by each site were applied to sequence data from each of the sites, resulting in three datasets for each of four variables (cell line, replicate, sequencing center, and analysis pipeline), for a total of 81 datasets. These datasets were each assessed according to multiple quality metrics including concordance with benchmark variant truth sets to assess consistent quality across all three conditions for each variable. Three-way concordance analysis of variants across conditions for each variable was performed. Our results showed that the variant concordance between datasets differing only by sequencing center was similar to the concordance for datasets differing only by replicate, using the same analysis pipeline. We also showed that the statistically significant differences between datasets result from the analysis pipeline used, which can be unified and updated as new approaches become available. We conclude that genome sequencing projects can rely on the quality and reproducibility of aggregate data generated across a network of distributed sites.
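The study uses a full factorial design. Four variables with three levels each give 3^4 = 81 datasets. A comparison "for each variable" holds the other three fixed, and in such a layout that produces 27 triplets per variable. The abstract does not say whether every triplet was compared. The Python sketch below covers that bookkeeping and adds a simple set-based three-way concordance count. It rests on several assumptions. The level labels are placeholders. Variants are taken to be already normalized and reduced to hypothetical (chrom, pos, ref, alt) keys. This is not the authors' pipeline or concordance tool, and a real comparison would also have to reconcile differences in how callers represent the same variant.

```python
from itertools import product
from typing import Dict, List, Set, Tuple

# Hypothetical placeholder labels: the study used three levels per variable,
# but these names are illustrative, not the actual cell lines or pipelines.
CELL_LINES = ("cell_line_1", "cell_line_2", "cell_line_3")
REPLICATES = ("rep_1", "rep_2", "rep_3")
CENTERS = ("Montreal", "Toronto", "Vancouver")
PIPELINES = ("pipeline_1", "pipeline_2", "pipeline_3")

Condition = Tuple[str, str, str, str]  # (cell line, replicate, center, pipeline)
Variant = Tuple[str, int, str, str]    # assumed key: (chrom, pos, ref, alt)


def design_grid() -> List[Condition]:
    """Every combination of the four variables: 3 ** 4 = 81 datasets."""
    return list(product(CELL_LINES, REPLICATES, CENTERS, PIPELINES))


def triplets_varying(index: int) -> List[List[Condition]]:
    """Group the grid into triplets that differ only in the variable at `index`."""
    groups: Dict[Tuple[str, ...], List[Condition]] = {}
    for combo in design_grid():
        # Key on the three variables that are held fixed.
        key = combo[:index] + combo[index + 1:]
        groups.setdefault(key, []).append(combo)
    return list(groups.values())


def three_way_concordance(a: Set[Variant], b: Set[Variant], c: Set[Variant]) -> Dict[str, float]:
    """Count variants in each region of a three-set Venn diagram."""
    union = a | b | c
    shared = a & b & c
    return {
        "all_three": len(shared),
        "a_b_only": len((a & b) - c),
        "a_c_only": len((a & c) - b),
        "b_c_only": len((b & c) - a),
        "a_only": len(a - b - c),
        "b_only": len(b - a - c),
        "c_only": len(c - a - b),
        "union": len(union),
        # Fraction of all observed variants that were called in every dataset.
        "fraction_all_three": len(shared) / len(union) if union else 0.0,
    }


if __name__ == "__main__":
    print(len(design_grid()))        # 81
    print(len(triplets_varying(2)))  # 27 triplets that differ only by center
    a = {("chr1", 100, "A", "G"), ("chr1", 200, "C", "T"), ("chr2", 50, "G", "A")}
    b = {("chr1", 100, "A", "G"), ("chr1", 200, "C", "T")}
    c = {("chr1", 100, "A", "G"), ("chr3", 10, "T", "C")}
    print(three_way_concordance(a, b, c))
```

You could run the concordance count on each triplet from `triplets_varying(2)` (center only) and on each triplet from `triplets_varying(1)` (replicate only). Comparing the two sets of results, with the pipeline held fixed, loosely mirrors the comparison the abstract reports.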
