Daniels Camille A, Abdulkadir Adetola, Cleveland Megan H, McDaniel Jennifer H, Jáspez David, Rubio-Rodríguez Luis Alberto, Muñoz-Barrera Adrián, Lorenzo-Salazar José Miguel, Flores Carlos, Yoo Byunggil, Sahraeian Sayed Mohammad Ebrahim, Wang Yina, Rossi Massimiliano, Visvanath Arun, Murray Lisa, Chen Wei-Ting, Catreux Severine, Han James, Mehio Rami, Parnaby Gavin, Carroll Andrew, Chang Pi-Chuan, Shafin Kishwar, Cook Daniel, Kolesnikov Alexey, Brambrink Lucas, Mootor Mohammed Faizal Eeman, Patel Yash, Yamaguchi Takafumi N, Boutros Paul C, Sienkiewicz Karolina, Foox Jonathan, Mason Christopher E, Lajoie Bryan R, Ruiz-Perez Carlos A, Kruglyak Semyon, Zook Justin M, Olson Nathan D
Medical Device Innovation Consortium (MDIC), 1655 N Ft. Myer Drive, 12th Floor, Arlington, VA, USA 22209.
Material Measurement Laboratory, National Institute of Standards and Technology (NIST), 100 Bureau Dr, MS8312, Gaithersburg, MD 20899, USA.
bioRxiv. 2024 Dec 5:2024.12.02.625685. doi: 10.1101/2024.12.02.625685.
Somatic mosaicism is an important cause of disease, but mosaic and somatic variants are often challenging to detect because they exist in only a fraction of cells. To address the need for benchmarking subclonal variants in normal cell populations, we developed a benchmark containing mosaic variants in the Genome in a Bottle Consortium (GIAB) HG002 reference material DNA from a large batch of a normal lymphoblastoid cell line. First, we used a somatic variant caller with high-coverage (300x) Illumina whole genome sequencing data from the Ashkenazi Jewish trio to detect variants in HG002 not detected in at least 5% of cells from the combined parental data. These candidate mosaic variants were subsequently evaluated using >100x BGI, Element, and PacBio HiFi data. High-confidence candidate SNVs with variant allele fractions above 5% were included in the HG002 draft mosaic variant benchmark, with 13/85 occurring in medically relevant gene regions. We also delineated a 2.45 Gbp subset of the previously defined germline autosomal benchmark regions for HG002 in which no additional mosaic variants with variant allele fraction >2% exist, enabling robust assessment of false positives. The variant allele fraction of some mosaic variants differs between batches of cells, so using data from the homogeneous batch of reference material DNA is critical for benchmarking these variants. External validation of this mosaic benchmark showed that it can be used to reliably identify both false negatives and false positives for a variety of technologies and detection algorithms, demonstrating its utility for optimization and validation. By adding our characterization of mosaic variants in this widely used cell line, we support extensive benchmarking efforts using it in simulation, spike-in, and mixture studies.
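As an illustration of the 5% variant allele fraction (VAF) cutoff mentioned in the abstract, the following is a minimal sketch of how a VAF filter over candidate SNVs might look. It is not the authors' pipeline: the `Candidate` class, the `passes_vaf_threshold` function, and all read counts are hypothetical, and VAF is assumed here to be simply alternate-allele reads divided by total reads at the site.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical candidate SNV with illustrative read counts."""
    chrom: str
    pos: int
    ref_reads: int  # reads supporting the reference allele
    alt_reads: int  # reads supporting the alternate allele

    @property
    def vaf(self) -> float:
        # Assumed definition: alt reads / (ref reads + alt reads).
        depth = self.ref_reads + self.alt_reads
        return self.alt_reads / depth if depth else 0.0


def passes_vaf_threshold(candidate: Candidate, min_vaf: float = 0.05) -> bool:
    """Keep candidates whose VAF is strictly above the cutoff."""
    return candidate.vaf > min_vaf


# Made-up examples at roughly 300x depth.
candidates = [
    Candidate("chr1", 1000, ref_reads=270, alt_reads=30),  # VAF 0.10
    Candidate("chr2", 2000, ref_reads=297, alt_reads=3),   # VAF 0.01
    Candidate("chr3", 3000, ref_reads=240, alt_reads=60),  # VAF 0.20
]

kept = [c for c in candidates if passes_vaf_threshold(c)]
for c in kept:
    print(f"{c.chrom}:{c.pos} VAF={c.vaf:.2f}")
```

A real workflow would also need the parental comparison and the orthogonal-technology evaluation described in the abstract; this sketch covers only the threshold step.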