Wagner Justin, Olson Nathan D, McDaniel Jennifer, Harris Lindsay, Pinto Brendan J, Jáspez David, Muñoz-Barrera Adrián, Rubio-Rodríguez Luis A, Lorenzo-Salazar José M, Flores Carlos, Sahraeian Sayed Mohammad Ebrahim, Narzisi Giuseppe, Byrska-Bishop Marta, Evani Uday S, Xiao Chunlin, Lake Juniper A, Fontana Peter, Greenberg Craig, Freed Donald, Mootor Mohammed Faizal Eeman, Boutros Paul C, Murray Lisa, Shafin Kishwar, Carroll Andrew, Sedlazeck Fritz J, Wilson Melissa, Zook Justin M
Material Measurement Laboratory, National Institute of Standards and Technology, 100 Bureau Dr., Gaithersburg, MD, USA.
Center for Evolution & Medicine and School of Life Sciences, Arizona State University, Tempe, AZ 85281, USA; Department of Zoology, Milwaukee Public Museum, Milwaukee, WI, USA.
Nat Commun. 2025 Jan 8;16(1):497. doi: 10.1038/s41467-024-55710-z.
The sex chromosomes contain complex, important genes impacting medical phenotypes, but differ from the autosomes in their ploidy and large repetitive regions. To enable technology developers, along with research and clinical laboratories, to evaluate variant detection on the male sex chromosomes X and Y, we create a small variant benchmark set with 111,725 variants for the Genome in a Bottle HG002 reference material. We develop an active evaluation approach to demonstrate that the benchmark set reliably identifies errors in challenging genomic regions and across short- and long-read callsets. We show how complete assemblies can expand benchmarks to difficult regions, but highlight remaining challenges in benchmarking variants in long homopolymers and tandem repeats, complex gene conversions, copy-number-variable gene arrays, and human satellites.