CAS Array: design and assessment of a genotyping array for Chinese biobanking

Author information

Tian Zijian, Chen Fei, Wang Jing, Wu Benrui, Shao Jian, Liu Ziqing, Zheng Li, Wang You, Xu Tao, Zhou Kaixin

Affiliations

National Laboratory of Biomacromolecules, Institute of Biophysics, Chinese Academy of Sciences, Beijing 100101, China.

College of Life Sciences, University of the Chinese Academy of Sciences, Beijing 10140, China.

Publication information

Precis Clin Med. 2023 Feb 23;6(1):pbad002. doi: 10.1093/pcmedi/pbad002. eCollection 2023 Mar.

DOI:10.1093/pcmedi/pbad002

PMID:36968613

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10031742/

Abstract

BACKGROUND

Chronic diseases are becoming a critical challenge for the aging Chinese population. Biobanks with extensive genomic and environmental data offer opportunities to elucidate the complex gene-environment interactions underlying their aetiology. Genome-wide genotyping arrays remain an efficient approach for large-scale genomic data collection. However, most commercial arrays perform less well when used for biobanking in the Chinese population.

MATERIALS AND METHODS

Deep whole-genome sequencing data from 2 641 Chinese individuals were used as a reference to develop the CAS Array, a custom-designed genotyping array for precision medicine. The array was evaluated by comparing data from 384 individuals assayed by both the array and whole-genome sequencing. Its capacity to estimate mitochondrial copy number was validated by examining the association of these estimates with established covariates in 10 162 elderly Chinese individuals.
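The call rate and concordance figures reported for the individuals typed on both platforms can be illustrated with a minimal Python sketch. It assumes genotypes are encoded as alternate-allele counts (0, 1, 2) with `None` for a failed call, that array and whole-genome sequencing calls are already aligned site by site, and that concordance is counted only over sites called on both platforms. The encoding and the function names are hypothetical and are not taken from the authors' pipeline.

```python
from typing import Optional, Sequence

# Hypothetical encoding: 0, 1, 2 = alternate-allele count; None = no call.
Genotype = Optional[int]


def call_rate(calls: Sequence[Genotype]) -> float:
    """Fraction of attempted genotypes that produced a call."""
    if not calls:
        raise ValueError("no genotypes")
    return sum(g is not None for g in calls) / len(calls)


def concordance_rate(array_calls: Sequence[Genotype],
                     wgs_calls: Sequence[Genotype]) -> float:
    """Fraction of sites called on both platforms where the array genotype matches WGS."""
    if len(array_calls) != len(wgs_calls):
        raise ValueError("call vectors must be aligned site by site")
    # Assumption: sites missing on either platform are excluded from the denominator.
    pairs = [(a, w) for a, w in zip(array_calls, wgs_calls)
             if a is not None and w is not None]
    if not pairs:
        raise ValueError("no sites called on both platforms")
    return sum(a == w for a, w in pairs) / len(pairs)


if __name__ == "__main__":
    array = [0, 1, 2, None, 1, 0, 2, 1]
    wgs = [0, 1, 2, 1, 0, 0, 2, 1]
    print(f"call rate:   {call_rate(array):.3f}")                # 7/8 = 0.875
    print(f"concordance: {concordance_rate(array, wgs):.3f}")    # 6/7 = 0.857
```

In practice these rates would be aggregated over all 384 individuals and all array markers; the toy vectors here only show how each metric is counted.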

RESULTS

The CAS Array adopts the proven Axiom technology and is limited to 652 429 single-nucleotide polymorphism (SNP) markers. Its call rate of 99.79% and concordance rate of 99.89% are both higher than those of commercial arrays. Its imputation-based genome coverage reached 98.3% for common SNPs and 63.0% for low-frequency SNPs, both comparable to those of commercial arrays with larger SNP capacity. After validating its mitochondrial copy number estimates, we developed a publicly available software tool to facilitate use of the array.
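Imputation-based genome coverage is commonly defined as the fraction of reference-panel SNPs in a frequency class that are either genotyped directly or imputed above a quality threshold. The sketch below assumes that definition, an imputation r² cutoff of 0.8, and minor allele frequency (MAF) bins of at least 5% for common SNPs and 1% to 5% for low-frequency SNPs. These are conventional choices and hypothetical names, not necessarily the exact thresholds or tooling used to evaluate the CAS Array.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple

# Assumed MAF bins (lower bound inclusive, upper bound exclusive); MAF never exceeds 0.5.
COMMON = (0.05, 1.0)
LOW_FREQ = (0.01, 0.05)


@dataclass
class RefSnp:
    maf: float         # minor allele frequency in the WGS reference panel
    on_array: bool     # directly genotyped by the array
    imputed_r2: float  # imputation quality (squared correlation with true genotypes)


def coverage(snps: Iterable[RefSnp],
             maf_range: Tuple[float, float],
             r2_threshold: float = 0.8) -> float:
    """Fraction of reference SNPs in the MAF bin that are typed or well imputed."""
    lo, hi = maf_range
    in_bin = [s for s in snps if lo <= s.maf < hi]
    if not in_bin:
        raise ValueError("no reference SNPs in this MAF bin")
    covered = sum(s.on_array or s.imputed_r2 >= r2_threshold for s in in_bin)
    return covered / len(in_bin)


if __name__ == "__main__":
    panel = [
        RefSnp(0.30, True, 1.00),   # typed on the array
        RefSnp(0.20, False, 0.95),  # well imputed
        RefSnp(0.10, False, 0.60),  # poorly imputed
        RefSnp(0.03, False, 0.85),  # low-frequency, well imputed
        RefSnp(0.02, False, 0.40),  # low-frequency, poorly imputed
    ]
    print(f"common coverage:        {coverage(panel, COMMON):.3f}")    # 2/3
    print(f"low-frequency coverage: {coverage(panel, LOW_FREQ):.3f}")  # 1/2
```

Because low-frequency variants are harder to tag with a limited marker set, the same threshold yields much lower coverage in that bin, which is consistent with the gap between the 98.3% and 63.0% figures above.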

CONCLUSION

Based on recent advances in genomic science, we designed and implemented a high-throughput and low-cost genotyping array. It is more cost-effective than commercial arrays for large-scale Chinese biobanking.
