School of Computing Science, Simon Fraser University, Burnaby, BC, V5A 1S6, Canada.
Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA.
Nat Commun. 2018 Feb 26;9(1):828. doi: 10.1038/s41467-018-03273-1.
High-throughput sequencing provides the means to determine the allelic decomposition of any gene of interest: the number of copies of the gene and the exact sequence content of each copy. Although many clinically and functionally important genes are highly polymorphic and have undergone structural alterations, no high-throughput sequencing data analysis tool has yet been designed to solve the full allelic decomposition problem effectively. Here we introduce a combinatorial optimization framework that successfully resolves this challenging problem, including for genes with structural alterations. We provide an associated computational tool, Aldy, that performs allelic decomposition of highly polymorphic, multi-copy genes using whole-genome or targeted sequencing data. For a large, diverse sequencing data set, Aldy identifies multiple rare and novel alleles for several important pharmacogenes, significantly improving upon the accuracy and utility of current genotyping assays. As more data sets become available, we expect Aldy to become an essential component of genotyping toolkits.