Jiang Peng, Hu Yaofei, Wang Yiqi, Zhang Jin, Zhu Qinghong, Bai Lin, Tong Qiang, Li Tao, Zhao Liang
Precision Medicine Research Center, Taihe Hospital, Hubei University of Medicine, Shiyan, China.
School of Computing and Electronic Information, Guangxi University, Nanning, China.
Front Genet. 2019 Aug 8;10:670. doi: 10.3389/fgene.2019.00670. eCollection 2019.
Ventricular septal defect (VSD) is a fatal congenital heart disease with severe consequences for affected infants. Early diagnosis plays an important role, particularly through genetic variants. Existing panel-based approaches to variant mining suffer from a shortage of large panels, costly sequencing, and missed rare variants. Although trio-based methods alleviate these limitations to some extent, they are insensitive to novel mutations and computationally intensive. To address these limitations, we propose a novel variant-mining algorithm for trio-based sequencing data and apply it to a VSD trio to identify associated mutations. Our approach starts by filtering irrelevant k-mers from the sequences of a trio via a newly conceived coupled Bloom Filter, then corrects sequencing errors using a statistical approach and extends the retained k-mers into long sequences. These extended sequences serve as input for identifying the variants of interest. The obtained variants are then comprehensively analyzed against existing databases to mine VSD-related mutations. Experiments show that our trio-based algorithm narrows down candidate coding genes and lncRNAs by about 10-fold and 5-fold, respectively, compared with single sequence-based approaches. Meanwhile, our algorithm is 10 times faster and two orders of magnitude more memory-efficient than the existing state-of-the-art approach. By applying our approach to a VSD trio, we identify an unreported gene (CD80), a combination of two genes (MYBPC3 and TRDN), and a lncRNA (NONHSAT096266.2) that are highly likely to be VSD-related.
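The k-mer filtering step can be illustrated with a minimal Python sketch. The abstract does not describe the internals of the coupled Bloom Filter, so this sketch makes several assumptions: it uses a single ordinary Bloom filter over both parents' k-mers and keeps only the child k-mers that the filter has not seen. The names `BloomFilter`, `kmers`, and `child_specific_kmers`, the filter size, the hash scheme, and the value of `k` are all hypothetical choices for illustration, and reverse complements, error correction, and k-mer extension are left out.

```python
import hashlib


class BloomFilter:
    """Plain Bloom filter over a bytearray (illustrative, not the paper's coupled design)."""

    def __init__(self, num_bits=1 << 20, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Double hashing: derive all bit positions from one 128-bit digest.
        digest = hashlib.blake2b(item.encode(), digest_size=16).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:], "big") | 1  # force an odd step
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos >> 3] |= 1 << (pos & 7)

    def __contains__(self, item):
        return all(
            self.bits[pos >> 3] & (1 << (pos & 7)) for pos in self._positions(item)
        )


def kmers(seq, k):
    """Yield every length-k substring of seq, left to right."""
    return (seq[i:i + k] for i in range(len(seq) - k + 1))


def child_specific_kmers(child_reads, father_reads, mother_reads, k=5):
    """Return child k-mers not found in either parent, in first-seen order.

    Assumption: a k-mer shared with a parent is treated as irrelevant.
    A Bloom filter false positive can only discard a child k-mer, never
    keep a parental one.
    """
    parents = BloomFilter()
    for read in list(father_reads) + list(mother_reads):
        for km in kmers(read, k):
            parents.add(km)

    kept, seen = [], set()
    for read in child_reads:
        for km in kmers(read, k):
            if km not in parents and km not in seen:
                seen.add(km)
                kept.append(km)
    return kept


if __name__ == "__main__":
    father = ["ACGTACGTAC"]
    mother = ["ACGTACGTAC"]
    child = ["ACGTAGGTAC"]  # one substitution relative to the parents
    print(child_specific_kmers(child, father, mother, k=5))
```

In this toy input the child read differs from the parental reads by one base, so only the k-mers that overlap that base survive the filter. These retained k-mers are what a later step would correct and extend into longer sequences.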