Efficient Mining of Variants From Trios for Ventricular Septal Defect Association Study.

Author information

Jiang Peng, Hu Yaofei, Wang Yiqi, Zhang Jin, Zhu Qinghong, Bai Lin, Tong Qiang, Li Tao, Zhao Liang

Affiliations

Precision Medicine Research Center, Taihe Hospital, Hubei University of Medicine, Shiyan, China.

School of Computing and Electronic Information, Guangxi University, Nanning, China.

Publication information

Front Genet. 2019 Aug 8;10:670. doi: 10.3389/fgene.2019.00670. eCollection 2019.

Abstract

Ventricular septal defect (VSD) is a fatal congenital heart disease with severe consequences for affected infants. Early diagnosis plays an important role, particularly diagnosis based on genetic variants. Existing panel-based approaches to variant mining suffer from a shortage of large panels, costly sequencing, and missed rare variants. Although a trio-based method alleviates these limitations to some extent, it remains insensitive to novel mutations and is computationally intensive. To address these limitations, we develop a novel algorithm for mining variants from trio-based sequencing data and apply it to a VSD trio to identify associated mutations. Our approach first filters irrelevant k-mers from the sequences of a trio using a newly conceived coupled Bloom filter, then corrects sequencing errors with a statistical approach and extends the retained k-mers into long sequences. These extended sequences serve as input for identifying the variants of interest. The obtained variants are then comprehensively analyzed against existing databases to mine VSD-related mutations. Experiments show that our trio-based algorithm narrows down candidate coding genes and lncRNAs by about 10-fold and 5-fold, respectively, compared with single-sequence-based approaches. Our algorithm is also 10 times faster and uses two orders of magnitude less memory than the existing state-of-the-art approach. By applying our approach to a VSD trio, we identify a previously unreported gene (CD80), a combination of two genes (MYBPC3 and TRDN), and a lncRNA (NONHSAT096266.2), all of which are highly likely to be VSD-related.
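The abstract does not specify how the coupled Bloom filter is built, so the following is only a minimal sketch of the general idea of Bloom-filter-based k-mer filtering in a trio. It assumes, as one plausible reading, that a child k-mer counts as "irrelevant" when it also occurs in a parent. The class `BloomFilter`, the functions `kmers` and `child_specific_kmers`, and all parameter values are hypothetical names chosen for illustration, not the authors' implementation.

```python
import hashlib


class BloomFilter:
    """Plain Bloom filter (illustrative; not the paper's coupled variant)."""

    def __init__(self, num_bits=1 << 20, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Double hashing: derive all bit positions from one SHA-256 digest.
        digest = hashlib.sha256(item.encode("ascii")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        # May report false positives, never false negatives.
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


def kmers(seq, k):
    """Yield every overlapping substring of length k."""
    return (seq[i:i + k] for i in range(len(seq) - k + 1))


def child_specific_kmers(child_reads, father_reads, mother_reads, k=5):
    """Keep child k-mers that the parental Bloom filter does not contain.

    Assumption: k-mers shared with a parent are treated as irrelevant.
    """
    parents = BloomFilter()
    for read in list(father_reads) + list(mother_reads):
        for km in kmers(read, k):
            parents.add(km)

    kept, seen = [], set()
    for read in child_reads:
        for km in kmers(read, k):
            if km not in seen and km not in parents:
                seen.add(km)
                kept.append(km)
    return kept


if __name__ == "__main__":
    father = ["ACGTACGTAC"]
    mother = ["ACGTACGTAC"]
    child = ["ACGTAGGTAC"]  # one substitution relative to the parents
    print(child_specific_kmers(child, father, mother, k=5))
```

In this toy input the child read differs from the parental reads at a single base, so only the k-mers that span that position should survive the filter. A real pipeline would also need canonical (strand-independent) k-mers, a much larger filter sized to the sequencing depth, and the error-correction and extension steps that the abstract mentions.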


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/bc3f/6694746/81207d608bc2/fgene-10-00670-g001.jpg
