
InvBFM: finding genomic inversions from high-throughput sequence data based on feature mining.

Affiliations

College of Information Science and Technology, Beijing University of Chemical Technology, Beijing, People's Republic of China.

Department of Computer Science and Engineering, University of Connecticut, Storrs, Connecticut, USA.

Publication information

BMC Genomics. 2020 Mar 5;21(Suppl 1):173. doi: 10.1186/s12864-020-6585-1.

Abstract

BACKGROUND

Genomic inversions are one type of structural variation (SV) and are known to play an important biological role. Calling inversions from high-throughput sequence data is an established problem in sequence data analysis. Inversions are particularly difficult to detect because inversion regions are often surrounded by duplications or other types of SVs. Existing inversion detection tools are mainly based on three approaches: paired-end reads, split-mapped reads, and assembly. However, these tools suffer from unsatisfactory precision or sensitivity (e.g., only 50-60% sensitivity) and need to be improved.

RESULTS

In this paper, we present a new inversion calling method, InvBFM, which calls inversions based on feature mining. InvBFM first gathers the results of existing inversion detection tools as candidate inversions. It then extracts features from these candidates. Finally, it identifies the true inversions with a trained support vector machine (SVM) classifier.
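
The first step, gathering calls from several tools into one candidate set, can be illustrated with a minimal Python sketch. This is not the InvBFM implementation: the `Call` record, the tool names `toolA`/`toolB`/`toolC`, the 50% reciprocal-overlap threshold, and the toy feature vector are all assumptions made for illustration. The features InvBFM actually mines are not listed in the abstract.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Call:
    """One inversion call from one tool (hypothetical record layout)."""
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # exclusive
    tool: str


def reciprocal_overlap(a: Call, b: Call) -> float:
    """Return the smaller of the two overlap fractions, or 0.0 if disjoint."""
    if a.chrom != b.chrom:
        return 0.0
    inter = min(a.end, b.end) - max(a.start, b.start)
    if inter <= 0:
        return 0.0
    return min(inter / (a.end - a.start), inter / (b.end - b.start))


def merge_candidates(calls: List[Call], min_ro: float = 0.5) -> List[Dict]:
    """Greedy clustering: a call joins the first cluster whose first member
    it overlaps reciprocally by at least min_ro (an assumed threshold)."""
    clusters: List[List[Call]] = []
    for call in sorted(calls, key=lambda c: (c.chrom, c.start, c.end)):
        for cluster in clusters:
            if reciprocal_overlap(cluster[0], call) >= min_ro:
                cluster.append(call)
                break
        else:
            clusters.append([call])
    return [
        {
            "chrom": cl[0].chrom,
            "start": min(c.start for c in cl),
            "end": max(c.end for c in cl),
            "tools": sorted({c.tool for c in cl}),
        }
        for cl in clusters
    ]


def to_features(candidate: Dict, all_tools: List[str]) -> List[float]:
    """Toy feature vector (placeholder, not the paper's feature set):
    candidate length followed by one 0/1 flag per tool that reported it."""
    length = float(candidate["end"] - candidate["start"])
    flags = [1.0 if t in candidate["tools"] else 0.0 for t in all_tools]
    return [length] + flags


# Hypothetical calls from three hypothetical tools.
example_calls = [
    Call("chr1", 1000, 2000, "toolA"),
    Call("chr1", 1050, 1990, "toolB"),
    Call("chr1", 5000, 5600, "toolA"),
    Call("chr2", 1000, 2000, "toolC"),
]
candidates = merge_candidates(example_calls)
feature_rows = [to_features(c, ["toolA", "toolB", "toolC"]) for c in candidates]
```

In a full pipeline, rows like `feature_rows`, paired with true/false labels from a validated call set, would be used to train a classifier for the final step. One possible choice is `sklearn.svm.SVC` from scikit-learn, which is omitted here to keep the sketch free of third-party dependencies.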

CONCLUSIONS

Our results on real sequence data from the 1000 Genomes Project show that by combining feature mining and a machine learning model, InvBFM outperforms existing tools. InvBFM is written in Python and Shell and is available for download at https://github.com/wzj1234/InvBFM.


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9afb/7057458/06367dd5c407/12864_2020_6585_Fig1_HTML.jpg
