GenHap: a novel computational method based on genetic algorithms for haplotype assembly.

Affiliations

Department of Informatics, Systems and Communication (DISCo), University of Milano-Bicocca, Viale Sarca 336, U14 Building, Milan, 20126, Italy.

Institute of Molecular Bioimaging and Physiology, Italian National Research Council, Contrada Pietrapollastra-Pisciotto, Cefalù (PA), 90015, Italy.

Publication information

BMC Bioinformatics. 2019 Apr 18;20(Suppl 4):172. doi: 10.1186/s12859-019-2691-y.

Abstract

BACKGROUND

To fully characterize the genome of an individual, it is essential to reconstruct the two distinct copies of each chromosome, called haplotypes. The computational problem of inferring the full haplotype of a cell from sequencing read data is known as haplotype assembly, and consists of assigning all heterozygous Single Nucleotide Polymorphisms (SNPs) to exactly one of the two chromosomes. Knowledge of complete haplotypes is generally more informative than the analysis of single SNPs and plays a fundamental role in many medical applications.

RESULTS

To reconstruct the two haplotypes, we addressed the weighted Minimum Error Correction (wMEC) problem, a successful approach to haplotype assembly. This NP-hard problem consists of computing the two haplotypes that partition the sequencing reads into two disjoint subsets with the fewest corrections to the SNP values. To this end, we propose GenHap, a novel computational method for haplotype assembly based on Genetic Algorithms, which yields optimal solutions by means of a global search process. To evaluate the effectiveness of our approach, we ran GenHap on two synthetic (yet realistic) datasets, based on the Roche/454 and PacBio RS II sequencing technologies. We compared the performance of GenHap against HapCol, an efficient state-of-the-art algorithm for haplotype phasing. Our results show that GenHap always obtains high-accuracy solutions (in terms of haplotype error rate), and is up to 4× faster than HapCol on the Roche/454 instances and up to 20× faster on the PacBio RS II dataset. Finally, we assessed the performance of GenHap on two different real datasets.
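To make the wMEC objective concrete, the following is a minimal Python sketch under stated assumptions, not GenHap's implementation. The read representation (a dict mapping an SNP index to an `(allele, weight)` pair), the function names `wmec_cost` and `brute_force_wmec`, and the toy data are all hypothetical. For a given bipartition of the reads, each group's consensus allele at an SNP is the one with the larger total weight, so the cost of that column is the total weight of the minority allele. The exhaustive search is included only to show what a search procedure such as a Genetic Algorithm would be minimizing; it is feasible only for a handful of reads.

```python
from itertools import product


def wmec_cost(reads, partition):
    """Weighted correction cost of splitting `reads` into two groups.

    reads:     list of dicts {snp_index: (allele, weight)}, allele in {0, 1}
               (assumed representation, chosen for illustration)
    partition: sequence of 0/1 labels, one per read
    """
    total = 0.0
    for group in (0, 1):
        # weights[pos] = [total weight of allele 0, total weight of allele 1]
        weights = {}
        for read, side in zip(reads, partition):
            if side != group:
                continue
            for pos, (allele, weight) in read.items():
                weights.setdefault(pos, [0.0, 0.0])[allele] += weight
        # Keeping the heavier allele as consensus means correcting the lighter one.
        total += sum(min(w0, w1) for w0, w1 in weights.values())
    return total


def brute_force_wmec(reads):
    """Try every bipartition; exponential, so only usable on toy inputs."""
    best = min(product((0, 1), repeat=len(reads)),
               key=lambda p: wmec_cost(reads, p))
    return best, wmec_cost(reads, best)


# Toy instance: four reads covering three heterozygous SNPs (made-up values).
toy_reads = [
    {0: (0, 1.0), 1: (0, 1.0)},
    {0: (0, 1.0), 1: (0, 1.0), 2: (1, 1.0)},
    {1: (1, 1.0), 2: (0, 1.0)},
    {0: (1, 1.0), 1: (1, 1.0), 2: (1, 0.5)},
]
```

Calling `brute_force_wmec(toy_reads)` returns the lowest-cost bipartition together with its cost; the two haplotypes can then be read off as the per-group consensus alleles.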

CONCLUSIONS

Future-generation sequencing technologies, which produce longer reads with higher coverage, can benefit greatly from GenHap, thanks to its ability to efficiently solve large instances of the haplotype assembly problem. Moreover, the optimization approach proposed in GenHap can be extended to the study of allele-specific genomic features, such as expression, methylation and chromatin conformation, by exploiting multi-objective optimization techniques. The source code and full documentation are available in the following GitHub repository: https://github.com/andrea-tango/GenHap.
