Towards accurate, contiguous and complete alignment-based polyploid phasing algorithms.

Authors

Saada Omar Abou, Friedrich Anne, Schacherer Joseph

Affiliations

Université de Strasbourg, CNRS, GMGM UMR 7156, Strasbourg, France.

Université de Strasbourg, CNRS, GMGM UMR 7156, Strasbourg, France; Institut Universitaire de France (IUF), Paris, France.

Publication information

Genomics. 2022 May;114(3):110369. doi: 10.1016/j.ygeno.2022.110369. Epub 2022 Apr 26.

Abstract

Phasing, and polyploid phasing in particular, has been a challenging problem, held back both by the limited read length of high-throughput short-read sequencing methods, which cannot span the distance between heterozygous sites, and by the labor and high cost of alternative methods such as the physical separation of chromosomes. Recently developed single-molecule long-read sequencing methods provide much longer reads, which overcome this limitation. Here we review the alignment-based methods of polyploid phasing, which rely on four main strategies: population inference methods, which leverage the genetic information of several individuals to phase a sample; objective function minimization methods, which minimize a function such as the Minimum Error Correction (MEC); graph partitioning methods, which represent the read data as a graph and split it into k haplotype subgraphs; and cluster building methods, which iteratively grow clusters of similar reads into a final set of clusters that represent the haplotypes. We discuss the advantages and limitations of these methods and the metrics used to assess their performance, proposing that accuracy and contiguity are the most meaningful metrics. Finally, we propose that the field of alignment-based polyploid phasing would greatly benefit from a well-designed benchmarking dataset with appropriate evaluation metrics. We consider that significant improvements can still be achieved to obtain more accurate and contiguous polyploid phasing results that reflect the complexity of polyploid genome architectures.
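
As an illustration of the objective named in the abstract, the Minimum Error Correction (MEC) score of a candidate phasing can be sketched as follows: each read is assigned to the haplotype it disagrees with least, and the score is the total number of allele mismatches that remain. This is a minimal sketch and not any reviewed tool's implementation; the read representation (a dict from variant-site index to a 0/1 allele), the function names `mismatches` and `mec_score`, and the toy data are all assumptions made for the example.

```python
from typing import Dict, List

# Assumed representation: a read maps variant-site index -> observed allele (0 or 1).
# Sites the read does not cover are simply absent from the dict.
Read = Dict[int, int]


def mismatches(read: Read, haplotype: List[int]) -> int:
    """Count covered sites where the read disagrees with the haplotype."""
    return sum(1 for site, allele in read.items() if haplotype[site] != allele)


def mec_score(reads: List[Read], haplotypes: List[List[int]]) -> int:
    """Sum, over all reads, of the mismatches to the closest haplotype."""
    return sum(min(mismatches(r, h) for h in haplotypes) for r in reads)


# Toy triploid example (k = 3) over four heterozygous sites (hypothetical data).
haplotypes = [
    [0, 0, 1, 1],
    [1, 1, 0, 0],
    [0, 1, 1, 0],
]
reads = [
    {0: 0, 1: 0, 2: 1},        # consistent with haplotype 0
    {1: 0, 2: 1, 3: 1},        # consistent with haplotype 0
    {0: 1, 1: 1, 2: 0},        # consistent with haplotype 1
    {1: 1, 2: 0, 3: 0},        # consistent with haplotype 1
    {0: 0, 1: 1, 2: 1},        # consistent with haplotype 2
    {2: 1, 3: 0},              # consistent with haplotype 2
    {0: 1, 1: 1, 2: 0, 3: 1},  # haplotype 1 with one sequencing error at site 3
]

print(mec_score(reads, haplotypes))      # score with all three haplotypes
print(mec_score(reads, haplotypes[:2]))  # score if the third haplotype is dropped
```

Comparing the two calls shows why the ploidy k matters for this objective: dropping a true haplotype forces its reads onto a wrong haplotype and raises the score. An objective function minimization method searches over candidate haplotype sets for one that makes this score as small as possible.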
