IEEE/ACM Trans Comput Biol Bioinform. 2024 Jan-Feb;21(1):26-35. doi: 10.1109/TCBB.2023.3328714. Epub 2024 Feb 5.
This article proposes an event-driven solution to genotype imputation, a technique used to statistically infer missing genetic markers in DNA. The work implements the widely accepted Li and Stephens model, the primary contributor to the computational complexity of modern x86 solutions, to determine whether further investigation of the application is warranted in the event-driven domain. The model is implemented as a graph-based Hidden Markov Model and executed as a customised forward/backward dynamic programming algorithm. The solution uses an event-driven paradigm to map the algorithm to thousands of concurrent cores, where events are small messages that carry both control and data within the algorithm. The design of a single processing element is discussed, then extended across multiple cores and executed on a custom RISC-V NoC cluster called POETS. Results demonstrate how the algorithm scales with increasing hardware resources, and a multi-core run demonstrates a 270X reduction in wall-clock processing time compared to a single-threaded x86 solution. Optimisation of the algorithm via linear interpolation is then introduced and tested, with results demonstrating a reduction in wall-clock time of ∼5 orders of magnitude compared to a similarly optimised x86 solution.
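As an illustration of the forward/backward computation named above, the following is a minimal sequential sketch of Li and Stephens style imputation in plain Python. It is not the article's event-driven implementation: the function name `impute`, the parameters `switch` and `error`, their default values, and the uniform-switch transition model are assumptions chosen for brevity. Each hidden state is the reference haplotype being copied at a marker, and a missing marker contributes an emission probability of 1.

```python
def impute(panel, target, switch=0.05, error=0.01):
    """Return the posterior probability of allele 1 at every marker.

    panel  : list of K reference haplotypes, each a list of M alleles (0 or 1)
    target : list of M alleles (0 or 1), with None where the marker is missing
    switch : assumed probability of switching to a uniformly chosen haplotype
    error  : assumed probability that an observed allele differs from the copy
    """
    n_hap, n_mark = len(panel), len(panel[0])

    def emit(k, m):
        # Missing markers carry no information, so every state emits 1.
        if target[m] is None:
            return 1.0
        return 1.0 - error if panel[k][m] == target[m] else error

    def normalise(values):
        total = sum(values)
        return [v / total for v in values]

    # Forward pass: fwd[m][k] is proportional to P(obs[0..m], state_m = k).
    fwd = [normalise([emit(k, 0) / n_hap for k in range(n_hap)])]
    for m in range(1, n_mark):
        prev = fwd[-1]
        total = sum(prev)
        fwd.append(normalise([
            ((1.0 - switch) * prev[k] + switch * total / n_hap) * emit(k, m)
            for k in range(n_hap)
        ]))

    # Backward pass: bwd[m][k] is proportional to P(obs[m+1..] | state_m = k).
    bwd = [[1.0] * n_hap for _ in range(n_mark)]
    for m in range(n_mark - 2, -1, -1):
        weighted = [bwd[m + 1][k] * emit(k, m + 1) for k in range(n_hap)]
        total = sum(weighted)
        bwd[m] = normalise([
            (1.0 - switch) * weighted[k] + switch * total / n_hap
            for k in range(n_hap)
        ])

    # Combine both passes and sum the posterior mass of states carrying allele 1.
    dosages = []
    for m in range(n_mark):
        post = normalise([fwd[m][k] * bwd[m][k] for k in range(n_hap)])
        dosages.append(sum(p for k, p in enumerate(post) if panel[k][m] == 1))
    return dosages


if __name__ == "__main__":
    reference_panel = [[0, 0, 0], [1, 1, 1]]
    observed = [1, None, 1]  # the middle marker is missing
    print(impute(reference_panel, observed))
```

With the two-haplotype panel in the usage example, the missing middle marker is flanked by alleles matching the second haplotype, so its imputed probability of allele 1 comes out close to 1. In this sketch each marker update depends only on the neighbouring marker's values, which suggests one way such a computation could be split into small per-marker messages, although the mapping to processing elements used in the article is not shown here.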