Demirezen Mustafa Umut, Navruz Tuğba Selcen
Data Products Department, Udemy Inc., San Francisco, CA 94107, USA.
Department of Electrical Electronics Engineering, Faculty of Engineering, Gazi University, Ankara 06570, Turkey.
Sensors (Basel). 2023 Aug 31;23(17):7580. doi: 10.3390/s23177580.
This study introduces a novel methodology for assessing the accuracy of data processing in the Lambda Architecture (LA), an advanced big-data framework capable of processing both streaming (data in motion) and batch (data at rest) data. Unlike prior studies, which focused on hardware performance and scalability, our research targets the accuracy of data processing within each layer of LA. The principal contribution of this study is empirical: for the first time, we provide evidence that validates previously theoretical assertions about LA, which have remained largely unexamined because of LA's intricate design. Our methodology encompasses the evaluation of candidate technologies for every layer of LA, the examination of layer-specific design limitations, and the implementation of a uniform software development framework across multiple layers. Specifically, it employs a set of metrics, including data latency and processing accuracy under various conditions, that serve as key indicators of how accurately LA processes data. Our findings clearly illustrate LA's "eventual consistency": despite potential transient inconsistencies during real-time processing in the Speed Layer (SL), the system ultimately converges to precise and reliable results, grounded in the comprehensive computations of the Batch Layer (BL). This empirical validation not only confirms but also quantifies the claims of previous theoretical work, with our results indicating a 100% accuracy rate under various severe data-ingestion scenarios. We applied this methodology in a practical case study involving air/ground surveillance, a domain where data accuracy is paramount. This application demonstrates the effectiveness of the methodology under real-world data-intake scenarios, distinguishing this study from hardware-centric evaluations.
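The eventual-consistency behavior described above can be illustrated with a small, self-contained Python sketch. This is not the authors' implementation; the `Event` record, the `track_id` keys, de-duplication by `event_id` in the batch layer, the timestamp `cutoff` used to split work between the batch and speed layers, and the accuracy definition (fraction of keys whose served value matches ground truth) are all hypothetical choices made for illustration. The scenario simulates an at-least-once redelivery that the speed layer counts twice, producing a transient error that disappears once the batch layer recomputes over the full master dataset.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    event_id: int  # hypothetical unique id; the batch layer de-duplicates on it
    track_id: str  # hypothetical key, e.g. a surveillance track identifier
    ts: int        # ingestion timestamp


def batch_view(master, cutoff):
    """Batch Layer sketch: recompute counts from scratch over the immutable
    master dataset, keeping one copy per event_id, for events with ts < cutoff."""
    unique = {e.event_id: e for e in master if e.ts < cutoff}
    return Counter(e.track_id for e in unique.values())


def speed_view(stream, cutoff):
    """Speed Layer sketch: incremental counts for events not yet covered by
    the batch view (ts >= cutoff). No de-duplication, so redeliveries inflate counts."""
    view = Counter()
    for e in stream:
        if e.ts >= cutoff:
            view[e.track_id] += 1
    return view


def serve(batch, speed):
    """Serving Layer sketch: merge the batch view with the real-time view."""
    return batch + speed


def accuracy(served, truth):
    """Hypothetical accuracy metric: share of keys whose served value is exact."""
    keys = set(served) | set(truth)
    if not keys:
        return 1.0
    return sum(served[k] == truth[k] for k in keys) / len(keys)


events = [Event(0, "A", 0), Event(1, "A", 1), Event(2, "B", 2),
          Event(3, "B", 3), Event(4, "B", 4), Event(5, "C", 5)]
delivered = events + [events[4]]  # simulated at-least-once redelivery of event 4
truth = Counter(e.track_id for e in events)

# Before the batch recompute: the batch layer has only processed ts < 3.
before = serve(batch_view(delivered, 3), speed_view(delivered, 3))
# After the batch recompute: the batch layer covers every event.
after = serve(batch_view(delivered, 6), speed_view(delivered, 6))

print(round(accuracy(before, truth), 3))  # 0.667 (track B over-counted)
print(accuracy(after, truth))             # 1.0
```

The design choice that drives convergence here is that the batch view is always recomputed from the raw master dataset rather than patched incrementally, so any error the speed layer introduced is discarded once its time range is absorbed into the batch view.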
This study not only contributes to the existing body of knowledge on LA but also addresses a significant gap in the literature. By offering a novel, empirically supported methodology for testing LA, one potentially applicable to other big-data architectures, this study sets a precedent for future research in this area and advances beyond previous work that lacked empirical validation.