Forensic Science Programme, FSK, Universiti Kebangsaan Malaysia, Jalan Raja Muda Abdul Aziz, 50300 Kuala Lumpur, Malaysia.
Analyst. 2018 Jul 23;143(15):3526-3539. doi: 10.1039/c8an00599k.
Partial least squares-discriminant analysis (PLS-DA) is a versatile algorithm that can be used for predictive and descriptive modelling as well as for discriminative variable selection. However, versatility is both a blessing and a curse: the user must optimize a wealth of parameters before obtaining reliable and valid outcomes. Over the past two decades, PLS-DA has demonstrated great success in modelling high-dimensional datasets for diverse purposes, e.g. product authentication in food analysis, disease classification in medical diagnosis, and evidence analysis in forensic science. Despite this, many users in practice have yet to grasp the essentials of constructing a valid and reliable PLS-DA model. As technology progresses, datasets in every discipline are becoming more complex, i.e. multi-class, imbalanced and very large. Indeed, the community is entering the era of big data. In this context, the aim of this article is two-fold: (a) to review, outline and describe contemporary strategies for PLS-DA modelling practice, and (b) to critically discuss the knowledge gaps in those strategies that have emerged in response to the present big data era. This work could complement other available reviews and tutorials on PLS-DA by providing a timely and user-friendly guide for researchers, especially those working in applied research.