Ecole des Mines de Saint-Etienne, Mathematics and Industrial Engineering, Organisation and Environmental Engineering, Henri FAYOL Institute, 42023, Saint-Etienne, France.
Sci Rep. 2023 Mar 27;13(1):4934. doi: 10.1038/s41598-023-30769-8.
Failure analysis has become an important part of guaranteeing good quality in the electronic component manufacturing process. The conclusions of a failure analysis can be used to identify a component's flaws and to better understand the mechanisms and causes of failure, allowing remedial steps to be implemented to improve the product's quality and reliability. A failure reporting, analysis, and corrective action system is a method for organizations to report, classify, and evaluate failures, as well as plan corrective actions. The textual datasets produced by such a system must first be preprocessed with Natural Language Processing techniques and converted to numeric form by vectorization methods before information can be extracted and predictive models built to predict the failure conclusion for a given failure description. However, not all textual information is useful for building predictive models suitable for failure analysis. Several variable selection methods have been applied to feature selection, but some are not suited to large datasets or are difficult to tune, and others are not applicable to textual data. This article aims to develop a predictive model able to predict failure conclusions from the discriminating features of the failure descriptions. To this end, we propose combining a Genetic Algorithm with supervised learning methods so that failure conclusions are predicted from an optimal subset of discriminant features of the failure descriptions. Since the dataset is unbalanced, we propose using the F1 score as the fitness function, computed with supervised classification methods such as a Decision Tree Classifier and a Support Vector Machine. The proposed algorithms are called GA-DT and GA-SVM.
Experiments on failure analysis textual datasets demonstrate that the proposed GA-DT method creates a better predictive model of failure conclusions than either using the full set of textual features or using the limited features selected by a genetic algorithm based on an SVM. Quantitative metrics such as the BLEU score and cosine similarity are used to compare the prediction performance of the different approaches.
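As an illustration of how a predicted conclusion might be compared with a reference conclusion, the sketch below computes cosine similarity over word-count vectors. The whitespace tokenization, the bag-of-words representation, and the example strings are assumptions; the abstract does not state which text representation was used, and BLEU is not implemented here.

```python
import math
from collections import Counter


def cosine_similarity(text_a, text_b):
    """Cosine of the angle between the word-count vectors of two texts."""
    a = Counter(text_a.lower().split())
    b = Counter(text_b.lower().split())
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())  # shared terms only
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0  # an empty text is scored as 0.0


# Hypothetical reference and predicted failure conclusions.
reference = "solder joint crack"
predicted = "solder joint void"
print(round(cosine_similarity(reference, predicted), 3))
```

Two of the three words match, so the score for this pair is 2/3. Identical texts score 1 and texts with no words in common score 0.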