Fault detection in industrial processes is critical for yield improvement and manufacturing cost reduction. However, most industrial processes produce highly imbalanced, high-dimensional datasets in which normal samples far outnumber fault samples and many noninformative features add noise to the data distribution. Addressing class imbalance and high dimensionality is therefore considered key to successful fault detection. In this paper, we propose a novel model, the unstructured borderline self-organizing map (UB-SOM), designed to solve both problems. UB-SOM not only learns the distribution of the normal samples through a small number of representative nodes but also highlights borderline areas. Because UB-SOM yields a new data distribution that emphasizes borderlines, the distributional change from the normal data to the representative nodes reveals which features are significant in the borderline areas. We select the significant features based on the featurewise distributional change, measured using the Kullback-Leibler divergence. UB-SOM is evaluated on ten publicly available imbalanced benchmark datasets and two semiconductor process datasets. The experimental results show that data preprocessing with UB-SOM increases the G-mean by 0.441 for the benchmark datasets and by 0.657 for the industrial datasets. As a result, the proposed method outperforms various undersampling methods combined with classifier-based feature selection methods.
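The feature-selection step and the evaluation metric described above can be illustrated with a minimal sketch. It is not the authors' implementation: UB-SOM itself is not reproduced, and the representative nodes are simulated by keeping normal samples that lie far from the center along one feature. The histogram-based density estimate, the direction of the divergence (nodes relative to normal data), the bin count, and the function names `featurewise_kl` and `g_mean` are all assumptions made for illustration. The G-mean is computed with its standard definition, the geometric mean of the true positive rate and the true negative rate.

```python
import math
import random


def histogram(values, lo, hi, bins):
    """Count values into equal-width bins over [lo, hi]."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for v in values:
        idx = min(int((v - lo) / width), bins - 1)  # clamp v == hi into last bin
        counts[idx] += 1
    return counts


def kl_divergence(p_counts, q_counts, eps=1e-9):
    """KL(P || Q) for two histograms, smoothed by eps to avoid log(0)."""
    p_total = sum(p_counts) + eps * len(p_counts)
    q_total = sum(q_counts) + eps * len(q_counts)
    kl = 0.0
    for pc, qc in zip(p_counts, q_counts):
        p = (pc + eps) / p_total
        q = (qc + eps) / q_total
        kl += p * math.log(p / q)
    return kl


def featurewise_kl(normal, nodes, bins=10):
    """Per-feature KL(nodes || normal) on shared histogram bins (assumed direction)."""
    scores = []
    for j in range(len(normal[0])):
        a = [row[j] for row in normal]
        b = [row[j] for row in nodes]
        lo, hi = min(a + b), max(a + b)
        if hi == lo:  # constant feature: no distributional change
            scores.append(0.0)
            continue
        scores.append(kl_divergence(histogram(b, lo, hi, bins),
                                    histogram(a, lo, hi, bins)))
    return scores


def g_mean(y_true, y_pred):
    """Geometric mean of sensitivity and specificity; needs both classes present."""
    pairs = list(zip(y_true, y_pred))
    pos = sum(1 for t, _ in pairs if t == 1)
    neg = sum(1 for t, _ in pairs if t == 0)
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return math.sqrt((tp / pos) * (tn / neg))


# Synthetic stand-in data: two independent Gaussian features.
random.seed(0)
normal = [[random.gauss(0, 1), random.gauss(0, 1)] for _ in range(2000)]
# Hypothetical "nodes" that emphasize the borderline of feature 0 only.
nodes = [row for row in normal if abs(row[0]) > 1.5]

scores = featurewise_kl(normal, nodes)
ranking = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
print("KL per feature:", scores)
print("features ranked by distributional change:", ranking)
print("G-mean example:", g_mean([1, 1, 0, 0], [1, 0, 0, 0]))
```

In this toy setup the feature whose distribution shifts toward the borderline receives the larger divergence and is ranked first, which mirrors the selection idea in the abstract. The actual node placement, density estimate, and selection threshold used by UB-SOM are not specified here.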
Bibliographical note
Publisher Copyright: © 2021 Elsevier Ltd
All Science Journal Classification (ASJC) codes
- Computer Science Applications
- Artificial Intelligence