A CLIP-SAM-Based Multimodal Semantic Segmentation and Decision Framework for Intelligent Monitoring in Coal Preparation Plants
Abstract
As a key link in clean coal processing, coal preparation plants need intelligent equipment status monitoring to ensure production safety and efficiency. Traditional monitoring systems rely on data from a single sensor and suffer from low fault identification rates and high response delays, which makes it difficult to cope with multi-source interference under complex working conditions. To address this challenge, this study proposes an intelligent monitoring system based on a CLIP-SAM multimodal joint analysis architecture, which constructs a cross-modal feature alignment model by combining visible-light images, infrared thermal images, and vibration spectral data. Experimental results show that, in detecting typical faults such as belt deviation and drum fouling, the system reaches a comprehensive recognition accuracy of 94.2%, which is 19.8% higher than that of the traditional single-modal method, and shortens the average response time to abnormal events to 2.3 seconds, a 98% improvement over manual inspection. In addition, the high-precision image segmentation ability of the SAM model reduces the positioning error of the coal powder coverage area on equipment surfaces to 3.5 pixels, which effectively addresses false detections caused by target occlusion in industrial scenes. The cross-modal correlation analysis of the CLIP model enables the system to perform detection in environments with sudden changes in lighting, which verifies the environmental adaptability of the architecture.
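The headline figures in the abstract can be read in more than one way, so the following minimal Python sketch backs out the baselines they would imply under explicit assumptions. The function names are hypothetical and are not taken from the paper. The sketch assumes that the 19.8% accuracy gain is either an absolute gain in percentage points or a relative gain, and that the 98% figure is a reduction in response time relative to manual inspection. The abstract itself does not state which reading is intended.

```python
def implied_baseline_accuracy(reported: float, gain: float, relative: bool) -> float:
    """Back out a baseline accuracy (in %) from a reported accuracy and a gain.

    relative=False treats `gain` as percentage points (assumption).
    relative=True treats `gain` as a relative improvement in % (assumption).
    """
    if relative:
        return reported / (1.0 + gain / 100.0)
    return reported - gain


def implied_manual_response_time(system_seconds: float, reduction_percent: float) -> float:
    """Back out the manual response time, assuming `reduction_percent`
    is the fraction of time saved relative to manual inspection."""
    if not 0.0 <= reduction_percent < 100.0:
        raise ValueError("reduction_percent must be in [0, 100)")
    return system_seconds / (1.0 - reduction_percent / 100.0)


if __name__ == "__main__":
    # Figures quoted in the abstract: 94.2% accuracy, 19.8% gain, 2.3 s, 98%.
    points = implied_baseline_accuracy(94.2, 19.8, relative=False)
    ratio = implied_baseline_accuracy(94.2, 19.8, relative=True)
    manual = implied_manual_response_time(2.3, 98.0)
    print(f"baseline accuracy, percentage-point reading: {points:.1f}%")
    print(f"baseline accuracy, relative reading: {ratio:.1f}%")
    print(f"implied manual inspection response time: {manual:.0f} s")
```

Under the percentage-point reading the single-modal baseline would be about 74.4%, under the relative reading about 78.6%, and a 98% reduction to 2.3 seconds would imply a manual inspection response time of roughly 115 seconds. These are derived values under the stated assumptions, not results reported by the authors.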
DOI: https://doi.org/10.31449/inf.v49i28.10288
This work is licensed under a Creative Commons Attribution 3.0 License.








