Self-Supervised Anomaly Detection for Vehicles via Sensor Fusion
- Subject (keywords): Sensor Fusion, Anomaly Detection
- Subject (DDC): 006.31
- Issuing institution: Ajou University Graduate School
- Advisor: Hyung Il Koo
- Year of publication: 2026
- Degree conferred: February 2026
- Degree: Master's
- Department and major: Graduate School, Department of Artificial Intelligence
- URI: http://www.dcollection.net/handler/ajou/000000035575
- Language: English
- Copyright: Ajou University theses are protected by copyright.
Abstract
Anomalous sound detection has become increasingly important for automotive safety, especially in shared mobility and autonomous driving scenarios where human perception of irregular events is inherently limited. However, existing sound-based methods are highly susceptible to environmental noise, while sensor-only approaches require a large number of sensors and often lack contextual understanding. In this paper, we propose a lightweight, real-time, self-supervised anomaly detection model that fuses acoustic signals with vehicle sensor (Controller Area Network, CAN) data. Audio signals are represented by a log-mel spectrogram (Sgram) and a temporal gram (Tgram), which are concatenated and processed by a MobileFaceNet backbone to capture both stationary (spectral) patterns and temporal dynamics. CAN signals are modeled with a self-attention-based network to capture inter-sensor correlations. The features extracted from both modalities are then fused and passed to a fully connected classification layer to detect anomalies.

We evaluate the proposed approach on real-world datasets collected from Hyundai vehicles (SX2-HEV, SX2-ICE). Experimental results show that fusing audio and CAN sensor data yields significant gains in both accuracy and robustness across different vehicle platforms. While single-modality models exhibit considerable performance variation depending on the vehicle type, the multimodal approach consistently delivers more stable and reliable results. In particular, our proposed STgram+CAN model achieves the best overall performance, recording 94.1% AUC on the SX2-HEV vehicle and the highest average performance across all platforms with 86.4% AUC and 74.8% pAUC. The model also remains lightweight, requiring less than 1 GB of memory during inference and processing a 10-second audio segment in approximately 75 ms on an NVIDIA RTX 4090 GPU. Successful deployment on the NVIDIA Jetson AGX Orin and Orin Nano further demonstrates its suitability for real-time, resource-constrained in-vehicle environments.
Table of Contents
1 Introduction 1
1.1 CAN data 3
2 Related Works 7
2.1 Anomalous Sound Detection (ASD) 7
2.2 Sensor-based Anomaly Detection in Vehicles 8
2.3 Multimodal Anomaly Detection 9
2.4 Self-Supervised Classification 9
3 Proposed Method 11
3.1 Audio feature extraction 11
3.2 Self-attention based sensor feature extraction 13
3.3 Feature fusion and self-supervised anomaly detection 15
4 Experiments and Results 18
4.1 Dataset 18
4.2 Pre-processing 19
4.3 Implementation Details 22
4.4 Evaluation Metrics 22
4.5 Sensor-Driven Fault Discrimination Analysis 23
4.6 Performance Comparison 25
4.6.1 Single-Modality Feature 25
4.6.2 Multimodal Feature Fusion 26
4.7 Computational Complexity 27
4.8 Visualization Analysis 28
5 Conclusions 31
Abstract (in Korean) 38

