Sensor Fusion based AutoEncoder Feature Distillation for 3D Object Detection
A Study on Feature Map Knowledge Distillation for Sensor Fusion-Based 3D Object Detection
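- Subject (keywords): Sensor Fusion, Knowledge Distillation, Capacity gap, Representation ability
- Subject (DDC): 006.31
- Publisher: Ajou University Graduate School
- Advisor: Wonjun Hwang
- Year of publication: 2024
- Degree conferral date: August 2024
- Degree: Master's
- Department and major: Graduate School, Department of Artificial Intelligence
- URI: http://www.dcollection.net/handler/ajou/000000033856
- Language of text: Korean
- Copyright: Ajou University theses are protected by copyright.
Abstract/Summary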
We introduce a knowledge distillation approach that improves the computational efficiency of 3D object detection within a teacher-student framework. The student model learns from the teacher model, which reduces computational complexity while narrowing the performance gap between the two models during training. Knowledge distillation techniques have traditionally focused on improving classifiers and are often inapplicable or less effective for 3D object detection. To address this problem, we propose a method that uses an autoencoder to distill the teacher's fused sensor information into the student's bird's-eye-view (BEV) features. This lets the student learn important but difficult-to-capture feature representations from the teacher. We also introduce a training strategy that reduces the number of parameters in the student network while improving its performance compared with existing models, so that the student achieves competitive results with fewer computational resources. To validate the method, we conduct experiments on the nuScenes dataset, a widely used benchmark for 3D object detection, using ResNet [16] as the backbone of both the teacher and student networks. The experiments demonstrate the effectiveness and practical applicability of our approach for real-world object detection.
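The outline of the thesis lists an ablation over L1, L2, and KL-divergence distillation losses. As a rough illustration of what such losses measure when a student feature map is compared against a teacher's, here is a minimal sketch in plain Python. The function names, the softmax normalization before the KL term, and the sample feature values are all assumptions made for illustration; this record does not specify how the thesis computes or weights these losses, nor how the autoencoder is involved.

```python
import math


def l1_loss(student, teacher):
    """Mean absolute difference between two equally sized feature vectors."""
    return sum(abs(s - t) for s, t in zip(student, teacher)) / len(teacher)


def l2_loss(student, teacher):
    """Mean squared difference between two equally sized feature vectors."""
    return sum((s - t) ** 2 for s, t in zip(student, teacher)) / len(teacher)


def softmax(values):
    """Numerically stable softmax, turning raw features into a distribution."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(student, teacher):
    """KL(teacher || student) after softmax-normalizing both vectors.

    Normalizing with softmax is an assumption of this sketch, not a detail
    taken from the thesis.
    """
    p = softmax(teacher)
    q = softmax(student)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


# Hypothetical flattened feature values, for illustration only.
teacher_feat = [0.9, 0.1, -0.3, 0.5]
student_feat = [0.7, 0.2, -0.1, 0.4]

for name, loss_fn in [("L1", l1_loss), ("L2", l2_loss), ("KL", kl_divergence)]:
    print(f"{name}: {loss_fn(student_feat, teacher_feat):.6f}")
```

All three losses are zero when the student reproduces the teacher exactly. They differ in how they penalize deviations: L1 grows linearly with the error, L2 quadratically, and the KL term compares the relative distribution of activations rather than their absolute values.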
Table of Contents
Ⅰ. Introduction 1
Ⅱ. Related Work 5
Ⅲ. Network Overview 8
  Ⅰ. Framework Overview 8
  Ⅱ. Proposed Method 9
Ⅳ. Experimental Results and Discussion 14
  Ⅰ. Implementation Details 14
  Ⅱ. Datasets 15
  Ⅲ. Evaluation metrics 16
  Ⅳ. Comparative Approaches 17
  Ⅴ. Fusion, BEV knowledge distillation comparison 18
  Ⅵ. Ablation Study: Comparison results of L1, L2, KL_Divergence 19
Ⅴ. Conclusion 21
Ⅵ. References 24

