
Trunk-to-Branch: Lightweight Multi-Sub Distillation with High-Order Feature Semantics

Abstract

This paper proposes a novel online knowledge distillation framework that enables a wide range of vision models to generate and transfer knowledge from multiple perspectives using only lightweight additional networks. Previous online distillation studies have focused on self-distillation without pre-trained teacher models, aiming to generate and transfer new knowledge on their own. However, they suffer from several limitations, including high computational overhead, semantic gaps between stages, and redundant representations. To address these issues, we propose a method that creates multiple peer branches from a single lightweight layer instead of stacking deep layers, and we introduce a learning algorithm that reduces correlation among the peer branches to enhance representational diversity. As a result, the backbone network can effectively learn diverse information from the peer branches with minimal additional resources, and because the peer branches are removed at inference time, the original model's inference speed is preserved. Moreover, unlike previous studies that mainly demonstrated improvements on basic models such as ResNet, we validate the effectiveness of our approach on modern architectures such as ConvNeXt and CSWin.

Keywords: Knowledge Distillation, Lightweight Peer Branches, Semantic Alignments.
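The abstract's core mechanism is a loss that reduces correlation among peer-branch features so the branches supply diverse knowledge to the backbone. The thesis does not give its formula here, so the following is only a minimal sketch of one plausible formulation: penalizing the mean squared pairwise cosine similarity between branch features. The function name `decorrelation_loss` and the exact penalty are assumptions for illustration, not the paper's definition.

```python
import numpy as np

def decorrelation_loss(branch_feats):
    """Hypothetical decorrelation penalty over K peer branches.

    branch_feats: list of K arrays, each of shape (batch, dim).
    Returns the mean squared cosine similarity over all branch pairs,
    so identical branches score 1.0 and uncorrelated branches near 0.
    """
    # L2-normalize each branch's features along the feature dimension
    normed = [f / (np.linalg.norm(f, axis=1, keepdims=True) + 1e-8)
              for f in branch_feats]
    loss, pairs = 0.0, 0
    for i in range(len(normed)):
        for j in range(i + 1, len(normed)):
            # per-sample cosine similarity between branches i and j
            cos = np.sum(normed[i] * normed[j], axis=1)
            loss += np.mean(cos ** 2)  # squared: anti-correlation penalized too
            pairs += 1
    return loss / max(pairs, 1)

rng = np.random.default_rng(0)
f = rng.normal(size=(4, 16))
g = rng.normal(size=(4, 16))
print(decorrelation_loss([f, f]))  # identical branches: maximal penalty (1.0)
print(decorrelation_loss([f, g]))  # independent branches: small penalty
```

Minimizing such a term alongside the distillation objective pushes the peer branches toward non-redundant representations, which is the stated goal of the diversity loss in Section III.B.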


Table of Contents

I. Introduction
II. Related Work
A. Conventional Knowledge Distillation
B. Online Distillation
1) Self-Knowledge Distillation
2) Multi-View Knowledge Distillation
III. Method
A. Semantic Permeation Structure
1) Feature Concatenation
2) Semantic Alignment via Backpropagation
B. Diverse Lightweight Peer Branches
1) Peer Branch Architecture
2) Decorrelation Loss
3) Peer Branches Distillation
IV. Experiment
A. Experimental Setup
B. Image Classification
C. Comparison with Conventional Distillation
V. Ablation
A. Influence from the SPS
B. Branch-wise and Ensemble Distillation
C. Hyperparameter Analysis
D. Branching Point Location
VI. Conclusion
References
