
Network Group-based Knowledge Distillation using Online Role Change

Abstract

In knowledge distillation, a single, omnipotent teacher network cannot solve all problems, so knowledge distillation with multiple teachers has been studied recently. However, the improvements are sometimes smaller than expected because some immature teachers may transfer false knowledge to the student. In this paper, to overcome this limitation and exploit the efficacy of multiple networks, we divide the networks into a teacher group and a student group. The student group is a set of immature networks that need to learn the teachers' knowledge, while the teacher group consists of selected networks that have performed well. Furthermore, under our online role change strategy, the top-ranked networks in the student group can be promoted to the teacher group at every iteration, and teachers that fall behind are demoted to the student group. After training the teacher group on the error images of the student group to refine the teacher group's knowledge, we transfer the collective knowledge of the teacher group to the student group. We verify the effectiveness of the proposed method on CIFAR-10, CIFAR-100, and ImageNet, where it achieves high performance. We further show the generality of our method with various backbone architectures such as ResNet, WRN, VGG, MobileNet, and ShuffleNet.
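
The abstract does not specify the ranking criterion, how the teachers' knowledge is aggregated, or the distillation loss, so the following is only a minimal Python sketch under stated assumptions, not the authors' implementation. It assumes that networks are ranked by per-batch accuracy, that the teacher group's collective knowledge is the plain average of the teachers' temperature-softened outputs, that distillation uses KL divergence with the usual T² scaling from Hinton-style distillation, and that "error images" are samples misclassified by at least one student. All function names (`assign_roles`, `group_kd_loss`, `student_error_indices`) are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def assign_roles(batch_accuracy: Sequence[float], num_teachers: int) -> Tuple[List[int], List[int]]:
    """Split network indices into (teachers, students) by accuracy on the current batch.

    Assumption: the ranking criterion is per-batch accuracy; ties go to the lower index.
    Calling this every iteration realizes the online role change: a student that
    outranks a teacher is promoted, and the displaced teacher is demoted.
    """
    order = sorted(range(len(batch_accuracy)), key=lambda i: (-batch_accuracy[i], i))
    return order[:num_teachers], order[num_teachers:]


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Temperature-softened softmax over one sample's logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q) for discrete distributions; terms with p_i == 0 contribute 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def group_kd_loss(teacher_logits: List[Sequence[float]],
                  student_logits: Sequence[float],
                  temperature: float = 4.0) -> float:
    """Distill the teacher group's averaged soft targets into one student (one sample).

    Assumption: collective knowledge = plain mean of the teachers' softened outputs.
    """
    probs = [softmax(t, temperature) for t in teacher_logits]
    target = [sum(col) / len(probs) for col in zip(*probs)]
    student = softmax(student_logits, temperature)
    return temperature ** 2 * kl_divergence(target, student)


def student_error_indices(student_preds: List[List[int]], labels: Sequence[int]) -> List[int]:
    """Indices of samples misclassified by at least one student.

    Hypothetical selection rule for the 'error images' used to refine the teachers.
    """
    return [i for i, y in enumerate(labels) if any(preds[i] != y for preds in student_preds)]


if __name__ == "__main__":
    # Iteration t: networks 1 and 3 lead, so they act as temporary teachers.
    print(assign_roles([0.62, 0.71, 0.55, 0.68], num_teachers=2))  # ([1, 3], [0, 2])
    # Iteration t+1: network 0 overtakes network 3 and is promoted; 3 is demoted.
    print(assign_roles([0.70, 0.71, 0.55, 0.64], num_teachers=2))  # ([1, 0], [3, 2])
```

Re-ranking from scratch at every iteration keeps the sketch simple; the paper's actual promotion rule and how often it runs (its ablation on the frequency of temporary teachers) may differ.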


Table of Contents

Ⅰ. Introduction
Ⅱ. Related Works
Ⅲ. Proposed Method
1. Background
2. Group-based Knowledge Distillation using Online Role Change
3. Intensive Teaching
4. Private Teaching
5. Group Teaching
Ⅳ. Experimental Results and Discussion
1. Datasets
2. Networks
3. Implementation Details
4. Ablation Study: The Number of Temporary Teachers
5. Ablation Study: Type of Augmentation for Intensive Teaching
6. Ablation Study: Style of Group Teaching
7. Ablation Study: Network Perspective of Equal Size
8. Ablation Study: Performance Contribution by Individual Components
9. Ablation Study: t-SNE Visualization
10. Ablation Study: Frequency of Temporary Teacher
11. Comparison with the State-of-the-Art Methods
Ⅴ. Conclusion
References
