
Real-Time Lightweight Human Parsing Based on Class Relationship Knowledge Distillation

Abstract

In the field of computer vision, understanding humans is a crucial and challenging task, as it requires recognizing and comprehending human presence and behavior in images or videos. Within this domain, human parsing is especially difficult, as it requires accurately locating the human region and dividing it into multiple semantic parts. It is a dense prediction task that demands substantial computation and high-precision models. With the continuous development of computer vision technologies, human parsing has been widely applied to other human-centric tasks, such as pose estimation and human image generation, and these applications are expected to play an increasingly important role in future artificial intelligence research.

To achieve real-time human parsing on devices with limited computational resources, we design and introduce a lightweight human parsing model. We choose ResNet18 as the backbone and simplify the traditional pyramid module used to obtain high-resolution contextual information, significantly reducing model complexity. To further enhance parsing accuracy, we integrate a spatial attention fusion strategy. Our lightweight model runs efficiently and achieves high segmentation accuracy on Look into Person (LIP), the most commonly used dataset for human parsing. Although traditional models perform excellently in terms of segmentation accuracy, their high complexity and large parameter counts restrict their use on devices with limited computational resources.

To further improve the accuracy of our lightweight network, we also apply knowledge distillation. Traditional knowledge distillation uses the Kullback-Leibler (KL) divergence to match the prediction probability scores of the teacher and student models. However, this approach can fail to transfer useful knowledge when there is a significant capacity gap between the teacher and student networks. We therefore adopt a new distillation criterion based on inter-class and intra-class relationships in the prediction results, which significantly improves parsing accuracy. Experiments show that, while maintaining high segmentation accuracy, our lightweight model substantially reduces the number of parameters, thereby achieving our expected goals.
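The abstract contrasts standard KL-based logit matching with a relation-based distillation criterion. The sketch below is a minimal PyTorch illustration of this contrast under stated assumptions, not the thesis's exact formulation: the relation term here matches the C x C Gram matrix of class probability maps, whose diagonal reflects intra-class confidence mass and whose off-diagonal reflects inter-class co-activation. The temperature tau, the MSE matching distance, and all function names are illustrative assumptions.

```python
# Hedged sketch: per-pixel KL distillation vs. a class-relationship
# distillation term for dense prediction (e.g., human parsing).
# The relation formulation below is an illustrative assumption,
# not the thesis's exact method.

import torch
import torch.nn.functional as F


def kl_distillation_loss(student_logits, teacher_logits, tau=4.0):
    """Classic per-pixel KL distillation.

    student_logits, teacher_logits: (B, C, H, W) raw logits.
    """
    b, c, h, w = student_logits.shape
    s = F.log_softmax(
        student_logits.permute(0, 2, 3, 1).reshape(-1, c) / tau, dim=1)
    t = F.softmax(
        teacher_logits.permute(0, 2, 3, 1).reshape(-1, c) / tau, dim=1)
    # KL(teacher || student), scaled by tau^2 as in Hinton et al. (2015).
    return F.kl_div(s, t, reduction="batchmean") * tau * tau


def class_relation_loss(student_logits, teacher_logits, tau=1.0):
    """Illustrative intra-/inter-class relationship distillation.

    Instead of matching per-pixel distributions, match the C x C
    correlation structure of the class score maps, so the student
    mimics how the teacher relates classes to one another even when
    the capacity gap makes exact logit matching hard.
    """
    def relation_matrix(logits):
        b, c, h, w = logits.shape
        p = F.softmax(logits / tau, dim=1).reshape(b, c, h * w)
        # (B, C, C): diagonal ~ intra-class confidence mass,
        # off-diagonal ~ inter-class co-activation.
        return torch.bmm(p, p.transpose(1, 2)) / (h * w)

    r_s = relation_matrix(student_logits)
    r_t = relation_matrix(teacher_logits)
    return F.mse_loss(r_s, r_t)
```

In training, such a relation term would typically be weighted and added to the usual supervised objective, e.g. `loss = cross_entropy + alpha * class_relation_loss(student_logits, teacher_logits)`, with `alpha` tuned on a validation split.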


Contents

I. Introduction
II. Related Works
III. Proposed Method
3.1 Framework Overview
3.2 Proposed Method
3.2.1 Effective model light-weighting methods
3.2.2 An Effective Lightweight Spatial Feature Fusion Attention Method for Human Parsing Models (LSFA)
3.2.3 Applying the intra-class and inter-class relationship approach to knowledge distillation
IV. Experimental Results and Discussion
4.1 Dataset
4.2 Implementation Details
4.3 Inference speed and performance
4.4 Ablation experiment
V. Conclusion
References
