
Improving Visual Representation via Deep Learning-based Knowledge Transfer Methods

Abstract

Deep learning models have revolutionized computer vision by enabling feature extraction through visual representation learning. However, improving the visual representation of these models remains a significant challenge and has been the subject of extensive research. Visual representation learning is critical, as well-learned features improve task performance and generalize robustly to varied input data. We focus on the effectiveness of knowledge transfer methods in improving the visual representation of deep learning-based models. Knowledge transfer methods use a pretrained model, through fine-tuning or knowledge distillation, to help a deep learning model learn visual representations for various forms of input data.

First, we consider self-supervised learning, a machine learning approach in which a model is trained to extract features from data without manual annotation or labeling. The model is trained to predict specific properties of, or relationships within, the input data, and in doing so it learns meaningful representations that can be used for downstream tasks. Concisely, self-supervised learning allows a deep learning-based model to learn visual representations efficiently. Second, we concentrate on a knowledge distillation-based approach. Knowledge distillation is a process in which a student model is trained to imitate the behavior of a teacher model: the student learns from the teacher's output and is trained to mimic it. Knowledge distillation can improve the visual representation of deep learning models by training a smaller model to imitate a pretrained one; the resulting student model learns more efficiently and with fewer trainable parameters.

In this dissertation, we propose novel knowledge transfer approaches to improving visual representation. First, we propose a self-supervised learning method that learns spatio-temporal representations from RGB video through variable playback speed prediction, together with a reconstituted batch normalization for the proposed pretext task. Next, we propose a knowledge distillation method that uses an autoencoder network to learn 3D representations of point clouds and a self-attention network to transfer the result-oriented representation from the teacher network.

To validate the effectiveness of the proposed knowledge transfer methods, deep learning models trained with them are evaluated on benchmark datasets commonly used in each task. The results demonstrate that knowledge transfer methods can significantly improve the visual representation of deep learning models, and the proposed methods improve model performance across various computer vision tasks.
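To make the playback-speed pretext task concrete, here is a minimal sketch of how such a pretext task is commonly set up. It assumes a generic 3D-CNN backbone; the names (`SpeedPredictor`, `sample_clip`, the speed set) are hypothetical illustrations and are not taken from the dissertation, which additionally introduces a reconstituted batch normalization not shown here.

```python
# Hypothetical sketch of a playback-speed-prediction pretext task.
import torch
import torch.nn as nn

class SpeedPredictor(nn.Module):
    """Classify which playback speed a clip was sampled at."""
    def __init__(self, backbone: nn.Module, feat_dim: int, num_speeds: int = 4):
        super().__init__()
        self.backbone = backbone          # any 3D CNN returning (B, feat_dim) features
        self.head = nn.Linear(feat_dim, num_speeds)

    def forward(self, clip):              # clip: (B, C, T, H, W)
        return self.head(self.backbone(clip))

def sample_clip(video, clip_len: int, speed: int):
    """Subsample frames with temporal stride `speed` to simulate faster playback."""
    # video: (C, T, H, W); assumes T >= clip_len * speed
    start = torch.randint(0, video.shape[1] - clip_len * speed + 1, (1,)).item()
    idx = torch.arange(start, start + clip_len * speed, speed)
    return video[:, idx]

# Training step (sketch): the label is the index of the sampled speed,
# e.g. speeds = [1, 2, 4, 8], and the loss is a standard cross-entropy
# between model(clips) and the speed labels. No manual annotation is needed.
```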
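Likewise, the student-mimics-teacher principle can be illustrated with the standard logit-distillation loss of Hinton et al.; the dissertation's point-cloud method (autoencoder plus self-attention transfer) is more elaborate, so this sketch only shows the basic mechanism, with hyperparameters `T` and `alpha` chosen for illustration.

```python
# Minimal sketch of standard logit distillation, not the dissertation's method.
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.7):
    """Blend soft teacher targets with the hard-label task loss."""
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),   # student log-probabilities
        F.softmax(teacher_logits / T, dim=1),       # softened teacher targets
        reduction="batchmean",
    ) * (T * T)                                     # rescale to match hard-loss gradients
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard
```

The temperature `T` softens both distributions so the student can learn from the teacher's relative confidences across classes, not just its top prediction.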


Table of Contents

1. Introduction
2. Self-Supervised Learning for Spatio-temporal Representation
3. Knowledge Distillation for 3D Visual Representation
4. Conclusion
