
Blind Face Restoration Using Swin Transformer with Global Semantic Token Regularization

Abstract

In this thesis, we propose a framework for blind face restoration, which recovers a high-quality face image from an input with unknown degradations. Previous methods have shown that a Vector Quantization (VQ) codebook can serve as a powerful prior for blind face restoration. However, predicting code vectors from low-quality images remains challenging. To solve this problem, we propose a multi-scale transformer consisting of multi-scale cross-attention (MSCA) blocks. The multi-scale transformer compensates for information lost in high-level features by globally fusing low-level and high-level features with different spatial resolutions. There is also a trade-off between the pixel-wise fidelity and the visual quality of the results. To improve fidelity, we employ shifted-window cross-attention modules at multiple scales. However, the shifted-window method cannot compute inter-window attention and therefore cannot model the rich global context of a face. To solve this problem, we propose a shifted-window token cross-attention module (SW-TCAFM) with a global class token that models the global context of the face. The global class token aggregates information across all windows and passes it to the next step. In addition, we propose a semantic token regularization loss that makes each global class token represent a specific facial component by exploiting a face parsing map prior. Our framework achieves superior performance in both quality and fidelity compared to state-of-the-art methods. In our experiments, the PSNR and FID of our framework improve on the state-of-the-art method by 3.21% and 2.92%, respectively.
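The core idea of the global class token can be sketched in a few lines: windowed cross-attention runs independently per window, while a single shared token is appended to every window's key/value set and then updated by pooling over all windows, carrying global context to the next step. The sketch below is a minimal NumPy illustration of that mechanism, not the thesis implementation; all function names, the mean-pooling update, and the single-head, unprojected attention are simplifying assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def window_partition(x, ws):
    # (H, W, C) -> (num_windows, ws*ws, C), non-overlapping ws x ws windows
    H, W, C = x.shape
    x = x.reshape(H // ws, ws, W // ws, ws, C)
    return x.transpose(0, 2, 1, 3, 4).reshape(-1, ws * ws, C)

def token_cross_attention(queries, keys, global_token):
    # queries, keys: (num_windows, N, C); global_token: (1, C)
    nw, N, C = queries.shape
    # append the shared global token to every window's key/value set,
    # so each window can attend to context from outside its own borders
    g = np.broadcast_to(global_token, (nw, 1, C))
    kv = np.concatenate([keys, g], axis=1)                    # (nw, N+1, C)
    attn = softmax(queries @ kv.transpose(0, 2, 1) / np.sqrt(C), axis=-1)
    out = attn @ kv                                           # (nw, N, C)
    # update the global token by aggregating information across ALL windows
    # (mean pooling here is an assumed, simplified aggregation)
    new_global = out.mean(axis=(0, 1))[None, :]               # (1, C)
    return out, new_global

H = W = 8; C = 4; ws = 4
lq = np.random.default_rng(0).normal(size=(H, W, C))  # low-quality encoder features (queries)
hq = np.random.default_rng(1).normal(size=(H, W, C))  # high-quality decoder features (keys/values)
g = np.zeros((1, C))                                  # global class token
out, g = token_cross_attention(window_partition(lq, ws), window_partition(hq, ws), g)
print(out.shape, g.shape)  # (4, 16, 4) (1, 4)
```

Because the updated token is pooled from every window and re-appended at the next scale, information can flow between windows even though the attention itself is computed strictly per window.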


Table of Contents

1 Introduction
2 Related Works
2.1 Blind Face Restoration
3 Proposed Method
3.1 Code generation (Stage 1)
3.2 Code prediction (Stage 2)
3.2.1 Encoder with (S)W-MSA
3.2.2 Multi-scale cross-attention transformer
3.2.3 Transformer
3.3 Feature fusion (Stage 3)
3.3.1 SW-TCAFM
3.3.2 Semantic token regularization loss
4 Experiments and Results
4.1 Evaluation Settings and Implementation
4.2 Comparative Results with State-of-the-art Methods
4.3 Ablation Study
5 Conclusions
