Toward an Efficient Deep Image Restoration Method

Abstract

Image restoration is a classic but challenging problem in computer vision. Recently, deep learning-based methods have achieved superior performance, driven by the large capacity of neural networks and large training datasets. However, are these heavy models applicable to real-world applications? If not, what direction should deep methods take? To answer these questions, this thesis explores three aspects: model efficiency, data efficiency, and multi-modal distortion.

For model efficiency, we define "efficiency" as both the network size and the computational cost of running the network. Many studies have focused on the former alone, but in practice the latter is the key ingredient because it determines runtime latency and battery consumption. To tackle this, we design the network structure and rethink the training strategy so that performance is preserved as much as possible while both aspects of efficiency, the network size and the number of operations, are effectively improved.

For data efficiency, we investigate data augmentation and unsupervised training for image restoration. Data augmentation is beneficial when the training dataset is small or the network capacity is large, and it adds no computational cost at runtime. Unsupervised training assumes a scenario in which only low-quality images are available, which is much more challenging than the supervised regime. Both concepts have been well analyzed in high-level vision, but far less so in the image restoration community. With these two training strategies, we achieve a large performance gain over recent image restoration methods across many real-world scenarios and datasets.

Last but not least, we tackle multi-modal distortion, in particular the case in which multiple distortions corrupt different regions of an image. Neither a single-distortion restoration network nor a distortion recognition-then-restoration pipeline is satisfactory in terms of both performance and efficiency when serving a model. In contrast, the proposed multi-expert network, based on multi-task learning and an analysis of the multi-modal distribution, achieves superior restoration accuracy with reasonable computational cost and good efficiency from a model-serving perspective.
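
The abstract distinguishes network size from the computational cost of running the network. The following Python sketch makes that distinction concrete by counting the parameters and multiply-accumulate operations (MACs) of a single stride-1, same-padded convolution layer. The helper name `conv2d_cost` and the 64-channel, 3x3 layer are hypothetical choices for illustration, not taken from the thesis. The only point is that the parameter count is fixed while the operation count grows with the spatial size of the input.

```python
def conv2d_cost(in_ch: int, out_ch: int, kernel: int, height: int, width: int) -> tuple[int, int]:
    """Parameters and multiply-accumulates (MACs) of one stride-1, 'same'-padded 2D convolution.

    Hypothetical helper for illustration: bias terms count as parameters but not as MACs.
    """
    weights = in_ch * out_ch * kernel * kernel
    params = weights + out_ch          # one bias per output channel
    macs = weights * height * width    # every weight is used once per output pixel
    return params, macs


# A hypothetical 64 -> 64 channel, 3x3 layer evaluated at two input resolutions.
for h, w in [(360, 640), (720, 1280)]:
    params, macs = conv2d_cost(64, 64, 3, h, w)
    print(f"{h}x{w}: params={params:,}, MACs={macs / 1e9:.1f} G")
# 360x640: params=36,928, MACs=8.5 G
# 720x1280: params=36,928, MACs=34.0 G
```

Doubling both height and width leaves the parameter count unchanged but quadruples the MACs. This is why the operation count, and not the model size alone, governs runtime latency and battery use.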

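The abstract's unsupervised setting, in which only low-quality images exist, can be illustrated with the standard zero-shot super-resolution recipe. The thesis outline lists this recipe in Section 5.3.1, and the sketch assumes that section follows it. Under this recipe, the low-quality image itself serves as the target and a further-downscaled copy serves as the input. The NumPy sketch below is a generic illustration under those assumptions and is not the thesis's SimUSR implementation. The helper names are hypothetical, and block averaging stands in for the bicubic downscaling usually used in super-resolution.

```python
import numpy as np


def box_downscale(img: np.ndarray, scale: int) -> np.ndarray:
    """Downscale an H x W x C image by an integer factor using block averaging.

    Assumption: a box filter stands in for bicubic downscaling; rows and columns
    that do not fill a whole block are cropped.
    """
    h = img.shape[0] // scale * scale
    w = img.shape[1] // scale * scale
    blocks = img[:h, :w].reshape(h // scale, scale, w // scale, scale, -1)
    return blocks.mean(axis=(1, 3))


def make_pseudo_pairs(lr_img, scale, patch, n, rng):
    """Sample n aligned (input, target) patch pairs from a single low-quality image.

    Hypothetical helper: the low-quality image acts as the target and its downscaled
    copy as the input, so no high-quality ground truth is needed.
    """
    lr_son = box_downscale(lr_img, scale)
    pairs = []
    for _ in range(n):
        # Top-left corner of the input patch; rng.integers excludes the upper bound.
        y = int(rng.integers(0, lr_son.shape[0] - patch + 1))
        x = int(rng.integers(0, lr_son.shape[1] - patch + 1))
        inp = lr_son[y:y + patch, x:x + patch]
        # The matching target region is `scale` times larger in each dimension.
        tgt = lr_img[y * scale:(y + patch) * scale, x * scale:(x + patch) * scale]
        pairs.append((inp, tgt))
    return pairs


rng = np.random.default_rng(0)
lr_image = rng.random((64, 48, 3))  # placeholder for a real low-quality photo
pairs = make_pseudo_pairs(lr_image, scale=2, patch=8, n=4, rng=rng)
print(pairs[0][0].shape, pairs[0][1].shape)  # (8, 8, 3) (16, 16, 3)
```

A supervised restoration network trained on such pairs can then be applied to the original low-quality image to produce the upscaled output.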

Table of Contents

1 Introduction
1.1 Thesis Outline
I Model Perspective Efficiency
2 Lightweight Image Restoration Model
2.1 Overview
2.2 Background
2.3 Approach
2.3.1 Cascading Residual Network
2.3.2 Improving the Perceptual Quality
2.3.3 Improving the Efficiency
2.3.4 Differences with Prior Works
2.4 Experiment
2.4.1 Experimental Setting
2.4.2 Evaluation Metric
2.4.3 Model Design Analysis
2.4.4 Initialization Strategy
2.4.5 Efficiency Trade-off
2.4.6 Comparison with Pixel-based Methods
2.4.7 Comparison with Perception-based Methods
2.4.8 Execution Time
2.5 Discussion
3 Accurate and Lightweight Image Restoration Model
3.1 Overview
3.2 Background
3.3 Approach
3.3.1 Progressive Cascading Residual Network
3.3.2 Why Progressive Training Works
3.4 Experiment
3.4.1 Experimental Setting
3.4.2 Performance Analysis
3.4.3 Comparison with State-of-the-art Methods
3.5 Discussion
II Data Perspective Efficiency
4 Data Augmentation for Low-level Vision
4.1 Overview
4.2 Background
4.3 Approach
4.3.1 CutBlur
4.3.2 Mixture-of-Augmentation
4.3.3 Experimental Setting
4.4 Comprehensive Analysis of Data Augmentation
4.4.1 Problems with Existing Data Augmentations
4.4.2 CutBlur
4.4.3 Mixture-of-Augmentation
4.4.4 Study on Different Models and Datasets
4.5 Result
4.5.1 Image Super-resolution
4.5.2 Single Distortion Restoration
4.5.3 Multiple Distortion Restoration
4.6 Discussion
5 Unsupervised Image Restoration
5.1 Overview
5.2 Background
5.3 Approach
5.3.1 Zero-shot Super-resolution
5.3.2 SimUSR: Simple Baseline for Unsupervised Super-resolution
5.4 Experiment
5.4.1 Experimental Setting
5.4.2 Bicubic Super-resolution
5.4.3 Real-world Super-resolution
5.4.4 Execution Time
5.5 Discussion
III Toward Multi-modal Distortion
6 Investigating the Multi-modality in Distortion
6.1 Overview
6.2 Multi-modal Distortion Scenario
6.2.1 Distortion Classification
6.2.2 Distortion Detection
6.3 Experimental Analysis
6.3.1 Experimental Setting
6.3.2 Experimental Result
6.4 Discussion
7 Multi-modal Distortion Restoration
7.1 Overview
7.2 Background
7.3 Spatially Heterogeneous Distortion Scenario
7.4 Approach
7.4.1 Model Overview
7.4.2 Mixture of Parameter Shared Experts
7.4.3 Attentive Feature Fusion
7.5 Experiment
7.5.1 Comparison with State-of-the-art Methods
7.5.2 Model Analysis
7.6 Discussion
8 Conclusion and Discussion
8.1 Key Insight
8.2 Future Direction
8.2.1 Follow-up Topics
8.2.2 Long-term Topics