
Deep Learning Methods for Sign Language Production

Table of Contents

1 Introduction
1.1 Introduction
1.2 Contributions of This Dissertation
1.3 Outline
2 Background
2.1 Avatar Approaches for Sign Language Production
2.2 Deep Learning Approaches for Sign Language Production
2.2.1 Progressive Transformer
2.3 Summary
3 Datasets and Evaluation Metrics
3.1 Datasets
3.2 Evaluation Protocols
3.3 Evaluation Metrics
3.3.1 Back-Translation Model
3.3.2 Bilingual Evaluation Understudy (BLEU)
3.3.3 Recall-Oriented Understudy for Gisting Evaluation (ROUGE)
3.3.4 Word Error Rate (WER)
4 Cascade Dual-Decoder Transformer for Sign Language Production
4.1 Cascade Dual-Decoder Transformer
4.1.1 Text Encoder
4.1.2 Hand Pose Decoder
4.1.3 Sign Pose Decoder
4.2 Spatio-Temporal Loss
4.2.1 Spatial Regression Loss
4.2.2 Temporal Continuity Loss
4.3 Performance Evaluations
4.3.1 Model Configuration
4.3.2 Quantitative Results
4.3.2.1 Baseline Comparison
4.3.3 Ablation Study
4.3.3.1 Impact of Different Numbers of Decoder Layers
4.3.3.2 Effect of Spatio-Temporal Loss
4.3.4 Qualitative Analysis
4.4 Conclusions
5 Multi-Channel Spatio-Temporal Transformer for Sign Language Production
5.1 Problem Definition
5.2 Multi-Channel Spatio-Temporal Transformer
5.2.1 Encoder
5.2.2 Multi-Channel Spatio-Temporal Decoder
5.2.2.1 Channel-Specific and Full-Channel Embedding
5.2.2.2 Spatial-Attention Module
5.2.2.3 Temporal-Attention Module
5.2.2.4 Spatio-Temporal Fusion Module
5.3 Performance Evaluations
5.3.1 Model Configuration
5.3.2 Quantitative Results
5.3.2.1 Baseline Comparison
5.3.2.2 Ablation Study
5.3.3 Qualitative Analysis
5.4 Conclusions
6 Conclusions and Future Work
6.1 Conclusions
6.2 Possible Future Work
Bibliography
A List of Research Outputs
A.1 SCI/SCIE Journal Papers
A.2 International Conference Papers
