
Action Segmentation using Bezier Curvature as Spatio-Temporal Feature by Triplet Learning

Abstract

With the development of recording technologies, the demand for video-based techniques is increasing. Despite the success of action recognition, which classifies short trimmed videos, handling long untrimmed videos remains a challenge. Action segmentation is the task of detecting and temporally locating action segments in a video. Although previous approaches have brought notable architectural improvements, the feature extractor has remained unchanged. Recent approaches require additional temporal information, such as action boundary information, which is difficult to obtain in real-world settings. This is because temporal features are not as well developed as spatial features. In this thesis, we propose a new feature synthesis framework called the Temporal Curvature Feature (TCF). The framework consists of two stages: (a) framewise embedding and (b) curvature synthesis. In the framewise embedding stage, we use a triplet network to map a video into T points, which are based on the action label corresponding to each frame. In the curvature synthesis stage, we approximate a curve from these embedding points and synthesize curvatures from the curve. These curvatures are used to enhance the temporal information of the data through a framewise residual operation. The outputs have the same shape as the original input and are used as the new input to bring out the potential of various models. To validate the effectiveness of our approach, we apply the curvatures to three action segmentation datasets, GTEA, 50Salads, and Breakfast, and use the new input to train previous state-of-the-art models: MS-TCN, MS-TCN2, ASRF, and ASFormer. The results show overall performance gains. In particular, the F1 scores show the effectiveness of the approach for the segmentation problem. Finally, the qualitative figures demonstrate that the curvature helps the models better understand temporal information.
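As a rough illustration of the curvature synthesis stage and the framewise residual operation, the following Python sketch computes one curvature value per frame and adds it back to the frame features. It is not the thesis's implementation. The abstract does not say how the curve is fitted or how the residual is formed, so the sketch assumes a quadratic Bezier curve over each frame's previous, current, and next embedding, evaluated at t = 0.5, and a plain additive residual. The function names `bezier_curvature` and `add_curvature_residual` and the (T, D) array layout are hypothetical.

```python
import numpy as np


def bezier_curvature(embeddings: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Curvature at t = 0.5 of a quadratic Bezier curve per frame.

    Assumption (not from the thesis): the control points of each curve are
    the previous, current, and next framewise embedding.

    embeddings: (T, D) array of framewise embedding points.
    returns:    (T,) array of non-negative curvature values.
    """
    # Repeat the first and last points so every frame has two neighbours.
    padded = np.pad(embeddings, ((1, 1), (0, 0)), mode="edge")
    p0, p1, p2 = padded[:-2], padded[1:-1], padded[2:]

    # Derivatives of B(t) = (1-t)^2 P0 + 2(1-t)t P1 + t^2 P2.
    d1 = p2 - p0                     # B'(0.5) = P2 - P0
    d2 = 2.0 * (p2 - 2.0 * p1 + p0)  # B''(t) = 2 (P2 - 2 P1 + P0), constant in t

    # kappa = sqrt(|B'|^2 |B''|^2 - (B' . B'')^2) / |B'|^3 holds in any dimension.
    n1 = np.sum(d1 * d1, axis=1)
    n2 = np.sum(d2 * d2, axis=1)
    dot = np.sum(d1 * d2, axis=1)
    numerator = np.sqrt(np.clip(n1 * n2 - dot**2, 0.0, None))
    # eps keeps the division finite when consecutive embeddings coincide.
    return numerator / (n1**1.5 + eps)


def add_curvature_residual(features: np.ndarray, curvature: np.ndarray) -> np.ndarray:
    """Add each frame's curvature to that frame's feature vector.

    Assumption (not from the thesis): the residual is a plain broadcast sum.
    features: (T, C), curvature: (T,). The output keeps the (T, C) shape.
    """
    return features + curvature[:, None]
```

Because the output keeps the shape of the input features, it could in principle be passed to an existing segmentation backbone without changing the model's input layer, which matches the plug-in use described in the abstract.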


Table of Contents

1 Introduction 1
2 Related Works 4
3 Method 6
3.1 Framewise Embedding 7
3.1.1 Triplet Network for Video 7
3.1.2 Reorganization for Triplet Selection 8
3.2 Curvature Synthesis 9
3.2.1 Bezier Curve Principle 9
3.2.2 Continuous Temporal Information 9
3.2.3 Discrete Temporal Information 10
3.3 Action Segmentation from Curvature 10
4 Experiment 12
4.1 Datasets 12
4.2 Metrics 12
4.3 Backbone Models 13
4.4 Quantitative Results 13
4.4.1 Comparison with the state-of-the-art on GTEA dataset 14
4.4.2 Comparison with the state-of-the-art on 50Salads dataset 15
4.4.3 Comparison with the state-of-the-art on Breakfast dataset 16
4.5 Qualitative Results 17
4.5.1 Curvature Effect on Backbone 1 17
4.5.2 Curvature Effect on Backbone 2 18
4.5.3 Curvature Effect on Backbone 3 19
4.6 Effect of Reorganization 20
4.6.1 Partition Selection 20
4.6.2 Successive Selection 21
4.6.3 Reorganization 22
5 Conclusion 23
