Vision-Based Deep Learning Framework for Detecting Driver Distraction and Safety-Critical Behavior in Intelligent Transportation Systems
- Subject (keywords): Driver monitoring systems, distracted driving, grayscale deep learning, temporal attention, real-time embedded inference.
- Subject (DDC): 004.6
- Publisher: Graduate School of Ajou University
- Advisor: Byeong-hee Roh
- Year of publication: 2026
- Degree conferral date: February 2026
- Degree: Doctorate (Ph.D.)
- Department and major: Department of AI Convergence Network, Graduate School
- URI: http://www.dcollection.net/handler/ajou/000000035946
- Language of text: English
- Copyright: Ajou University theses are protected by copyright.
Abstract
Driver distraction and unsafe behaviors remain major contributors to road accidents, emphasizing the need for driver monitoring systems that are accurate, robust, and suitable for embedded in-vehicle deployment. This dissertation presents a coherent grayscale-based, vision-only methodology that progresses from single-frame distraction recognition to efficient temporal modeling of driver behaviors. The first contribution is ResNet-RG, a lightweight residual convolutional network for frame-level distraction detection using grayscale cabin images, integrating residual learning with batch normalization, dropout, and spatial attention to emphasize behavior-relevant regions. Building on this, DriveAlertNet introduces a hybrid frame–video approach in which a grayscale-optimized ResBoot-50 backbone is trained with epoch-wise bootstrap sampling, and temporal information is incorporated through majority voting, enabling video-level inference without recurrent architectures. Finally, the third contribution proposes ATFE, an efficient temporal attention framework that partitions continuous grayscale streams into overlapping frame sequences, extracts per-frame embeddings, and weights informative frames more heavily to improve recognition of short and visually similar unsafe actions. Together, these contributions form a unified methodology connecting grayscale-efficient backbones with progressively richer temporal modeling, enabling accurate, stable, and real-time analysis of driver behavior from monocular visual input.
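The abstract names three temporal-handling steps: partitioning a frame stream into overlapping sequences (ATFE), aggregating frame-level predictions by majority voting (DriveAlertNet), and attention-weighting per-frame embeddings so informative frames dominate the sequence feature (ATFE). A minimal plain-Python sketch of these three ideas follows; the function names, window parameters, and scoring vector `w` are illustrative assumptions, not the dissertation's implementation.

```python
import math
from collections import Counter

def sliding_windows(frames, win, stride):
    """Partition a continuous frame stream into overlapping
    fixed-length subsequences (frame-to-sequence construction)."""
    return [frames[i:i + win] for i in range(0, len(frames) - win + 1, stride)]

def majority_vote(frame_preds):
    """Video-level label as the most frequent frame-level prediction,
    i.e. temporal aggregation without a recurrent model."""
    return Counter(frame_preds).most_common(1)[0][0]

def attention_pool(embeddings, w):
    """Softmax-weighted sum of per-frame embeddings: frames whose
    embedding scores higher against w contribute more to the
    sequence-level feature."""
    scores = [sum(e_d * w_d for e_d, w_d in zip(e, w)) for e in embeddings]
    m = max(scores)                      # subtract max for numerical stability
    alphas = [math.exp(s - m) for s in scores]
    z = sum(alphas)
    alphas = [a / z for a in alphas]     # softmax attention weights
    dim = len(embeddings[0])
    return [sum(a * e[d] for a, e in zip(alphas, embeddings)) for d in range(dim)]

# toy usage with illustrative sizes
windows = sliding_windows(list(range(10)), win=4, stride=2)    # 4 overlapping windows
label = majority_vote([2, 2, 0, 2, 1])                         # -> 2
feat = attention_pool([[1.0, 0.0], [0.0, 1.0]], w=[1.0, 1.0])  # equal scores -> [0.5, 0.5]
```

In a real pipeline the scoring vector `w` would be a learned parameter and the embeddings would come from the grayscale backbone; this sketch only shows the aggregation arithmetic.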
Table of Contents
I. Introduction 1
1.1 Motivation and Objectives 1
1.2 Research Contributions 3
1.2.1 Enhancing Driver Safety with ResNet-RG - A Deep Learning Based Distraction Detection Approach 4
1.2.2 DriveAlertNet: A Hybrid Frame–Video Approach for Distracted Driving Detection 4
1.2.3 An Efficient Temporal Attention Framework for Real-Time Detection of Unsafe Driver Behaviors 5
1.3 Organization of the Dissertation 6
II. Background 7
2.1 Driver Distraction and In-Cabin Behaviors 7
2.2 Vision-Based Driver Monitoring Systems 8
2.3 Frame-Level Deep Learning for Driver Behavior Recognition 9
2.3.1 Convolutional and Residual Architectures 9
2.3.2 Grayscale Modeling for Embedded DMS 10
2.3.3 Class Imbalance and Data Augmentation 10
2.4 Temporal Modeling of Driver Behaviors 11
2.4.1 Simple Temporal Aggregation from Frame-Level Predictions 11
2.4.2 Recurrent, 3D Convolutional, and Attention-Based Models 12
2.5 In-Cabin Datasets and Evaluation Considerations 13
III. Enhancing Driver Safety with ResNet-RG - A Deep Learning Based Distraction Detection Approach 14
3.1 Introduction 14
3.2 Motivation and Problem Formulation 16
3.3 Proposed Methodology 18
3.3.1 ResNet-RG Architecture 19
3.3.2 Residual Learning and Grayscale Adaptation 21
3.3.3 Attention Mechanism 21
3.3.4 Loss Function and Optimization 22
3.3.5 Data Preparation and Frame Extraction 23
3.4 Datasets and Pre-Processing 23
3.4.1 Benchmark Datasets 23
3.4.2 Frame Extraction and Pre-Processing 26
3.5 Training Objective and Optimization 26
3.6 Experimental Setup and Evaluation Metrics 29
3.6.1 Experimental Setup 29
3.6.2 Evaluation Metrics 30
3.7 Experimental Results and Discussion 31
3.7.1 Comparison with ResNet Backbones and Baseline Models 31
3.8 Summary 34
IV. DriveAlertNet: A Hybrid Frame–Video Approach for Real-Time Distracted Driving Detection 36
4.1 Introduction 36
4.2 Motivation and Problem Formulation 38
4.3 Proposed Methodology 40
4.3.1 Pre-processing and Grayscale Transformation 41
4.3.2 ResBoot-50: Grayscale-Adapted Residual Backbone 42
4.3.3 Bootstrapped Training and Class-Imbalance Handling 43
4.3.4 Inference and Temporal Aggregation 44
4.3.5 DriveAlertNet Algorithm 45
4.4 Datasets and Pre-Processing 45
4.4.1 StateFarm Distracted Driver Detection (SFDD) 46
4.4.2 Drive&Act Distracted Driver Subset 47
4.4.3 Data Splits and Summary 47
4.5 Training Objective and Optimization 47
4.6 Experimental Setup and Evaluation Metrics 49
4.6.1 Experimental Setup 49
4.6.2 Evaluation Metrics 49
4.7 Experimental Results and Discussion 50
4.7.1 Frame-Level vs. Video-Level Performance 50
4.7.2 Comparison with Conventional CNN/LSTM Baselines 50
4.7.3 Comparison with Recent Methods 51
4.7.4 Error Analysis and Interpretability 52
4.8 Summary 54
V. An Efficient Temporal Attention Framework for Real-Time Detection of Unsafe Driver Behaviors 56
5.1 Introduction 56
5.2 Motivation and Problem Formulation 58
5.3 Proposed Methodology 60
5.3.1 Frame-to-Sequence Construction 61
5.3.2 Attention Temporal Feature Extractor Backbone 62
5.3.3 Attention-Guided Temporal Aggregation 63
5.3.4 Classification, Decision Fusion, and Loss 64
5.3.5 Algorithmic Summary 65
5.4 Datasets and Pre-Processing 66
5.4.1 NTHU Drowsy Driver Detection Dataset (NTHU-DDD) 66
5.4.2 Pre-Processing and Subsequence Generation 66
5.5 Training Objective and Optimization 68
5.6 Experimental Setup and Evaluation Metrics 68
5.6.1 Experimental Setup 68
5.6.2 Evaluation Metrics 69
5.7 Experimental Results and Discussion 70
5.7.1 Comparison with State-of-the-Art on NTHU-DDD 70
5.7.2 Comparison with State-of-the-Art on Drive&Act 70
5.7.3 Training Convergence and Efficiency 71
5.7.4 Comparative Overview with Related Work 73
5.8 Summary 75
VI. Conclusion and Future Work 77
6.1 Conclusion 77
6.2 Future Work 78
References 79

