
On-policy Deep Reinforcement Learning for HPC Job Scheduling: Enhancing Performance Stability through Dynamic Data Selection

Abstract

Job scheduling in High-Performance Computing (HPC) systems is a crucial task that determines how computational resources are allocated. Traditional heuristic algorithms often fail to capture the full complexity of job scheduling, and reinforcement learning (RL) offers a promising alternative. However, the performance of on-policy RL algorithms can depend strongly on the job data used for training, which leads to variability in performance. To enhance performance stability, we propose a novel dynamic data selection method: we predict the reward value using a tree-based machine learning model and select the data based on this prediction. This data selection process refines the input to the RL algorithm, improving performance stability. Furthermore, we introduce a self-attention-based on-policy network for job scheduling in HPC systems, which utilizes the selected data more effectively when formulating policies. We validate the proposed method through experiments based on real-world job log data from HPC systems, comparing its performance with heuristic scheduling algorithms. The results confirm the effectiveness of our approach in enhancing performance stability across real-world workloads and in improving the overall performance of the on-policy RL algorithm.
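The data selection idea described in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: a single-split regression tree (a "stump") written in plain Python stands in for the tree-based model, the two features per job batch (mean requested processors, mean requested runtime) and the reward values are hypothetical, and the rule "keep the batches with the highest predicted reward" is an assumption, since the abstract does not state the selection criterion.

```python
from statistics import mean


def fit_stump(features, rewards):
    """Fit a depth-1 regression tree (one split) by minimizing squared error.

    Stand-in for the tree-based reward predictor; the thesis model is
    not specified in the abstract.
    """
    best = None
    for j in range(len(features[0])):
        values = sorted(set(row[j] for row in features))
        # Candidate thresholds: midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2
            left = [r for row, r in zip(features, rewards) if row[j] <= thr]
            right = [r for row, r in zip(features, rewards) if row[j] > thr]
            l_mean, r_mean = mean(left), mean(right)
            sse = (sum((r - l_mean) ** 2 for r in left)
                   + sum((r - r_mean) ** 2 for r in right))
            if best is None or sse < best[0]:
                best = (sse, j, thr, l_mean, r_mean)
    if best is None:
        # No split possible (all rows identical): predict the global mean.
        overall = mean(rewards)
        return lambda row: overall
    _, j, thr, l_mean, r_mean = best
    return lambda row: l_mean if row[j] <= thr else r_mean


def select_batches(candidates, predict, keep):
    """Assumed rule: rank candidates by predicted reward, keep the top `keep`."""
    return sorted(candidates, key=predict, reverse=True)[:keep]


# Hypothetical history: (mean requested processors, mean requested runtime in s)
# paired with the reward observed when training on that batch.
history_features = [(4, 100), (8, 200), (64, 3000), (128, 5000)]
history_rewards = [-1.2, -1.5, -9.0, -12.0]

predict = fit_stump(history_features, history_rewards)
candidates = [(100, 4000), (16, 400), (2, 50)]
selected = select_batches(candidates, predict, keep=2)
print(selected)
```

In a full pipeline, the selected batches would then be passed to the on-policy agent for its next policy update, and the predictor would be refit as new rewards are observed; both steps are omitted here.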


Table of Contents

Ⅰ INTRODUCTION
Ⅱ RELATED WORKS
1. HPC Job Scheduling
2. Reinforcement Learning-based Job Scheduling
3. Data Selection for Reinforcement Learning
Ⅲ BACKGROUND
1. Overview of Reinforcement Learning
2. Off-Policy and On-Policy Reinforcement Learning
3. Proximal Policy Optimization
4. Self-Attention Mechanism
Ⅳ DYNAMIC DATA SELECTION WITH DEEP REINFORCEMENT LEARNING AGENT
1. Dynamic Data Selection
2. The complexity of the DS
3. Self-Attention-based Actor-Critic Network
4. Data Selection and Self-Attention Actor-Critic Network Algorithm
Ⅴ EXPERIMENTS
A. Experiments Setup
1. HPC job data
2. Compared Algorithms
3. DS-DRL Evaluation
4. Evaluation Metrics
B. Experimental results and analytics
1. Evaluation of Dynamic Data Selection in Reward Prediction
2. Impact of Data Selection Method on System Performance
3. Comparative Analysis of Scheduling Algorithms on Average Bounded Slowdown
4. Comparative Analysis of Scheduling Algorithms on Waiting Time
5. Comparative Evaluation with other real-world datasets
Ⅵ CONCLUSION
REFERENCE
