Robust and Adaptive Log Anomaly Detection: Multi-Level Optimization Framework
- Subject (Keywords): Contrastive Learning, Cross-Domain Adaptation, Dataset Pruning, Deduplication, Distributional Drift, Few-Shot Learning, Log Anomaly Detection, Small Language Models (SLMs), Zero-Shot Learning
- Subject (DDC): 004.6
- Publisher: Graduate School, Ajou University
- Advisor: Jin Kwak
- Publication Year: 2026
- Degree Conferral: February 2026
- Degree: Doctor of Philosophy (Ph.D.)
- Department and Major: Department of AI Convergence Network, Graduate School
- URI: http://www.dcollection.net/handler/ajou/000000036077
- Language: English
- Copyright: Ajou University theses are protected by copyright.
Table of Contents
Chapter 1. Introduction 1
1.1. Purpose of the Study 1
1.2. Problem Statement 3
1.3. Research Questions 3
1.4. Key Contributions 5
1.5. Thesis Outline 6
Chapter 2. Background and Related Work 8
2.1. Classical Neural Sequence Modeling 10
2.2. Semantic Representations using Small Language Models 12
2.3. Domain Adaptation and Generalization 14
2.4. Log Dataset Reduction 15
2.5. Synthesis and Thesis Advances 18
Chapter 3. Semantic-Aware Robust Log Anomaly Detection (SaRLog) 20
3.1. Introduction 20
3.2. Significance of Domain-Specific Language Models 22
3.3. Method 27
3.3.1. Log Preprocessing and Tokenization 27
3.3.2. Context-Aware Semantic Representation 29
3.3.3. Siamese Metric Head and Contrastive Objective 30
3.4. Experiment Setting 31
3.4.1. Pair Construction and Model Training 31
3.4.2. Inference and Anomaly Scoring 32
3.4.3. Training Setup and Dataset 33
3.5. Empirical Results 34
3.5.1. In-context Detection 34
3.5.2. Few-Shot Cross-context Detection 36
3.5.3. Zero-Shot Cross-context Detection 37
3.5.4. Impact of Embedding Architecture Choice on Detection Performance 39
3.5.5. Detection Head Embedding Performance 41
3.5.6. Impact of Detection Head Choice on Training Convergence 43
3.6. Discussion and Conclusion 45
Chapter 4. Temporal Decay Loss for Adaptive Log Anomaly Detection 46
4.1. Introduction 46
4.2. Problem Formulation and Design Criteria 48
4.2.1. Training Objective 48
4.2.2. Design Criteria 49
4.3. Method 50
4.3.1. Log Preprocessing 50
4.3.2. Context-Aware Semantic Embedding 52
4.3.3. Classification Head 53
4.3.4. Loss with Decaying Factor (LDF) 54
4.4. Experimental Settings and Datasets 55
4.5. Results 56
4.5.1. In-context Detection 56
4.5.2. Cross-context Zero-shot Detection 58
4.5.3. Effect of the Time-Decay Parameter a 60
4.5.4. Effect of the Representation Models 61
4.5.5. Discussion and Conclusion 62
Chapter 5. Structure-Preserving Semantic Log Dataset Consolidation 64
5.1. Introduction 64
5.2. Problem Formulation and Design Objectives 67
5.3. Method 68
5.3.1. Pre-sanitization 69
5.3.2. Context-Aware Semantic Representation 70
5.3.3. Semantic Consolidation 71
5.3.4. Minority Events Guardrails 73
5.3.5. Structure-Preservation Diagnostics 74
5.3.6. Hyperparameter Selection 76
5.4. Experiments 76
5.4.1. Task and Metrics 76
5.4.2. Consolidation Pipeline 77
5.4.3. Training and Detection Protocols 78
5.5. Results 78
5.5.1. Strict Near-duplicate Consolidation Policy 78
5.5.2. Semantic Consolidation with Density-based Exemplar Retention Policy 79
5.5.3. Structural Drift Analysis 80
5.5.4. Impact of Consolidation on In-context Detection 83
5.5.5. Impact of Consolidation on Cross-context Zero-shot Detection 85
5.6. Discussion and Conclusion 86
Chapter 6. Synthesis and Integration Framework 89
6.1. Introduction 89
6.2. Empirical Synthesis Across Settings 90
6.2.1. Few-Shot Cross-context Adaptation 90
6.2.2. Zero-Shot Cross-context Generalization 90
6.3. Integrated Pipeline 92
6.4. Design Guidance and Operating Points 95
Chapter 7. Conclusion 99
Bibliography 101
Appendix 114

