Regularization Methods for Mitigating Catastrophic Forgetting in Multi-Task Continual Learning
- Subject (Keywords): Neural Network, Deep Learning, Artificial Intelligence, Continual Learning, Incremental Learning, Life-long Learning, Regularization
- Subject (DDC): 658.5
- Publisher: Ajou University Graduate School
- Advisor: 신현정
- Year of Publication: 2026
- Degree Conferred: February 2026
- Degree: Ph.D.
- Department and Major: Department of Industrial Engineering, Graduate School
- URI: http://www.dcollection.net/handler/ajou/000000035439
- Language: English
- Copyright: Theses from Ajou University are protected by copyright.
Abstract
Recent research on artificial intelligence systems has explored ways to enable models to continuously learn, adapt, and evolve across changing tasks rather than being limited to a single task. To this end, continual learning, which enables a model to learn a sequence of tasks, has been actively studied. A central concern of continual learning is overcoming catastrophic forgetting, the phenomenon in which a model forgets tasks it has previously learned. Diverse approaches have been proposed to address this difficulty: replaying samples from previous tasks, adding nodes or layers for incoming tasks, or regularizing learning on the new task so as to preserve the optimal parameters of the previous tasks. The last approach maintains the initial structure of the model and requires neither data sampling nor structural expansion, which avoids issues such as data storage and management overhead and increased model complexity. However, because the number of parameters is fixed, some difficulties are inevitable. First, catastrophic forgetting still occurs as parameters are updated or overwritten when one task proceeds to another. Second, performance on previous tasks unavoidably degrades as the model continues to adapt to new tasks. Third, given a set of tasks, if the model parameters are not well distributed across tasks, the model cannot learn the new tasks. To overcome these difficulties, we propose task-wise winner-taking continual learning (TWC), which improves on the representative regularization method, elastic weight consolidation (EWC). TWC regularizes parameters task-wise, recovers from performance degeneration, and efficiently distributes parameters across tasks. In this paper, we present experimental results showing that TWC outperforms state-of-the-art methods in various settings.
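The regularization idea the abstract builds on, EWC's quadratic penalty that anchors parameters important to earlier tasks, can be sketched numerically as follows. This is a minimal illustration under a diagonal-Fisher assumption, not the thesis's TWC implementation; the function names and parameters are illustrative.

```python
import numpy as np

def ewc_penalty(theta, theta_star, fisher_diag, lam=1.0):
    """EWC-style penalty: (lam / 2) * sum_i F_i * (theta_i - theta*_i)^2.

    `theta` are the current parameters, `theta_star` the optimum found on
    the previous task, and `fisher_diag` the diagonal of the Fisher
    information, which weights how important each parameter was to the
    previous task. All names here are illustrative, not from the thesis.
    """
    diff = np.asarray(theta, dtype=float) - np.asarray(theta_star, dtype=float)
    return 0.5 * lam * np.sum(np.asarray(fisher_diag, dtype=float) * diff ** 2)

def total_loss(new_task_loss, theta, theta_star, fisher_diag, lam=1.0):
    # Loss on the new task plus the anchoring penalty: parameters with
    # large Fisher values are discouraged from drifting, which is how
    # this family of methods mitigates catastrophic forgetting.
    return new_task_loss + ewc_penalty(theta, theta_star, fisher_diag, lam)

# Example: a parameter that was unimportant to the old task (F = 0.5)
# is penalized less for moving than an important one (F = 1.0).
penalty = ewc_penalty([1.0, 2.0], [0.0, 0.0], [1.0, 0.5], lam=2.0)  # -> 3.0
```

In practice the Fisher diagonal is estimated from squared gradients of the log-likelihood at the previous task's optimum; TWC, as described in the abstract, refines how such penalties are applied per task and how parameters are allocated across tasks.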
TWC was also integrated into a Continual Autoencoder (CAE) and applied to temporally ordered SARS-CoV-2 genomic data. The TWC-based CAE maintained reconstruction fidelity and preserved coherent latent structures across multiple evolutionary stages, effectively modeling variant progression without representational drift.
Table of Contents
1 Introduction 1
2 Theoretical Background 7
2.1 Fundamentals of Continual Learning 8
2.2 Approaches to Catastrophic Forgetting 8
2.2.1 Replay-based Methods 9
2.2.2 Parameter Isolation Methods 10
2.2.3 Regularization-based Methods 10
2.3 Elastic Weight Consolidation and Its Limitations 11
2.3.1 Principle 11
2.3.2 Fisher Information Matrix 13
2.3.3 Advantages and Empirical Success 13
2.3.4 Limitations of EWC 13
3 Proposed Method: Task-wise Winner-Taking Continual Learning 17
3.1 Overview 18
3.2 Task-wise Weight Consolidation 19
3.3 Continual Degeneration Recovery 21
3.4 Winner-Taking Regularization 22
4 Experimental Validation on Benchmark Datasets 27
4.1 EWC vs. Proposed Methods 28
4.2 Existing Models vs. Proposed Methods 32
4.3 Discussion on Findings 63
4.4 Summary 66
5 Extended Experiments: Continual Autoencoder with TWC 68
5.1 Continual Autoencoder with TWC 69
5.2 Experimental Framework 72
5.3 Results and Analysis 75
5.3.1 Reconstruction Performance 75
5.3.2 Latent Representation Drift 77
5.3.3 Latent Space Visualization 78
5.3.4 Comparative Analysis 79
5.4 Discussion 83
6 Application Study: COVID-19 Variant Analysis 86
6.1 COVID-19 Variant Analysis 87
6.2 Dataset Construction and Preprocessing 89
6.3 Results and Analysis 92
6.3.1 Reconstruction Performance 92
6.3.2 Latent Representation Analysis 93
6.3.3 Representation Drift and Stability 95
6.3.4 Summary of Findings 96
6.4 Discussion and Implications 98
7 Conclusion 102
References 106

