Budgeted Token–Channel Gating for Early Alzheimer’s Diagnosis
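- Subject (keyword): multimodal
- Subject (DDC): 006.31
- Publisher: Ajou University Graduate School
- Advisor: Jongbin Ryu
- Year of publication: 2026
- Degree conferral date: February 2026
- Degree: Master's
- Department and major: Graduate School, Department of Artificial Intelligence
- URI: http://www.dcollection.net/handler/ajou/000000035842
- Language of text: English
- Copyright: Ajou University theses are protected by copyright.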
Abstract
Early detection of Alzheimer’s disease (AD) before irreversible brain damage occurs is crucial for timely treatment and intervention. However, existing studies largely focus on contrasting classifications, such as AD versus cognitively normal (CN), and struggle to capture subtle transitions from CN to early mild cognitive impairment (EMCI) and late mild cognitive impairment (LMCI). Moreover, multimodal methods that simply concatenate structural MRI (sMRI) and clinical biomarkers, rely on global attention, or apply token-wise or channel-wise gating tend to diffuse inter-modal information exchange and fail to model fine-grained interactions in the early stages of the disease. We propose a Token–Channel Gated Cross-Attention (TCGA) framework for multimodal fusion of 3D sMRI and clinical biomarkers, where a Vision Transformer–based encoder extracts structural image patterns and a clinical tokenizer embeds each biomarker as a separate token to preserve variable-specific semantics. Within TCGA, a token gate selects informative sMRI patch–clinical token pairs for each query, and a channel gate retains meaningful embedding dimensions for the selected pairs, suppressing redundant interactions. We further introduce a Budget Gate Loss that penalizes deviations from a predefined sparsity level, encouraging the model to allocate a limited interaction budget to a small set of clinically meaningful patch–biomarker pairs. On the ADNI cohort spanning CN, EMCI, LMCI, and AD, the proposed TCGA framework outperforms multimodal baselines on early diagnosis tasks, indicating that budget-controlled dual token–channel gating is effective for capturing subtle, clinically relevant early changes in AD progression.
Keywords: Alzheimer's disease, Multimodal deep learning, Cross-modal attention, Medical imaging, Clinical biomarkers
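The abstract does not give the exact form of the Budget Gate Loss, so the following is only a minimal sketch of one plausible reading: gate activations are sigmoids of learned logits, and the loss is the squared gap between their mean and a target budget. The function names, the squared penalty, and the example logits are all assumptions made for illustration, not the thesis's implementation.

```python
import math


def sigmoid(x):
    """Map a gate logit to an activation in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def budget_gate_loss(gate_logits, budget):
    """Squared gap between the mean gate activation and the target budget.

    Assumption: the thesis may use a different penalty (e.g. absolute gap).
    gate_logits is a list of rows, one row per sMRI patch query and one
    entry per clinical token.
    """
    gates = [sigmoid(z) for row in gate_logits for z in row]
    mean_activation = sum(gates) / len(gates)
    return (mean_activation - budget) ** 2


# Hypothetical logits: 2 patch queries x 3 clinical tokens.
# Each query opens roughly one gate out of three.
sparse_logits = [[4.0, -4.0, -4.0], [-4.0, 4.0, -4.0]]
# Every gate open: far above a one-in-three budget.
dense_logits = [[4.0, 4.0, 4.0], [4.0, 4.0, 4.0]]

loss_sparse = budget_gate_loss(sparse_logits, budget=1 / 3)
loss_dense = budget_gate_loss(dense_logits, budget=1 / 3)
print(loss_sparse < loss_dense)
```

Under this reading, a gate pattern that opens about one pair in three incurs almost no penalty, while a fully open pattern is penalized heavily. That matches the stated goal of spending a limited interaction budget on a few patch–biomarker pairs.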
Table of Contents
1. Introduction
2. Related works
2.1 Traditional Machine Learning and Early Deep Models
2.2 Single-Modality Models for AD Diagnosis
2.3 Multimodal Fusion for AD Diagnosis
2.4 Distinctive Aspects of the Proposed Framework
3. Method
3.1 Overview of the proposed framework
1) Image encoder
2) Clinical Tokenizer
3) Token–Channel Gated Bidirectional Cross Attention
4) Token-wise Weighted Aggregation and Classifier
5) Objective Function
4. Experiments
4.1 Dataset Description
4.2 Pre-processing
4.3 Experimental settings
4.4 Performance Comparison
4.5 Results
4.6 Ablation Study
5. Discussion
5.1 Summary of Contributions
5.2 Clinical Implications
5.3 Grad-CAM Visualization & Analysis
6. Conclusion
References

