dCollection Digital Academic Information Distribution System

Scarce biomedical sample exploitation approach for multimodal time series data integration

Subject (keywords): Incomplete modality, sample exploitation, multimodal time series data integration, kernel-reweighting regression, gated recurrent units, Alzheimer’s disease, breast cancer
Publisher: Ajou University
Advisor: 손경아
Year of publication: 2020
Degree conferral date: February 2020
Degree: Doctoral
Department and major: Graduate School, Department of Computer Engineering
URI: http://www.dcollection.net/handler/ajou/000000029781
Language of text: English
Copyright: Ajou University theses are protected by copyright.

Abstract

Recent technological advances make it possible to collect diverse knowledge and heterogeneous data from multiple domains. As various types of data, including prior knowledge and multimodal data, are generated, numerous methods have been developed to integrate such datasets and extract complementary knowledge across domains. However, integrating prior knowledge and multimodal data is challenging in four respects: the small sample size problem (P1), sequential data processing (P2), irregularity of heterogeneous data (P3), and model interpretability (P4). In this thesis, we propose two sample exploitation methods for integrating multimodal data that address these four issues. The first study focuses on the small sample size problem (P1) for multimodal data integration in bioinformatics, where the available sample size is extremely small. The proposed model can intrinsically integrate irregular multimodal data (P3) while identifying subtype-sensitive genes (P4). We then extend the study to multimodal time series data (P2, P3) using a sample exploitation approach (P1) while preserving model interpretability (P4). In the two studies, sample exploitation is performed via kernel reweighting and a separate learning phase, respectively. The proposed methods are validated in four experiments. In the first study, an L1-regularized kernel-reweighting regression model is used to infer subtype-specific patterns between gene expression and DNA methylation. The subsequent experiments comprise a simulation study, prediction of Alzheimer’s disease (AD) progression in patients with mild cognitive impairment, and an analysis of genomic variations affecting AD progression.
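
The kernel-reweighting idea mentioned in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: the Gaussian kernel, the scalar sample position `z`, the bandwidth, the penalty `lam`, and the synthetic data are all assumptions made for illustration. Each sample is weighted by its closeness to a target position, and an L1-regularized regression is then fitted on the reweighted samples, so that nearby samples are reused when few samples are available at the target itself.

```python
import math
import random


def gaussian_kernel_weights(z, z0, bandwidth):
    """Weight each sample by its closeness to the target position z0 (weights sum to 1)."""
    raw = [math.exp(-((zi - z0) ** 2) / (2.0 * bandwidth ** 2)) for zi in z]
    total = sum(raw)
    return [r / total for r in raw]


def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold (the lasso proximal step)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def weighted_lasso(X, y, w, lam, n_iter=200):
    """Minimise 0.5 * sum_i w_i * (y_i - x_i . beta)^2 + lam * ||beta||_1
    by cyclic coordinate descent."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    residual = list(y)  # equals y - X beta while beta is all zeros
    for _ in range(n_iter):
        for j in range(p):
            # Weighted correlation of feature j with the residual that ignores feature j.
            rho = sum(w[i] * X[i][j] * (residual[i] + X[i][j] * beta[j]) for i in range(n))
            norm = sum(w[i] * X[i][j] ** 2 for i in range(n))
            new_bj = soft_threshold(rho, lam) / norm if norm > 0 else 0.0
            delta = new_bj - beta[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                beta[j] = new_bj
    return beta


# Synthetic data (assumption): the active feature switches at z = 0.5.
random.seed(0)
n, p = 200, 3
z = [random.random() for _ in range(n)]
X = [[random.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
y = [
    (2.0 * X[i][0] if z[i] < 0.5 else -2.0 * X[i][1]) + random.gauss(0.0, 0.1)
    for i in range(n)
]

# Fit one sparse model per target position, reusing all samples with different weights.
beta_low = weighted_lasso(X, y, gaussian_kernel_weights(z, 0.1, 0.1), lam=0.05)
beta_high = weighted_lasso(X, y, gaussian_kernel_weights(z, 0.9, 0.1), lam=0.05)
print("coefficients near z = 0.1:", [round(b, 2) for b in beta_low])
print("coefficients near z = 0.9:", [round(b, 2) for b in beta_high])
```

With these settings the two fits should place their large coefficient on different features. The position `z` here only stands in for whatever grouping or ordering the thesis uses to define subtypes, which this record does not specify.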

Table of contents

1. Introduction 1
1.1 Overview 1
1.2 Summary of contributions 4
2. Background 6
2.1 Sequential data analysis 6
2.1.1 Recurrent neural network 6
2.1.2 Gated recurrent units 9
2.2 Meta-dimensional data integration 11
2.2.1 Concatenation-based integration 11
2.2.2 Transformation-based integration 11
2.3 Interpretability of machine learning 13
2.3.1 Intrinsic interpretable model 13
2.3.2 Model-agnostic interpretation 13
3. Exploiting samples based on prior knowledge integration 15
3.1 Introduction 15
3.2 L1-regularized linear regression 18
3.3 Kernel-reweighting lasso 19
3.4 Inferring subtype-specific network 21
3.4.1 Dataset 22
3.4.2 Predicting gene expression level based on DNA methylation 23
3.4.3 Subtype-specific prediction performance 28
3.4.4 Subtype-specific association network 30
3.5 Discussion 33
4. Multimodal time series data integration framework 37
4.1 Introduction 37
4.2 Multimodal longitudinal data integration framework 38
4.3 Experiment: Simulation study 40
4.4 Experiment: Predicting AD progression using ADNI data 43
4.4.1 Study participants 44
4.4.2 Experimental setting 45
4.4.3 Comparison of prediction of MCI to AD conversion using cross-sectional data at baseline and longitudinal data 49
4.4.4 Comparison of prediction of MCI to AD conversion using single modal and multimodal data 53
4.5 Experiment: genomic variations in Alzheimer’s disease 54
4.5.1 Methods for integrating and interpreting WGS data 54
4.5.2 Performance improvement 55
4.5.3 Model interpretation 57
4.5.4 Functional interpretation of genetic variants 59
4.6 Discussion 67
5. Conclusion 70
