Scarce biomedical sample exploitation approach for multimodal time series data integration

Abstract

Recent technological advances make it possible to collect a wide variety of knowledge and heterogeneous data from multiple domains. As diverse types of data, including prior knowledge and multimodal data, are generated, numerous methods for integrating such datasets have been developed to extract complementary knowledge from multiple domains. However, integrating prior knowledge and multimodal data is challenging in four respects: the small sample size problem (P1), sequential data processing (P2), the irregularity of heterogeneous data (P3), and model interpretability (P4). In this thesis, we propose two sample exploitation methods for incorporating multimodal data that address these four aspects of knowledge and data integration. The first study focuses on the small sample size problem (P1) in multimodal data integration in bioinformatics, where the available sample size is extremely small. The proposed model can intrinsically integrate irregular multimodal data (P3) while identifying subtype-sensitive genes (P4). We then extend the study to multimodal time series data (P2, P3) using a sample exploitation approach (P1) while preserving model interpretability (P4). In the two studies, sample exploitation is performed via kernel reweighting and a separate learning phase, respectively. The proposed methods are validated in four experiments. For the first study, an L1-regularized kernel-reweighting regression model is used to infer subtype-specific patterns between gene expression and DNA methylation. The subsequent experiments comprise a simulation study, prediction of Alzheimer’s disease (AD) progression in patients with mild cognitive impairment, and analysis of genomic variation affecting AD progression.
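
To make the idea of L1-regularized kernel-reweighting regression more concrete, the following is a minimal sketch and not the thesis's implementation. It assumes a Gaussian kernel over a single scalar covariate `z` (a hypothetical stand-in for whatever ordering or similarity separates the subtypes), a plain coordinate-descent solver, and toy data. All function names, the bandwidth, and the penalty value are illustrative assumptions. Every sample contributes to every fit, but samples close to the target receive larger weights, which is one way to exploit a small sample set when estimating subtype-specific coefficients.

```python
import math


def gaussian_kernel_weights(z, z_target, bandwidth):
    """Weight each sample by how close its covariate z is to z_target (weights sum to 1)."""
    raw = [math.exp(-((zi - z_target) ** 2) / (2 * bandwidth ** 2)) for zi in z]
    total = sum(raw)
    return [r / total for r in raw]


def soft_threshold(value, threshold):
    """Soft-thresholding operator used by the lasso coordinate update."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def weighted_lasso(X, y, weights, lam, n_iter=200):
    """Minimize 0.5 * sum_i w_i * (y_i - x_i . beta)^2 + lam * ||beta||_1
    by cyclic coordinate descent (no intercept, for brevity)."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0    # weighted correlation of feature j with the partial residual
            denom = 0.0  # weighted squared norm of feature j
            for i in range(n):
                pred_without_j = sum(X[i][k] * beta[k] for k in range(p) if k != j)
                rho += weights[i] * X[i][j] * (y[i] - pred_without_j)
                denom += weights[i] * X[i][j] ** 2
            beta[j] = soft_threshold(rho, lam) / denom if denom > 0 else 0.0
    return beta


# Toy data (assumption): the response depends on feature 0 for small z
# and on feature 1 for large z, mimicking two "subtypes".
n = 40
z = [i / (n - 1) for i in range(n)]
X = [[math.sin(1.3 * i), math.cos(0.7 * i)] for i in range(n)]
y = [2.0 * X[i][0] if z[i] < 0.5 else -2.0 * X[i][1] for i in range(n)]

# Fit one reweighted lasso per target point; all samples are reused each time.
beta_early = weighted_lasso(X, y, gaussian_kernel_weights(z, 0.0, 0.1), lam=0.01)
beta_late = weighted_lasso(X, y, gaussian_kernel_weights(z, 1.0, 0.1), lam=0.01)
print("coefficients near z = 0:", beta_early)
print("coefficients near z = 1:", beta_late)
```

In this toy setting the fit near `z = 0` should place most of its weight on the first feature and the fit near `z = 1` on the second, which illustrates how reweighting can yield context-specific sparse coefficients from one shared sample set.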


Table of contents

1. Introduction
1.1 Overview
1.2 Summary of contributions
2. Background
2.1 Sequential data analysis
2.1.1 Recurrent neural network
2.1.2 Gated recurrent units
2.2 Meta-dimensional data integration
2.2.1 Concatenation-based integration
2.2.2 Transformation-based integration
2.3 Interpretability of machine learning
2.3.1 Intrinsic interpretable model
2.3.2 Model-agnostic interpretation
3. Exploiting samples based on prior knowledge integration
3.1 Introduction
3.2 L1-regularized linear regression
3.3 Kernel-reweighting lasso
3.4 Inferring subtype-specific network
3.4.1 Dataset
3.4.2 Predicting gene expression level based on DNA methylation
3.4.3 Subtype-specific prediction performance
3.4.4 Subtype-specific association network
3.5 Discussion
4. Multimodal time series data integration framework
4.1 Introduction
4.2 Multimodal longitudinal data integration framework
4.3 Experiment: Simulation study
4.4 Experiment: Predicting AD progression using ADNI data
4.4.1 Study participants
4.4.2 Experimental setting
4.4.3 Comparison of prediction of MCI to AD conversion using cross-sectional data at baseline and longitudinal data
4.4.4 Comparison of prediction of MCI to AD conversion using single modal and multimodal data
4.5 Experiment: genomic variations in Alzheimer’s disease
4.5.1 Methods for integrating and interpreting WGS data
4.5.2 Performance improvement
4.5.3 Model interpretation
4.5.4 Functional interpretation of genetic variants
4.6 Discussion
5. Conclusion
