Network-based machine learning approach for aggregating multi-modal data

Abstract

Real-world data often come in multiple modalities, such as multi-view social media data or multi-omics data; such data are called multi-modal data. A single modality may carry insufficient and noisy information for learning the structure of the data, even when a large amount of data is available. Combining multiple representations therefore gives a better understanding of data generated by a complex system, which in turn improves prediction performance. In particular, multi-omics studies have revealed distinctive and shared molecular features of cancers. These findings help clarify the underlying complex biological mechanisms and support the discovery of novel biomarkers associated with cancer progression and prognosis. For this reason, aggregating heterogeneous information from multi-modal data has attracted much attention across many areas of machine learning research. Multi-modal data aggregation either gathers the information shared between different modalities or transforms the multi-modal data into a high-level feature matrix that serves as a new input. These techniques give better insight into heterogeneous data through an integrated view. The transformed data can also be fed into a prediction model, improving both its predictive power and the interpretability of the multi-modal data. However, aggregation is challenging because of data heterogeneity, noise, missing values, and inconsistencies between data sources. Representing multi-modal data as a network is more informative, because both the intra- and inter-modality relationships can then be incorporated. In this thesis, we develop two network-based multi-modal data aggregation methods: multi-view network clustering and multi-layered network-based pathway activity inference. We then demonstrate each method in a range of experimental studies.
Specifically, we apply the former approach to clustering social-tagged landmark images. We apply the latter to transforming multiple genomic data types into pathway-level data for clinical outcome prediction models in several cancer studies. The experimental results show that the presented approaches effectively aggregate heterogeneous information and are robust to noise in the data. They achieve this by exploiting a network structure that accounts for interactions across different modalities. Because they represent multi-modal data on an integrated network before aggregating information, they also facilitate integrated network analysis. Both methods are generally applicable to any number and type of data modalities in various domains, so they open up many possibilities for future integrated multi-modal data analysis.
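The abstract does not give the multi-view network clustering algorithm itself; Chapter 3 covers it. The sketch below is a rough, hypothetical illustration of the general idea of clustering on a network built from several views. It uses one of the simplest baselines, not the thesis's method. The sketch builds a Gaussian similarity graph per view, averages the graphs, and runs normalized spectral clustering on the fused graph. Several details are assumptions rather than part of the original: the kernel width `sigma`, the uniform averaging rule, the helper names, and the toy data. NumPy is assumed to be available.

```python
import numpy as np


def gaussian_similarity(X, sigma=1.0):
    """Fully connected Gaussian (RBF) similarity graph for one view (rows = samples)."""
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.exp(-sq_dists / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)  # drop self-loops
    return W


def normalized_laplacian(W):
    """Symmetric normalized Laplacian L = I - D^(-1/2) W D^(-1/2)."""
    d = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(d, 1e-12))  # guard against isolated nodes
    return np.eye(len(W)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]


def kmeans(U, k, n_iter=100):
    """Plain k-means with deterministic farthest-point initialization."""
    centers = [U[0]]
    for _ in range(1, k):
        # squared distance of every row to its nearest already-chosen center
        dists = np.min([np.sum((U - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(U[np.argmax(dists)])
    centers = np.array(centers)
    for _ in range(n_iter):
        labels = np.argmin(((U[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1), axis=1)
        new_centers = np.array([
            U[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def multiview_spectral_clustering(views, k, sigma=1.0):
    """Average the per-view similarity graphs, then apply normalized spectral clustering."""
    W = np.mean([gaussian_similarity(X, sigma) for X in views], axis=0)
    _, eigvecs = np.linalg.eigh(normalized_laplacian(W))  # eigenvalues in ascending order
    U = eigvecs[:, :k]  # eigenvectors of the k smallest eigenvalues
    U = U / np.maximum(np.linalg.norm(U, axis=1, keepdims=True), 1e-12)  # row-normalize
    return kmeans(U, k)


# Toy data: 6 items seen through two views (e.g. tag features and image features).
view_tags = np.array([[0.0, 0.0], [0.2, 0.1], [0.1, 0.3],
                      [5.0, 5.0], [5.2, 4.9], [4.8, 5.1]])
view_image = np.array([[1.0], [1.1], [0.9], [-3.0], [-3.2], [-2.9]])
labels = multiview_spectral_clustering([view_tags, view_image], k=2)
print(labels)
```

Uniform averaging treats every view as equally reliable. Weighting the views or learning a consensus graph are common alternatives.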

Table of Contents

1. Introduction
1.1 Multi-modal data
1.2 Network-based multi-modal data aggregation
1.3 Thesis outline
2. Network-based data aggregation
2.1 Network clustering
2.1.1 Similarity graph
2.1.2 Spectral clustering
2.2 Network-based pathway activity inference
2.2.1 Pathway activity inference
2.2.2 Network-based pathway activity inference
3. Multi-view network clustering
3.1 Benchmark algorithms
3.2 Multi-view network clustering algorithm
3.3 Flickr social-tagged landmark image data
3.3.1 Social-tag (short text) network
3.3.2 Image network
3.4 Experimental setting
3.5 Performance evaluation
3.6 Results
3.6.1 Single-view network clustering performance
3.6.2 Performance comparison between multi-view network clustering algorithms
3.6.3 Multi-view network based cluster analysis
3.7 Discussion
4. Multi-layered network-based pathway activity inference
4.1 Pathway-based multi-layered gene-gene graph
4.2 Integrative directed random walk-based pathway activity inference
5. Pathway-driven integrative network analysis for a better cancer prognosis
5.1 TCGA breast cancer data
5.2 Experimental setting
5.3 Denoising autoencoder-based feature selection
5.4 Performance evaluation and survival prediction
5.5 Results
5.5.1 Performance comparison on a single type of feature data
5.5.2 Performance comparison of the pathway-based prediction methods on combined feature data
5.5.3 Identification of significant pathways and genes in breast cancer
5.6 Discussion
6. Robust predictive model on the integrated pathway-based gene-gene network
6.1 Breast cancer and neuroblastoma data
6.2 Experimental setting
6.3 Rank-based pathways selection and survival prediction
6.4 Performance evaluation and robustness test
6.5 Results
6.5.1 iDRW improves survival prediction performance compared to other pathway-based approaches
6.5.2 iDRW identifies cancer-associated pathways and genes
6.5.3 The pathways and genes are jointly analyzed in the gene-gene network
6.6 Discussion
7. Urologic cancer integrative analysis on the multi-layered gene-gene network
7.1 TCGA urologic cancer data
7.2 Experimental setting
7.3 Multiple clinical outcome prediction
7.3.1 Lasso-Cox regression model
7.3.2 RFE-RFC model
7.4 Performance evaluation
7.5 Results
7.5.1 iDRW contributes to a better cancer survival or metastasis prediction
7.5.2 iDRW jointly prioritizes risk pathways and genes on multi-omics data
7.5.3 iDRW shows distinctive pathway activity patterns across cancers
7.5.4 iDRW facilitates the integrative gene-gene network analysis
7.6 Discussion
8. Conclusion
References
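The outline names an integrative directed random walk (iDRW) on a pathway-based multi-layered gene-gene graph, but this record gives no formulas. The sketch below shows, under stated assumptions, the general pattern that random-walk-based pathway activity methods follow. First, per-gene weights are propagated over a directed graph using a random walk with restart. Then each sample's pathway activity is scored as a weighted, sign-adjusted sum of z-scored gene features. Much of the setup is hypothetical and not taken from the thesis:

- the two-layer layout (expression and methylation nodes)
- the inter-layer edges from a gene's methylation node to its expression node
- the restart probability of 0.7
- the use of |t| as initial weights
- all toy values

A real pipeline would likely also restrict the sum to significant genes. NumPy is assumed to be available.

```python
import numpy as np


def random_walk_with_restart(A, w0, restart=0.7, tol=1e-10, max_iter=1000):
    """Propagate initial node weights w0 over a directed graph.

    A[i, j] = 1 encodes an edge i -> j. Walkers follow out-edges with
    probability (1 - restart) and jump back to the start distribution
    with probability `restart`. In this simplified version, nodes without
    out-edges lose their mass.
    """
    A = np.asarray(A, dtype=float)
    out_deg = A.sum(axis=1, keepdims=True)
    M = np.divide(A, out_deg, out=np.zeros_like(A), where=out_deg > 0)  # row-normalized transitions
    p0 = np.asarray(w0, dtype=float) / np.sum(w0)
    p = p0.copy()
    for _ in range(max_iter):
        p_next = (1.0 - restart) * (M.T @ p) + restart * p0
        if np.abs(p_next - p).sum() < tol:
            return p_next
        p = p_next
    return p


def pathway_activity(Z, t_stats, weights, members):
    """Weighted, sign-adjusted sum of z-scored features of the pathway members, per sample."""
    idx = np.asarray(members)
    w = weights[idx]
    signed_w = w * np.sign(t_stats[idx])
    return signed_w @ Z[idx, :] / np.sqrt(np.sum(w ** 2))


# Toy two-layer graph: nodes 0-2 = expression of genes g0..g2,
# nodes 3-5 = methylation of the same genes (hypothetical layout).
pathway_edges = [(0, 1), (1, 2)]  # hypothetical pathway g0 -> g1 -> g2
A = np.zeros((6, 6))
for i, j in pathway_edges:
    A[i, j] = 1.0          # edge in the expression layer
    A[i + 3, j + 3] = 1.0  # same edge in the methylation layer
for g in range(3):
    A[g + 3, g] = 1.0      # hypothetical inter-layer edge: methylation -> expression

t_stats = np.array([2.5, -1.0, 3.0, -2.0, 0.5, 1.5])  # made-up per-feature t-statistics
weights = random_walk_with_restart(A, np.abs(t_stats))

rng = np.random.default_rng(0)
Z = rng.standard_normal((6, 4))  # 6 features x 4 samples, treated as already z-scored (toy)
activity = pathway_activity(Z, t_stats, weights, members=range(6))
print(activity.shape)  # (4,)
```

The restart term keeps every node's weight at least `restart` times its initial share. As a result, genes with strong individual signals are never washed out entirely by the propagation.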

more