Multi-Omics Integration for Survival Prediction using Topology Information of Pathway

Abstract

Recent advances in biotechnology make it possible to measure the quantitative expression of biomolecules in a high-throughput manner; data that characterize and quantify these biomolecules are called omics data. Many researchers have focused not only on the analysis of a single type of omics data but also on the integrative analysis of multi-omics data, in order to understand biological mechanisms from a macroscopic perspective. Although previous integration methods improved the performance of clinical outcome prediction, almost all of them are statistics-based methods that make it hard to reflect biological meaning. To overcome this problem, several methods have been suggested that infer pathway activity from a single type of omics data, where a pathway represents a biological function. However, pathway-level analysis with multi-omics data has rarely been studied. In this thesis, we propose an integrative directed random walk (iDRW) method that incorporates multi-omics data into pathway information in order to achieve robust survival prediction and to find undiscovered cancer-related biomarkers. RNA-Seq, DNA methylation, and Reverse Phase Protein Array (RPPA) data were used to calculate pathway activity scores, which were then used for survival prediction in breast cancer and for the identification of risk pathways. Furthermore, combinatorial experiments were conducted with various omics combinations to assess the effect of each type of omics data on the accuracy of survival prediction. The results show that the iDRW method outperforms the previous method, which used a single type of omics data to calculate pathway activity scores, and that it identifies plausible breast cancer-related pathways, including both well-known and previously undiscovered risk pathways. In particular, we successfully incorporated the RPPA data with the other omics data under our framework, even though RPPA data pose many challenges for integration. We observed that RPPA data are an efficient resource and that the combination of RNA-Seq and RPPA data gave the best accuracy in survival prediction for breast cancer. These findings highlight that choosing a proper combination of omics data for the problem at hand is more important than simply using more data.
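
The abstract names a directed random walk over a pathway network but does not state its update rule. As a rough illustration only, the sketch below runs a random walk with restart on a tiny directed gene graph. The update rule, the restart probability, the toy graph, and all function and variable names are assumptions made for this example and are not taken from the thesis.

```python
import numpy as np

def random_walk_with_restart(adj, w0, restart=0.7, tol=1e-10, max_iter=1000):
    """Propagate initial node weights over a directed graph (illustrative).

    adj[i, j] = 1 means a directed edge from node i to node j.
    Assumed update: w <- (1 - restart) * M^T w + restart * w0,
    where M is the row-normalized adjacency matrix.
    """
    adj = np.asarray(adj, dtype=float)
    w0 = np.asarray(w0, dtype=float)
    w0 = w0 / w0.sum()  # normalize initial weights to sum to 1

    # Row-normalize; rows of nodes with no outgoing edges stay all zero.
    out_deg = adj.sum(axis=1, keepdims=True)
    m = np.divide(adj, out_deg, out=np.zeros_like(adj), where=out_deg > 0)

    w = w0.copy()
    for _ in range(max_iter):
        w_next = (1 - restart) * m.T @ w + restart * w0
        if np.abs(w_next - w).sum() < tol:  # L1 change below tolerance
            return w_next
        w = w_next
    return w

# Hypothetical toy graph with three genes: gene0 -> gene1 -> gene2.
toy_adj = [[0, 1, 0],
           [0, 0, 1],
           [0, 0, 0]]
toy_initial = [1.0, 1.0, 1.0]  # e.g. equal per-gene scores from some omics layer
toy_weights = random_walk_with_restart(toy_adj, toy_initial)
```

In this toy case the downstream gene ends up with the largest weight because it receives weight propagated along the directed edges. How such gene weights are combined into a pathway activity score is described in the thesis itself, not here.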

Table of Contents

I. Introduction
A. Overview
B. Summary of contributions
II. Background
A. Biological features: gene, protein and pathway
B. Characteristics of high-throughput data: RNA-Seq, DNA methylation and the Reverse Phase Protein Array (RPPA)
C. Statistical methodology: survival prediction
III. Methods
A. Data set
i. Omics data: RNA-Seq, DNA methylation and RPPA
ii. KEGG pathway for constructing a pathway network
B. Our proposed framework: integrative Directed Random Walk (iDRW)
i. Overview of iDRW framework
ii. Unified pathway network construction
iii. Topological incorporation of multi-omics data under iDRW
C. Combinatorial experiments with varying omics data
D. Network propagation for robust performance
E. Evaluation
i. Survival classification
ii. Risk pathway identification
IV. Results
A. Integration model from iDRW framework
B. Predictive power of different combination of omics type when inferring pathway activity score
C. Effect of performance stability by network propagation
D. Performance comparison of proposed model
E. Risk pathway identification
V. Discussion
VI. Conclusion
VII. References
