
Investigating Genomic Associations by Fusing Regression Methods on Cancer Profiles

Abstract/Summary

Cancer ultimately results from cells that grow uncontrollably and do not die. Normal cells in the body follow an orderly path of growth, division, and death. When this process breaks down, cancer begins to form through massive abnormal cell growth. Studying gene expression with respect to multi-layered genomic features is highly useful for overcoming the poor prognosis of cancer. Association analysis of gene expression traits with genomic features is crucial for identifying the molecular mechanisms underlying cancer. Simple correlation-based association tests are prone to identifying many indirect genomic associations. In this study, the sparse regression methods GFLasso, Lasso, SGL, and SIOL were employed to discover genomic associations. The purpose of this study is to understand the pros and cons of sparse regression, structural information, and grouping effects, and to identify significant cancer-causing genomic associations, genomic features, and expression traits. An extensive study was carried out, and the results obtained by each regression method were compared. The performance of each regression method was analyzed in terms of mean squared error, non-zero beta density, computational time, etc. An association study between gene expression and a genomic feature (methylation) was performed using the regression coefficients obtained by each computational method. The study analyzed the association pairs, strongly influencing predictors (methylation features), and output variables (mRNA) of each method on various cancer profiles, by combining the results of all regression types and fusing them with a similarity measure, i.e., similarity network fusion (SNF). The overall motivation is to suppress noise while still considering the weaker genomic associations that are true positives for the study, though identifying stronger genomic associations is equally important. SNF is used for fusion in this study because the fused network captures both shared and complementary information from different data sources, using propagation effects over multiple iterations.
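
The two headline metrics named in the abstract, mean squared error and non-zero beta density, can be illustrated with a minimal sketch. The function names, the tolerance, and the toy numbers below are assumptions made for illustration and are not taken from the dissertation. Density is assumed here to mean the fraction of regression coefficients that are non-zero.

```python
def mean_squared_error(y_true, y_pred):
    """Average squared difference between observed and predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def nonzero_density(beta, tol=1e-8):
    """Fraction of entries in a coefficient matrix with magnitude above tol.

    beta is a list of rows (e.g. methylation features) by columns
    (e.g. mRNA expression traits); tol is an assumed threshold.
    """
    flat = [b for row in beta for b in row]
    if not flat:
        raise ValueError("coefficient matrix must not be empty")
    return sum(abs(b) > tol for b in flat) / len(flat)


# Toy values, purely illustrative (not results from the study).
y_true = [1.0, 2.0, 3.0]
y_pred = [1.5, 2.0, 2.0]
beta = [[0.0, 0.8], [0.0, 0.0], [-0.3, 0.0]]

mse = mean_squared_error(y_true, y_pred)
density = nonzero_density(beta)
print(f"MSE = {mse:.4f}, non-zero density = {density:.4f}")
```

A sparser method yields a lower density, so comparing density alongside MSE shows how much predictive accuracy each method trades for a smaller set of candidate associations.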


Table of Contents
Master Dissertation
ACKNOWLEDGEMENTS
ABSTRACT
1 Introduction
2 Summary of Work
3 Materials and Methods
3.1 Data & Preprocessing
3.2 Methods
3.2.1 Least absolute shrinkage and selection operator (Lasso)
3.2.2 Graph Guided Fused Lasso (GFLasso)
3.2.3 Sparse Group Lasso (SGL)
3.2.4 Structured Input-Output Lasso (SIOL)
3.3 Fusion Method: SNF
4 Results
4.1 Comparison of Regression Methods
4.1.1 Identifying the performance of all four regression methods in terms of MSE and Density
4.1.2 Discovering common genomic features of all methods
4.2 Integrative Regression Network
4.2.1 Investigating combined benefits of all regression methods using similarity measurement
4.2.2 Genomic association network construction and study
4.3 Functional characterization of the affected genes using the tool DAVID
5 Discussion & Conclusion
6 Future Work
REFERENCES
