Identification of Subclass-Specific Biomarkers by Developing an Integrated Database of Breast Cancer

Abstract

Breast cancer was once common mainly among Western women, but westernized lifestyles have increased its incidence among Asian women. Many risk factors, such as lifestyle, family history, and genetic factors, can induce breast cancer. Microarrays are widely used to measure the genetic factors of breast cancer because they can analyze many genes across multiple samples at once. Consequently, many biomarkers associated with breast cancer have been introduced. Although the results of microarray analysis are promising, their reusability has not yet been well studied. To address this issue, some institutes run web sites that provide meta-analysis using public microarray data. These web sites provide the results of meta-analysis, which combines the results of multiple individual datasets that test similar hypotheses. They also provide some clinical information that can be used in analysis. However, the number of samples on these web sites is still limited. Moreover, they are limited in providing the clinical information needed for efficient reuse of microarray data in breast cancer research. Therefore, the purpose of this study is to integrate various microarray data, restructure the data so that it can be reused for breast cancer research, and provide the results of meta-analysis using this data. To develop the integrated database, we first downloaded mRNA microarray data related to breast cancer, along with their clinical information, from three open repositories, and developed a structurally well-organized breast cancer database after in-depth curation of the downloaded data. The tables of the database are organized based on the GEO data explanation forms. After a detailed review of two clinical guidelines for breast cancer, we selected clinically meaningful variables in order to provide detailed clinical information.
We next performed meta-analysis on variables related to the subclasses of breast cancer. For example, we selected ERBB2 status, which is one of the important markers in breast cancer, and "stage" information, which classifies the clinical phase of the cancer. Fisher's combined p-value method was used for the meta-analysis. As a result, we identified 52 and 54 differentially expressed genes (DEGs) for ERBB2 status and stage, respectively. Network analysis of the DEGs was performed to examine the relationships between the DEGs and other genes, and gene ontology enrichment analysis was performed to identify the biological meaning of the DEGs. In the ERBB2-positive class, TPRG1, which induces cancer, was overexpressed, and tumor suppressor genes such as MUCL1, CLCA2, and DLK1 were underexpressed in the ERBB2-negative class. In the high-stage class, overexpression of genes related to cancer metastasis, including MMP12, BCL2A1, and CXCL5, was observed. On the other hand, FOS, which is known as an oncogene, and TFF2, which is known to induce endocervicitis, a condition closely associated with breast disease, were underexpressed. To check the classification accuracy of these genes, we used four classification methods: Linear Discriminant Analysis, Random Forest, K-Nearest Neighbors, and Support Vector Machine. Among them, Random Forest showed the best performance in a 10-fold cross-validation scheme. In this study, we developed a breast cancer-specific database to identify DEGs for specific subclasses of breast cancer and tested the performance of these DEGs using several classification methods. For further study, experimental validation of these DEGs is needed, and the database should be investigated for diverse breast cancer studies using the detailed clinical information. Moreover, diverse types of microarray data should be included in the database.
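The abstract names Fisher's combined p-value method but does not show how it works. The following is a minimal sketch of the standard method, not the thesis's actual implementation: for one gene, the p-values from k independent datasets are combined through the statistic X = -2 Σ ln p_i, which follows a chi-square distribution with 2k degrees of freedom under the null hypothesis. The function name `fisher_combined_pvalue` and the example p-values are hypothetical.

```python
import math


def fisher_combined_pvalue(pvalues):
    """Combine independent p-values for one gene with Fisher's method.

    Hypothetical helper for illustration; assumes the p-values come from
    independent studies testing the same hypothesis.
    """
    if not pvalues:
        raise ValueError("need at least one p-value")
    if any(p <= 0.0 or p > 1.0 for p in pvalues):
        raise ValueError("p-values must lie in (0, 1]")

    k = len(pvalues)
    # half_stat is X / 2, where X = -2 * sum(ln p_i) is Fisher's statistic.
    half_stat = -sum(math.log(p) for p in pvalues)

    # For even degrees of freedom 2k, the chi-square survival function has
    # the closed form exp(-X/2) * sum_{i=0}^{k-1} (X/2)^i / i!.
    term = 1.0
    total = 1.0
    for i in range(1, k):
        term *= half_stat / i
        total += term
    return min(1.0, math.exp(-half_stat) * total)


if __name__ == "__main__":
    # Illustrative p-values for a single gene across three datasets.
    print(fisher_combined_pvalue([0.04, 0.10, 0.03]))
```

In a full meta-analysis, the combined p-values for all genes would usually be adjusted for multiple testing before calling a gene differentially expressed. The abstract does not say which adjustment, if any, was applied.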


Table of Contents

ABSTRACT
TABLE OF CONTENTS
LIST OF FIGURES
LIST OF TABLES
ABBREVIATION
I. INTRODUCTION
  A. BACKGROUND
  B. RELATED RESEARCHES
  C. PURPOSE OF THIS STUDY
II. MATERIALS AND METHODS
  A. MICROARRAY DATA
  B. DEVELOPING INTEGRATED DATABASE
    1. DATA DOWNLOAD
    2. DATABASE TABLE ORGANIZATION
    3. RAW DATA PREPROCESSING
  C. CONSTRUCTION OF BREAST CANCER CLINICAL INFORMATION
  D. META-ANALYSIS USING THIS INTEGRATED BREAST CANCER DATABASE
III. RESULTS
  A. DEVELOPING INTEGRATED DATABASE OF BREAST CANCER
  B. META-ANALYSIS RESULTS USING THIS DATABASE
  C. PERFORMANCE TEST OF META-ANALYSIS RESULTS IN BREAST CANCER mRNA DATASETS
IV. DISCUSSION
V. CONCLUSION
REFERENCES
ABSTRACT IN KOREAN
