
Development of a novel gene set, HepScope, and a CNN-based model for identifying malignant hepatocytes using multi-omics data

Abstract

Hepatocellular carcinoma (HCC) is the most common form of liver cancer and one of the most lethal cancers worldwide. HCC continues to pose a significant challenge to global health, and molecular diagnostics in particular struggle to accurately identify malignant hepatocytes in single-cell RNA sequencing (scRNA-seq) data. To address these challenges, this study aimed to develop a novel gene set and an artificial intelligence (AI) tool. The study leverages an expansive dataset encompassing scRNA-seq, spatial transcriptomics, bulk RNA-seq, and comprehensive proteomics data. Using this dataset, I developed the "HepScope" gene set by identifying genes that are significantly upregulated in malignant hepatocytes compared with their non-malignant counterparts. A 1-dimensional convolutional neural network (1D-CNN) model was applied to the HepScope expression matrix and benchmarked against 5 public gene sets and 11 AI-based models. Compared with the other AI-based models and gene sets, HepScope-CNN demonstrated superior performance across evaluation metrics. Analysis of the tumor immune environment and of protein-level biological pathways related to the HepScope gene set provides insight into the tumor microenvironment specific to HCC. In particular, cell-cell interactions between immunosuppressive cells and tumor cells in HepScope-high samples highlight a novel cellular signal, shedding light on potential therapeutic targets. Collectively, this research contributes to precision medicine in HCC by pinpointing biomarkers specific to malignant hepatocytes, delineating the immune landscape, and pioneering convolutional neural network methodologies.

Keywords: Hepatocellular carcinoma, Single-cell RNA sequencing, Convolutional neural network, Tumor microenvironment, Multi-omics data analysis.

Table of Contents

I. Introduction
II. Methodology
  A. Data description
  B. Processing of data
    1. Processing of scRNA-seq data
    2. Processing of stRNA-seq data
    3. Processing of bulk RNA-seq data
    4. Processing of proteomics data
  C. Establishment of a new gene set, HepScope
  D. Known gene sets to identify HCC
  E. Single-sample gene set enrichment analysis
  F. Generation of 1D-CNN model based on HepScope expression matrices: HepScope-CNN
  G. Benchmarking with alternative models
  H. Performance evaluation
  I. Tumor immune microenvironment analysis
  J. Cell-cell interaction analysis
  K. Differentially expressed protein analysis
  L. Gene ontology and KEGG analysis
  M. Protein-protein interaction analysis
III. Results
  A. Lack of existing tools to identify malignant hepatocytes at the single-cell level
  B. Development of a novel biomarker discriminating malignant hepatocytes at the single-cell level
  C. Validation of the capability of the HepScope gene set in multi-omic data
  D. Development of a novel AI model, HepScope-CNN
  E. Benchmarking of HepScope-CNN with other AI-based models
  F. Investigation of the tumor immune microenvironment in HCC using HepScope
  G. Exploration of the role of HepScope proteins upregulated in HCC
IV. Discussion
References
