EnsemPred-ACP: Combining machine and deep learning to improve anticancer peptide prediction

Abstract

Anticancer peptides (ACPs) have gained recognition as promising therapeutic candidates due to their capacity to specifically eliminate malignant cells while preserving the integrity of normal tissues. Nevertheless, achieving reliable computational identification of ACPs continues to pose significant challenges, primarily due to the intricate molecular processes involved in cancer biology. This research presents EnsemPred-ACP, a novel ensemble-based strategy that integrates machine learning (ML) and deep learning (DL) methodologies to improve ACP prediction accuracy. The key contribution of our work lies in incorporating binary profile features (BPF) as a complement to existing protein embeddings, enabling the capture of position-dependent characteristics essential for ACP recognition. The system employs a two-stage pipeline design: ML algorithms process manually engineered sequence attributes and embeddings, while DL networks utilize BPF-enriched embeddings. When tested on independent validation sets, EnsemPred-ACP attained an accuracy of 0.863, sensitivity of 0.897, and specificity of 0.830, surpassing the performance of current state-of-the-art approaches. The framework exhibited robust generalization capabilities, with an area under the receiver operating characteristic curve reaching 0.93. Ablation experiments conducted on separate datasets underscored the significant contribution of BPF, leading to prediction accuracy improvements of 2.5% and 11.1% when combined with ESM2 and ProtT5 embeddings, respectively. These findings validate the efficacy of our integrated methodology for precise identification of candidate therapeutic peptides, thus advancing the field of peptide-based cancer treatment strategies.
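To make the idea of BPF-enriched embeddings concrete, the following is a minimal sketch in Python. It assumes that a binary profile is a one-hot vector over the 20 standard amino acids at each sequence position, and that it is joined to a per-residue embedding by simple concatenation; the abstract does not state how the thesis combines the two, so this is an illustration only. The names `binary_profile` and `append_bpf` are hypothetical, and the toy 2-dimensional embedding stands in for real ESM2 or ProtT5 output.

```python
# Assumption: BPF = one-hot encoding over the 20 standard amino acids.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def binary_profile(sequence, max_len=None):
    """Return one 20-dim 0/1 row per residue.

    If max_len is given, the sequence is truncated to max_len residues and
    padded with all-zero rows so that exactly max_len rows are returned.
    """
    seq = sequence.upper()
    if max_len is not None:
        seq = seq[:max_len]
    rows = []
    for aa in seq:
        row = [0] * len(AMINO_ACIDS)
        idx = AA_INDEX.get(aa)
        if idx is not None:  # non-standard residues stay all-zero
            row[idx] = 1
        rows.append(row)
    if max_len is not None:
        padding = max_len - len(rows)
        rows.extend([0] * len(AMINO_ACIDS) for _ in range(padding))
    return rows


def append_bpf(embedding, sequence):
    """Concatenate each per-residue embedding vector with its BPF row.

    Assumption: `embedding` holds one vector per residue, in sequence order.
    """
    bpf = binary_profile(sequence)
    if len(embedding) != len(bpf):
        raise ValueError("embedding must have one vector per residue")
    return [list(vec) + row for vec, row in zip(embedding, bpf)]


# Toy example: a 2-residue peptide with a made-up 2-dim embedding.
toy_embedding = [[0.1, 0.2], [0.3, 0.4]]
enriched = append_bpf(toy_embedding, "AC")
```

Under these assumptions, each row of `enriched` has the embedding dimensions followed by 20 position-specific indicator values, which is one way the position-dependent signal described in the abstract could reach a downstream DL network.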

Table of Contents

1. Introduction 1
2. Materials and Methods 5
2.1. Dataset construction 5
2.2. Feature engineering 7
2.3. Protein embedding 9
2.4. Model development pipeline 10
2.5. Ensemble strategy 11
2.6. Evaluation metrics 12
3. Results 14
3.1. Evaluation of handcrafted features across ML models 14
3.2. Performance evaluation across ML and DL models and datasets 16
3.3. Confusion Matrices and AUC 18
3.4. Ablation Study: The Role of Binary Profile Features in Protein Embeddings 21
3.4.1 Experimental design 21
3.4.2 Performance enhancement through BPF integration 21
3.4.3 Sequence length dependency 26
3.4.4 Amino acid composition effects 28
3.4.5 Biological mechanism of BPF enhancement 30
3.4.6 Conclusions and future perspectives 31
3.5. Performance comparison between EnsemPred-ACP and publicly available ACP predictors in the independent dataset 32
3.6. SHAP analysis reveals biologically relevant sequence patterns 35
4. Discussion 37
CONCLUSION 39
REFERENCES 40
Appendix Figures 44
