dCollection Digital Academic Information Distribution System

Computational prediction of protein folding rate using structural parameters and network centrality measures

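Subject (keywords): two-state protein, non-two-state protein, protein folding rate, machine learning, support vector machine
Subject (DDC): 547
Publisher: Ajou University Graduate School
Advisor: Gwang Lee
Year of publication: 2024
Degree conferred: February 2024
Degree: Doctorate
Department and major: Graduate School, Department of Molecular Science and Technology
URI: http://www.dcollection.net/handler/ajou/000000033532
Language of text: English
Copyright: Ajou University theses are protected by copyright.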

Abstract/Summary

Protein folding is a complex physicochemical process in which a polymer of amino acids samples multiple conformations in its unfolded state before settling into a distinct native three-dimensional (3D) structure. To explain this process, several theoretical studies have taken collections of 3D structures, computed various structural characteristics, and examined their correlation with the natural logarithm of the protein folding rate (ln(kf)). Unfortunately, these structural features apply only to limited groups of proteins and cannot reliably predict ln(kf) for both two-state (TS) and non-two-state (NTS) proteins. A few machine learning (ML)-based models, trained on smaller datasets, have been proposed to overcome the shortcomings of the statistical methods. Although these techniques are promising, none of them provides an effective folding mechanism. In this study, we used newly created datasets to assess the predictive power of 10 ML algorithms, using five network centrality measures and eight structural characteristics. Support vector machine proved to be the most suitable regressor for predicting ln(kf), compared with the other nine regressors, on each of three datasets. In addition, combining structural characteristics with network centrality measures improved prediction performance, suggesting that more than one factor contributes to folding. This thesis aims to advance our understanding of the relationship between protein structure and folding rates, providing insights for both computational biology and experimental studies. Integrating ML techniques with structural and network parameters offers a promising avenue for predicting protein folding rates and contributes to the broader field of bioinformatics.

Table of contents

1. Introduction
2. Overview of protein folding and kinetics
2.1. Protein folding
2.2. Structural class of a protein
2.3. Protein folding kinetics
2.4. Protein misfolding and aggregation
3. Overview of machine learning
3.1. Steps involved in machine learning
3.2. Machine learning algorithms
4. Material and methods
4.1. Dataset description and acquisition
4.2. Structural parameter selection
4.2.1. Relative contact order
4.2.2. Absolute contact order
4.2.3. Total contact distance
4.2.4. Chain topology parameter
4.2.5. Fraction of local contact
4.2.6. Long-range order
4.2.7. Long-range contact order
4.3. Network centrality measures
4.4. Evaluation metrics
5. Results and Discussion
5.1. Structural parameters and their relationship with ln(kf)
5.2. Network centrality measures and their relationship with ln(kf) of TS and NTS
5.3. Large scale machine learning regression models
5.4. Comparison of SVM-based single model with the ensemble models
5.5. Model interpretation
5.6. Comparison of SVM-based models with the statistical parameters
5.7. Supplementary information
6. Conclusion and future work
7. Bibliography
