
Computational prediction of protein folding rate using structural parameters and network centrality measures

Abstract

Protein folding is a complex physicochemical process in which a polymer of amino acids samples multiple conformations in its unfolded state before settling into a unique native three-dimensional (3D) structure. To explain this process, several theoretical studies have derived structural characteristics from collections of 3D structures and examined their correlations with the natural logarithm of the protein folding rate (ln(kf)). Unfortunately, these structural features apply only to a limited group of proteins and cannot reliably predict ln(kf) for both two-state (TS) and non-two-state (NTS) proteins. A few machine learning (ML)-based models, trained on small datasets, have been proposed to overcome the shortcomings of these statistical methods. Although these techniques are promising, none of them explains the folding mechanism effectively. In this study, we assessed the predictive power of 10 ML algorithms on newly created datasets, using five network centrality measures and eight structural characteristics. On each of three datasets, the support vector machine was found to be the most suitable regressor for predicting ln(kf) compared with the other nine regressors. In addition, combining structural characteristics with network centrality measures enhanced prediction performance, suggesting that more than one factor contributes to folding. This thesis aims to advance our understanding of the relationship between protein structure and folding rates, providing insights for both computational biology and experimental studies. Integrating ML techniques with structural and network parameters offers a promising avenue for predicting protein folding rates and contributes to the broader field of bioinformatics.
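As a rough illustration of the two feature families named in the abstract, the sketch below computes one structural parameter (relative contact order) and one network centrality measure (degree centrality) from residue coordinates. It is not the thesis's implementation: the 8 Å default cutoff, the minimum sequence separation of 2, the use of one representative point per residue, and all function names are assumptions made for illustration only.

```python
import math


def contact_pairs(coords, cutoff=8.0, min_sep=2):
    """Return residue index pairs (i, j), i < j, lying within `cutoff`.

    `coords` holds one (x, y, z) point per residue (for example a C-alpha
    atom); `cutoff` and `min_sep` are assumed values, not taken from the thesis.
    """
    pairs = []
    for i in range(len(coords)):
        for j in range(i + min_sep, len(coords)):
            if math.dist(coords[i], coords[j]) <= cutoff:
                pairs.append((i, j))
    return pairs


def relative_contact_order(pairs, n_residues):
    """Mean sequence separation of the contacts, divided by chain length."""
    if not pairs:
        return 0.0
    total_separation = sum(j - i for i, j in pairs)
    return total_separation / (len(pairs) * n_residues)


def degree_centrality(pairs, n_residues):
    """Fraction of the other residues that each residue is in contact with."""
    degree = [0] * n_residues
    for i, j in pairs:
        degree[i] += 1
        degree[j] += 1
    return [d / (n_residues - 1) for d in degree]
```

Per-protein values such as the relative contact order and the mean degree centrality could then be concatenated into a feature vector for a regressor; how the thesis actually aggregates its features is not stated in this record.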


Table of contents

1. Introduction
2. Overview of protein folding and kinetics
2.1. Protein folding
2.2. Structural class of a protein
2.3. Protein folding kinetics
2.4. Protein misfolding and aggregation
3. Overview of machine learning
3.1. Steps involved in machine learning
3.2. Machine learning algorithms
4. Material and methods
4.1. Dataset description and acquisition
4.2. Structural parameter selection
4.2.1. Relative contact order
4.2.2. Absolute contact order
4.2.3. Total contact distance
4.2.4. Chain topology parameter
4.2.5. Fraction of local contact
4.2.6. Long-range order
4.2.7. Long-range contact order
4.3. Network centrality measures
4.4. Evaluation metrics
5. Results and discussion
5.1. Structural parameters and their relationship with ln(kf)
5.2. Network centrality measures and their relationship with ln(kf) of TS and NTS
5.3. Large scale machine learning regression models
5.4. Comparison of SVM-based single model with the ensemble models
5.5. Model interpretation
5.6. Comparison of SVM-based models with the statistical parameters
5.7. Supplementary information
6. Conclusion and future work
7. Bibliography
