Optimal scoring based mixture modeling for ordinal data
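- Subject (keywords): ClustMD, Mclust, Mixture model, Optimal scaling, Ordinal data
- Subject (DDC): 510
- Publisher: Ajou University
- Advisor: 안수현
- Year of publication: 2022
- Date of degree conferral: February 2022
- Degree: Master's
- Department and major: Department of Mathematics, Graduate School
- URI: http://www.dcollection.net/handler/ajou/000000031849
- Language of text: English
- Copyright: Ajou University theses are protected by copyright.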
Abstract
Real data are often mixed, combining continuous, ordinal, and nominal variables. Before clustering such data, it is necessary to understand their characteristics. In this paper, we perform clustering based on the latent-variable form of the given data. To do so, we estimate optimal scores for the ordinal variables by minimizing the loss function of principal component analysis (PCA). Then, to overcome the disadvantages of the two most representative model-based clustering methods, Mclust and ClustMD, we propose a new clustering algorithm that differs from both. Through a numerical study, we compare the methods' accuracy and computing time when the true labels are known. Finally, we apply the new method to a real dataset, the Byar data.
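The abstract names the two ingredients (optimal scores for ordinal variables from a PCA loss, then model-based clustering) without giving details. The sketch below is one possible reading, not the thesis's Optimal-Mclust implementation. It assumes a PRINCALS-style alternating least squares: fit a rank-p approximation by SVD, then re-estimate each ordinal column's category scores by weighted isotonic (monotone) regression and restandardize. A hand-written diagonal-covariance Gaussian mixture fitted by EM stands in for Mclust, which additionally compares many covariance models by BIC. The function names `pava`, `optimal_scaling`, and `gmm_em`, and the synthetic data, are hypothetical.

```python
import numpy as np

# Hypothetical sketch of "optimal scaling + Gaussian mixture"; not the thesis's code.


def pava(y, w):
    """Weighted pool-adjacent-violators: best nondecreasing fit to y with weights w."""
    blocks = []  # each block: [weighted mean, total weight, number of categories]
    for yi, wi in zip(y, w):
        blocks.append([yi, wi, 1])
        # merge the last two blocks while they violate the order
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, c2 = blocks.pop()
            m1, w1, c1 = blocks.pop()
            blocks.append([(m1 * w1 + m2 * w2) / (w1 + w2), w1 + w2, c1 + c2])
    return np.concatenate([np.full(c, m) for m, _, c in blocks])


def standardize(z):
    return (z - z.mean()) / z.std()


def optimal_scaling(X, ordinal, p=2, n_iter=100, tol=1e-8):
    """Alternating least squares on the rank-p PCA loss ||Z - fit||^2 / (n m).

    Ordinal columns of X hold integer category codes; their scores are
    re-estimated each sweep, constrained to be nondecreasing in the code.
    """
    X = np.asarray(X, dtype=float)
    n, m = X.shape
    Z = np.column_stack([standardize(X[:, j]) for j in range(m)])
    # per ordinal column: index of each observation's (observed, sorted) category
    idx = {j: np.unique(X[:, j], return_inverse=True)[1] for j in ordinal}
    losses = []
    for _ in range(n_iter):
        # best rank-p approximation of the current quantified data
        U, s, Vt = np.linalg.svd(Z, full_matrices=False)
        fit = (U[:, :p] * s[:p]) @ Vt[:p]
        losses.append(np.sum((Z - fit) ** 2) / (n * m))
        if len(losses) > 1 and losses[-2] - losses[-1] < tol:
            break
        for j in ordinal:
            counts = np.bincount(idx[j])
            means = np.bincount(idx[j], weights=fit[:, j]) / counts
            z = pava(means, counts)[idx[j]]  # monotone category scores
            if z.std() > 1e-12:  # keep old scores if the fit is constant
                Z[:, j] = standardize(z)
    return Z, losses


def gmm_em(Z, k, n_iter=200, seed=0):
    """Diagonal-covariance Gaussian mixture fitted by EM; returns hard labels."""
    rng = np.random.default_rng(seed)
    n = Z.shape[0]
    mu = Z[rng.choice(n, k, replace=False)]
    var = np.tile(Z.var(axis=0), (k, 1))
    weights = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        # E-step: log densities per component, normalized stably
        diff2 = (Z[:, None, :] - mu) ** 2  # shape (n, k, d)
        logp = (np.log(weights)
                - 0.5 * np.log(2 * np.pi * var).sum(axis=1)
                - 0.5 * (diff2 / var).sum(axis=2))
        logp -= logp.max(axis=1, keepdims=True)
        r = np.exp(logp)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: update mixing weights, means, and per-dimension variances
        nk = r.sum(axis=0) + 1e-12
        weights = nk / n
        mu = (r.T @ Z) / nk[:, None]
        diff2 = (Z[:, None, :] - mu) ** 2
        var = np.einsum("nk,nkd->kd", r, diff2) / nk[:, None] + 1e-6
    return r.argmax(axis=1)


# Synthetic demo: two groups, two continuous and two 5-level ordinal variables.
rng = np.random.default_rng(1)
n = 300
truth = rng.integers(0, 2, n)
latent = rng.normal(size=(n, 4)) + 3.0 * truth[:, None]
X = latent.copy()
cuts = [0.5, 1.5, 2.5, 3.5]
for j in (2, 3):
    X[:, j] = np.digitize(latent[:, j], cuts) + 1  # codes 1..5
Z, losses = optimal_scaling(X, ordinal=[2, 3], p=1)
labels = gmm_em(Z, k=2)
```

Each sweep cannot increase the PCA loss: the SVD step is optimal for the current scores, and the normalized isotonic fit is the best monotone, standardized score vector for the current fit. In practice the quantified matrix `Z` would more likely be passed to the R function `Mclust`, which also selects the number of clusters and the covariance structure by BIC.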
Table of Contents
1 Introduction
2 Mixture model based clustering review
2.1 Gaussian mixture model
2.1.1 Model selection: Mclust
2.2 ClustMD
2.2.1 Monte Carlo EM algorithm
2.2.2 Model selection: ClustMD
3 New Method
3.1 Optimal scaling
3.2 Optimal-Mclust algorithm
4 Simulation study
5 Data example: Prostate cancer data
5.1 Byar data
5.2 Result
6 Conclusion
A Appendix