Optimal scoring based mixture modeling for ordinal data

Abstract

Real data are often mixed, combining continuous, ordinal, and nominal variables. Before clustering, it is necessary to understand the characteristics of the data. In this paper, we conduct clustering according to the latent variable form of the given data. To do so, we estimate optimal scores for ordinal variables by minimizing the loss function of PCA. We then propose a new clustering algorithm that differs from the most representative model-based clustering methods, Mclust and ClustMD, and is designed to overcome their disadvantages. Through a numerical study, we compare the methods in accuracy and computing time when the labels are known. Finally, we apply the new method to a real data set, the Byar data.
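The abstract does not spell out how the optimal scores are computed. As a rough illustration only, one common building block of optimal scaling is sketched below: given a continuous target (for example, a PCA reconstruction of the variable), find the category scores that minimize the squared error while staying non-decreasing in the category order. The function name `ordinal_scores`, its arguments, and the use of pool-adjacent-violators pooling are assumptions made for this sketch, not the thesis's own procedure.

```python
def ordinal_scores(categories, target, n_levels):
    """Least-squares scores for one ordinal variable (hypothetical helper).

    categories: integer levels 0..n_levels-1, one per observation
    target:     continuous values the scores should approximate
    Returns one score per level, non-decreasing in the level order.
    """
    sums = [0.0] * n_levels
    counts = [0] * n_levels
    for level, value in zip(categories, target):
        sums[level] += value
        counts[level] += 1
    if min(counts) == 0:
        raise ValueError("every level must be observed at least once")

    # Pool adjacent violators: each block holds [mean, weight, levels covered].
    blocks = []
    for total, n in zip(sums, counts):
        blocks.append([total / n, n, 1])
        # Merge backwards while the ordering constraint is violated.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, k2 = blocks.pop()
            m1, w1, k1 = blocks.pop()
            blocks.append([(m1 * w1 + m2 * w2) / (w1 + w2), w1 + w2, k1 + k2])

    scores = []
    for mean, _, n_covered in blocks:
        scores.extend([mean] * n_covered)
    return scores


# Levels 0 and 1 have means 2.0 and 1.5, which violate the order, so they are pooled.
print(ordinal_scores([0, 0, 1, 1, 2, 2], [1.0, 3.0, 1.0, 2.0, 5.0, 7.0], 3))
# -> [1.75, 1.75, 6.0]
```

In a full optimal scaling procedure, a step like this would typically alternate with refitting the low-rank PCA approximation until the loss stops decreasing, after which the scored variables could be passed to a Gaussian mixture model. Whether the thesis follows exactly this scheme is not stated in the abstract.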

Table of Contents

1 Introduction
2 Mixture model based clustering review
2.1 Gaussian mixture model
2.1.1 Model selection: Mclust
2.2 ClustMD
2.2.1 Monte Carlo EM algorithm
2.2.2 Model selection: ClustMD
3 New Method
3.1 Optimal scaling
3.2 Optimal-Mclust algorithm
4 Simulation study
5 Data example: Prostate cancer data
5.1 Byar data
5.2 Result
6 Conclusion
A Appendix