
Improvement of the KTDA Algorithm for the Visualization of Semantic Network

Abstract

The appropriate method for analyzing textual data depends on its domain and other characteristics. For this reason, the Korean Text Data Analysis (KTDA) algorithm was proposed to provide a pipeline for the statistical analysis of Korean text. However, its dimension reduction and correlation cutting steps relied on cutoff settings that lacked sufficient statistical inference. The dense visualization it produces also weakens the interpretability of the plot. To improve the algorithm, this study introduces statistical inference for word-to-word relationships using FDR (False Discovery Rate) control, and improves dimension reduction and visualization by applying a sparsity cutoff setting and LDA (Latent Dirichlet Allocation). The new algorithm is expected to improve the reliability and interpretability of the analysis results.
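As a rough illustration of what FDR control over many word-to-word correlation tests can look like, the sketch below applies the Benjamini-Hochberg step-up procedure to a list of p-values. The abstract does not say which FDR procedure the thesis uses, so Benjamini-Hochberg is an assumption here, and the function name `benjamini_hochberg`, the word pairs, and the p-values are all hypothetical.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return booleans marking which hypotheses are rejected under the
    Benjamini-Hochberg procedure at FDR level alpha.

    Assumption: the thesis may use a different FDR procedure.
    """
    m = len(p_values)
    if m == 0:
        return []
    # Indices of the tests, ordered by ascending p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Largest 1-based rank k whose p-value satisfies p_(k) <= (k / m) * alpha.
    max_k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            max_k = rank
    # Step-up rule: reject every test ranked at or below max_k.
    rejected = [False] * m
    for idx in order[:max_k]:
        rejected[idx] = True
    return rejected


# Hypothetical word pairs and correlation-test p-values (illustrative only).
word_pairs = [("thyroid", "cancer"), ("thyroid", "surgery"),
              ("cancer", "diet"), ("surgery", "weather")]
p_values = [0.001, 0.012, 0.040, 0.600]

keep = benjamini_hochberg(p_values, alpha=0.05)
# Only pairs whose correlation survives FDR control become network edges.
edges = [pair for pair, kept in zip(word_pairs, keep) if kept]
```

Compared with a fixed correlation cutoff, a rule of this kind ties the choice of which edges to draw to a stated error rate across all tested word pairs.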


Table of Contents

1. Introduction
2. KTDA Algorithm
2.1 TF-IDF
2.2 Semantic Network
2.3 Limitation of KTDA Algorithm
3. KTDA-N Algorithm
3.1 Sparsity Cutoff Setting
3.2 FDR Controlling in Multiple Testing of Correlations
3.3 LDA Topic Modeling
3.4 Handling Isolates
3.5 KTDA-N Algorithm
4. Analysis
4.1 Thyroid Cancer Data
4.2 Lack of Nurse Data
5. Conclusion
6. References
Appendix
