Topic Model Validation Methods and their Impact on Model Selection and Evaluation
- Author(s)
- Jana Bernhard, Martin Teuffenbach, Hajo G. Boomgaarden
- Abstract
Topic Modeling is currently one of the most widely employed unsupervised text-as-data techniques in the field of communication science. While researchers increasingly recognize the importance of validating topic models and given the prevalence of discussions of inadequate validation practices in the literature, there is limited understanding of the consequences of employing different validation strategies when evaluating topic models. This study applies two different methods for topic modeling to the same text corpus. It uses four validation strategies to assess how the choice of validation method affects the final model selection and evaluation. Our findings indicate that different approaches and methods lead to different model choices and evaluations, which is problematic. This might lead to unwanted results in case the choice of model has a decisive impact on findings and, consequently, on theory development and practical implications.
- Organisation(s)
- Institut für Publizistik- und Kommunikationswissenschaft, Forschungsgruppe Data Mining and Machine Learning
- Journal
- Computational Communication Research
- Volume
- 5
- ISSN
- 2665-9085
- DOI
- https://doi.org/10.5117/CCR2023.1.13.BERN
- Publication date
- 01-2023
- Peer-reviewed
- Yes
- ÖFOS 2012
- 602011 Computational linguistics
- Keywords
- ASJC Scopus subject areas
- Computational Theory and Mathematics, Linguistics and Language
- Link to portal
- https://ucrisportal.univie.ac.at/de/publications/d5618776-9547-4848-adb1-3e51ba006fbd