Tuning Latent Dirichlet Allocation Parameters using Ant Colony Optimization

Thanakorn Yarnguy, Wanida Kanarkard

Abstract


Latent Dirichlet Allocation (LDA) is a widely used topic model for discovering hidden topics, and it is applied in many areas of text analysis research. The performance of LDA depends on two Dirichlet prior parameters, α and β, which must therefore be set to appropriate values. Ant colony optimization (ACO) is a metaheuristic for hard computational problems and can be applied to parameter tuning. We therefore propose an approach that uses ACO to find the optimal α and β for LDA. The approach was evaluated on three datasets from the UCI repository (KOS, NIPS, and ENRON), which are standard benchmarks for evaluating topic models. The experimental results show that LDA with parameters tuned by ACO achieves better performance as measured by perplexity.
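The abstract does not describe the search procedure in detail, so the following is only a minimal sketch of how an ACO-style search over α and β might look, not the authors' implementation. It assumes a discrete grid of candidate values, one pheromone trail per candidate, and a score where lower is better (perplexity). The names `aco_grid_search` and `toy_score`, the grid values, and the settings `n_ants`, `n_iters`, and `rho` are all hypothetical; `toy_score` is a synthetic stand-in for training LDA and measuring held-out perplexity.

```python
import math
import random


def perplexity(log_likelihood, n_tokens):
    """Perplexity of held-out text: exp(-log-likelihood per token). Lower is better."""
    return math.exp(-log_likelihood / n_tokens)


def toy_score(alpha, beta):
    """Hypothetical stand-in for 'train LDA with (alpha, beta), return perplexity'.

    A smooth bowl whose minimum lies at alpha=0.1, beta=0.01 (values chosen arbitrarily).
    """
    return 1000.0 + 500.0 * ((math.log10(alpha) + 1) ** 2 + (math.log10(beta) + 2) ** 2)


def aco_grid_search(alphas, betas, score, n_ants=10, n_iters=30, rho=0.1, seed=0):
    """Assumed grid-style ACO: each ant picks one alpha and one beta by pheromone weight."""
    rng = random.Random(seed)
    tau_a = [1.0] * len(alphas)  # pheromone on each candidate alpha
    tau_b = [1.0] * len(betas)   # pheromone on each candidate beta
    best = (None, None, float("inf"))
    for _ in range(n_iters):
        trials = []
        for _ in range(n_ants):
            i = rng.choices(range(len(alphas)), weights=tau_a)[0]
            j = rng.choices(range(len(betas)), weights=tau_b)[0]
            s = score(alphas[i], betas[j])
            trials.append((i, j, s))
            if s < best[2]:
                best = (alphas[i], betas[j], s)
        # Evaporation: old trails fade so early choices do not dominate forever.
        tau_a = [(1.0 - rho) * t for t in tau_a]
        tau_b = [(1.0 - rho) * t for t in tau_b]
        # Deposit: a lower score earns more pheromone (at most 1.0 per ant).
        for i, j, s in trials:
            tau_a[i] += best[2] / s
            tau_b[j] += best[2] / s
    return best


if __name__ == "__main__":
    alphas = [0.01, 0.05, 0.1, 0.5, 1.0]
    betas = [0.001, 0.01, 0.05, 0.1, 0.5]
    print(aco_grid_search(alphas, betas, toy_score))
```

In a real experiment, `toy_score` would be replaced by a function that fits LDA on training documents with the given priors and returns the perplexity on held-out documents. The deposit rule assumes strictly positive scores, which holds for perplexity because it is never below 1.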

Keywords


Ant Colony Optimization; Latent Dirichlet Allocation; Tuning Parameters





Creative Commons License
This work is licensed under a Creative Commons Attribution 3.0 License.

ISSN: 2180-1843

eISSN: 2289-8131