NU-InNet: Thai Food Image Recognition Using Convolutional Neural Networks on Smartphone

Chakkrit Termritthikun, Paisarn Muneesawang, Surachet Kanprachar


Convolutional Neural Networks (CNNs) are currently used in many applications, one of which is image recognition. Most research in this field uses CNNs mainly to improve recognition performance, without treating processing time or the number of parameters (i.e., model size) as primary factors. This paper studies Thai food image recognition on a smartphone and reduces the processing time and model size so that the network can be used practically on smartphones. A new network, NU-InNet (Naresuan University Inception Network), which adopts the concept of the Inception module used in GoogLeNet, is proposed. It is applied to and tested on a Thai food database called THFOOD-50, which contains 50 kinds of famous Thai food. NU-InNet reduces the processing time and the model size by factors of 2 and 10, respectively, compared with GoogLeNet, while maintaining recognition precision at the same level as GoogLeNet. These substantial reductions make the proposed network well suited to a Thai food recognition application on a smartphone.


Keywords: Deep Learning; Food Recognition; Convolutional Neural Networks; Smartphone; Thai Food; Dataset; Inception
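The abstract attributes NU-InNet's smaller model to the Inception-module concept from GoogLeNet. One well-known element of that module is a 1x1 "reduction" convolution placed before a larger convolution, which shrinks the number of input channels the expensive filter has to see. The sketch below only illustrates that mechanism by counting learnable parameters for a single 5x5 branch with and without a 1x1 reduction; it is not the NU-InNet architecture, whose layer sizes are not given here. The helper names `conv_params` and `branch_params` and the channel counts (192 in, 16 reduced, 32 out) are hypothetical, chosen for illustration.

```python
from typing import Optional


def conv_params(k: int, c_in: int, c_out: int, bias: bool = True) -> int:
    """Learnable parameters of a k x k convolution from c_in to c_out channels."""
    weights = k * k * c_in * c_out
    return weights + (c_out if bias else 0)


def branch_params(c_in: int, c_out: int, k: int, c_reduce: Optional[int] = None) -> int:
    """Parameters of one k x k branch, optionally preceded by a 1x1 reduction.

    Hypothetical helper for illustration; not taken from the NU-InNet paper.
    """
    if c_reduce is None:
        # Direct k x k convolution applied to all input channels.
        return conv_params(k, c_in, c_out)
    # 1x1 convolution down to c_reduce channels, then k x k on the reduced map.
    return conv_params(1, c_in, c_reduce) + conv_params(k, c_reduce, c_out)


if __name__ == "__main__":
    # Illustrative (assumed) channel counts, not NU-InNet's actual configuration.
    C_IN, C_REDUCE, C_OUT, K = 192, 16, 32, 5
    direct = branch_params(C_IN, C_OUT, K)
    reduced = branch_params(C_IN, C_OUT, K, C_REDUCE)
    print(f"direct {K}x{K}:       {direct} parameters")
    print(f"1x1 then {K}x{K}:     {reduced} parameters")
    print(f"reduction factor: {direct / reduced:.1f}x")
```

With these assumed sizes the reduced branch needs roughly a tenth of the parameters of the direct branch. The actual savings in any network depend on its chosen channel widths, so this figure should not be read as an explanation of the factor reported in the abstract.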







This work is licensed under a Creative Commons Attribution 3.0 License.

ISSN: 2180-1843

eISSN: 2289-8131