Improving visual vocabularies: a more discriminative, representative and compact bag of visual words

Leonardo Chang, Airel Pérez-Suárez, José Hernández-Palancar, Miguel Arias-Estrada, L. Enrique Sucar


In this paper, we introduce three properties, together with their corresponding quantitative evaluation measures, to assess the ability of a visual word to represent and discriminate an object class in the context of the bag of visual words (BoW) approach. Based on these properties, we also propose a methodology for reducing the size of the visual vocabulary by retaining the visual words that best describe an object class. Reducing the vocabulary provides a more reliable and compact image representation. Our proposal does not depend on the quantization method used to build the set of visual words, on the feature descriptor, or on the weighting scheme, which makes it suitable for any visual vocabulary. Our experiments show that using only the most discriminative and representative visual words selected by the proposed methodology improves classification performance; the best results obtained with our method are statistically superior to those obtained with the entire vocabularies. On the Caltech-101 dataset, the average best results outperformed the baseline by 4.6% and 4.8% in mean classification accuracy using SVM and KNN, respectively. On the Pascal VOC 2006 dataset, the improvement was 3.2% and 7% for SVM and KNN, respectively. Furthermore, these accuracy improvements were always obtained with more compact representations. Vocabularies 10 times smaller always obtained better accuracy than the baseline vocabularies on the Caltech-101 dataset, and did so in 78.1% of the experiments on the Pascal VOC dataset.
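
The general idea of reducing a vocabulary by ranking its visual words can be illustrated with a minimal sketch. The abstract does not define the three measures, so the score used here (a simple "class purity") is a stand-in chosen only for illustration, and the names `class_purity_scores` and `reduce_vocabulary` are hypothetical; this is not the authors' methodology.

```python
from collections import defaultdict


def class_purity_scores(histograms, labels):
    """Stand-in score per visual word (NOT the paper's measures): the largest
    share of the word's occurrences that falls in a single class.
    1.0 means the word was seen in one class only."""
    n_words = len(histograms[0])
    per_class = defaultdict(lambda: [0] * n_words)
    # Accumulate, for every class, how often each visual word occurs.
    for hist, label in zip(histograms, labels):
        counts = per_class[label]
        for w, c in enumerate(hist):
            counts[w] += c
    scores = []
    for w in range(n_words):
        total = sum(counts[w] for counts in per_class.values())
        best = max(counts[w] for counts in per_class.values())
        scores.append(best / total if total else 0.0)
    return scores


def reduce_vocabulary(histograms, labels, keep):
    """Keep the `keep` highest-scoring words and project every BoW
    histogram onto them. Ties are broken by the lower word index."""
    scores = class_purity_scores(histograms, labels)
    ranked = sorted(range(len(scores)), key=lambda w: (-scores[w], w))
    kept = sorted(ranked[:keep])
    reduced = [[hist[w] for w in kept] for hist in histograms]
    return kept, reduced


if __name__ == "__main__":
    # Toy BoW histograms: 4 images, vocabulary of 4 visual words.
    histograms = [
        [5, 1, 0, 2],
        [4, 2, 0, 2],
        [0, 1, 6, 2],
        [0, 2, 5, 2],
    ]
    labels = ["car", "car", "cat", "cat"]
    kept, reduced = reduce_vocabulary(histograms, labels, keep=2)
    print(kept)     # indices of the retained visual words
    print(reduced)  # histograms restricted to those words
```

Because the selection operates only on the histograms and class labels, a filter of this kind is independent of how the vocabulary was quantized or which descriptor produced it, which is the property the abstract claims for the proposed method.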

Creative Commons License
This work is licensed under a Creative Commons Attribution 3.0 License.