Motion Embedded Images: An Approach to Capture Spatial and Temporal Features for Action Recognition
Abstract
DOI: https://doi.org/10.31449/inf.v47i3.4755
This work is licensed under a Creative Commons Attribution 3.0 License.