A Semi-Supervised Approach to Monocular Depth Estimation, Depth Refinement, and Semantic Segmentation of Driving Scenes using a Siamese Triple Decoder Architecture
DOI: https://doi.org/10.31449/inf.v44i4.3018
This work is licensed under a Creative Commons Attribution 3.0 License.