Vision Transformer-Based Framework for AI-Generated Image Detection in Interior Design
Abstract
Images generated by artificial intelligence (AI) are increasingly used in interior design, raising concerns about authenticity and ethical use. Motivated by the limitations of Convolutional Neural Networks (CNNs) in capturing long-range dependencies and global patterns, this study examines how Vision Transformers (ViTs) can be used to detect AI-generated interior design images. We fine-tuned and evaluated four ViT models, ViT-B16, ViT-B32, ViT-L16, and ViT-L32, on a dataset of 1,000 samples per class. Performance was assessed using accuracy, precision, recall, F1-score, and computational efficiency. Results show that models with smaller patch sizes (i.e., 16×16) outperform those with larger ones (i.e., 32×32): ViT-B16 and ViT-L16 achieved the highest accuracy (96.25%) and F1-score (0.9625), identifying minor inconsistencies in AI-generated images. ViT-B32 and ViT-L32 offer better computational efficiency at the cost of lower classification performance (80.00% and 81.25% accuracy, respectively). ViT-B16 provides the best tradeoff between accuracy and resource efficiency, whereas ViT-L16, although just as accurate, incurred higher computational costs. The more efficient ViT-B32 and ViT-L32 are better suited to real-time applications that prioritize speed over accuracy. Through this work, we contribute a domain-specific deep learning framework for detecting AI-generated images in interior design to strengthen authenticity verification. Future work will address improving computational efficiency and generalizing the model across a wider range of generative models and design styles.
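The patch-size tradeoff reported above follows from how a ViT tokenizes images: with 16×16 patches a standard 224×224 input yields 196 tokens, versus only 49 for 32×32 patches, and self-attention cost grows roughly quadratically with token count. A minimal sketch (pure Python, all inputs illustrative and not taken from the paper's data) of this patch arithmetic and of the binary evaluation metrics used in the study:

```python
def patch_stats(image_size: int, patch_size: int):
    """Number of patch tokens for a square input, and a relative
    self-attention cost proxy (~ tokens squared)."""
    n_tokens = (image_size // patch_size) ** 2
    return n_tokens, n_tokens * n_tokens

def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall, F1 with label 1 = AI-generated, 0 = real."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    acc = (tp + tn) / len(y_true)
    prec = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    return acc, prec, rec, f1

# ViT-B16 vs. ViT-B32 on a 224x224 image:
tokens_16, cost_16 = patch_stats(224, 16)  # 196 tokens
tokens_32, cost_32 = patch_stats(224, 32)  # 49 tokens
```

The fourfold difference in token count (and roughly sixteenfold difference in attention cost) illustrates why the 16-patch variants detect finer inconsistencies while the 32-patch variants run faster.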
DOI: https://doi.org/10.31449/inf.v49i16.7979

This work is licensed under a Creative Commons Attribution 3.0 License.