Spatiotemporal Attention-Based Multimodal VR-Real Public Opinion Dynamics Modelling in Adolescents
Abstract
With the spread of VR technology among adolescents, public opinion dissemination in virtual social networks is characterized by spatio-temporal immersion, behavioural impulsiveness, and virtual-real interaction. Traditional opinion models (e.g., SEIR) are limited to unimodal modelling and struggle to capture the complex evolution of group polarization and virtual-real linkage in VR environments. To address this, we propose MSTA-VRE, a "Multimodal Virtual-Real Interaction Public Opinion Simulation Model Driven by Spatio-Temporal Attention Mechanism". By constructing a Heterogeneous Spatio-Temporal Graph Network (Hetero-STGNN) with a cross-modal Transformer, we fuse multi-source data (text, motion, voice, and physiological signals) to quantify the bidirectional penetration effect between virtual and real social nodes. Adversarial generative training and a causal interpretability module are introduced to enhance the model's robustness. Experiments show that, compared with unimodal models, multimodal fusion reduces prediction error by 18%, maintains opinion recognition accuracy above 85% under malicious interference, and improves the recall of cross-domain opinion events by 41%. Compared with traditional SEIR models in similar scenarios, the model reduces prediction error by 25%. For instance, under high-frequency malicious interference, our model maintained an opinion recognition accuracy of 87%, against 65% for traditional models. The framework provides a full-chain solution, from theoretical modelling to dynamic intervention, for analyzing the evolution of youth VR social opinion and building a safe, controllable metaverse social ecology.
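For readers unfamiliar with the SEIR baseline named above, the following is a minimal sketch of the standard compartmental model, read here as opinion spreading: S has not yet encountered the opinion, E has encountered it but is not spreading it, I is actively spreading it, and R has lost interest. The function name `simulate_seir`, the forward-Euler integration, and all parameter values are illustrative assumptions; none of them are taken from the paper, and this is not the MSTA-VRE model.

```python
def simulate_seir(beta, sigma, gamma, s0, e0, i0, r0, steps, dt=0.1):
    """Forward-Euler integration of the standard SEIR equations.

    beta  : contact/transmission rate (S -> E), assumed value
    sigma : rate at which exposed users start spreading (E -> I), assumed value
    gamma : rate at which spreaders lose interest (I -> R), assumed value
    Returns a list of (S, E, I, R) tuples, one per time step, including t = 0.
    """
    n = s0 + e0 + i0 + r0  # total population, constant in this model
    s, e, i, r = float(s0), float(e0), float(i0), float(r0)
    history = [(s, e, i, r)]
    for _ in range(steps):
        new_exposed = beta * s * i / n * dt   # flow S -> E
        new_spreading = sigma * e * dt        # flow E -> I
        new_recovered = gamma * i * dt        # flow I -> R
        s -= new_exposed
        e += new_exposed - new_spreading
        i += new_spreading - new_recovered
        r += new_recovered
        history.append((s, e, i, r))
    return history


# Hypothetical scenario: 1000 users, 10 of them initially spreading the opinion.
trajectory = simulate_seir(beta=0.5, sigma=0.2, gamma=0.1,
                           s0=990, e0=0, i0=10, r0=0, steps=500)
peak_spreaders = max(state[2] for state in trajectory)
```

Each user is reduced to a single compartment label with homogeneous mixing, which illustrates the unimodal limitation the abstract attributes to this class of model: there is no place for text, motion, voice, or physiological signals, nor for a distinction between virtual and real social nodes.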
DOI: https://doi.org/10.31449/inf.v49i22.10367
This work is licensed under a Creative Commons Attribution 3.0 License.








