GEAR: A Counterfactual Multi-Agent Reinforcement Learning Framework for Strategic Resource Allocation in Game Recommendation Systems
Abstract
As the global game market continues its explosive expansion, personalized recommender systems have become essential for enhancing player experience and platform stickiness. However, conventional recommendation models, which typically maximize short-term interaction metrics such as click-through rate, inevitably lead to content homogenization and degrade the diversity of the content ecosystem. This myopic focus ultimately undermines long-term player retention. To address this fundamental challenge, this thesis proposes a new paradigm that recasts the recommendation problem from what to recommend to how to recommend strategically. We model a game recommendation platform composed of multiple scenarios as a cooperative multi-agent system in which each scenario is an agent, and all agents share the objective of optimizing the platform's long-term ecosystem health. We formalize this problem as a Decentralized Partially Observable Markov Decision Process (Dec-POMDP) and solve it with a novel multi-agent reinforcement learning algorithm, GEAR (Guild-based Ecosystem-aware Allocation of Resources), built on the Centralized Training with Decentralized Execution (CTDE) framework. GEAR's main innovation is a counterfactual credit assignment mechanism that enables each agent to accurately assess its marginal contribution to a global, long-term utility function, a composite measure of player retention, content diversity, and user engagement. This mechanism effectively mitigates the non-stationarity and credit assignment problems inherent in multi-agent learning. We conduct thorough experiments on a purpose-built simulated game recommendation platform.
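The composite utility described above can be sketched as a weighted combination of normalized platform health signals. This is a minimal illustration only; the component weights and function name are assumptions, not values taken from the paper.

```python
# Hypothetical sketch of a composite long-term utility combining player
# retention, content diversity, and user engagement. The weights below are
# illustrative assumptions, not the paper's actual parameterization.
def ecosystem_utility(retention: float, diversity: float, engagement: float,
                      weights: tuple = (0.5, 0.3, 0.2)) -> float:
    """Weighted sum of platform health signals, each normalized to [0, 1]."""
    w_r, w_d, w_e = weights
    return w_r * retention + w_d * diversity + w_e * engagement
```

With all three signals at their maximum, the utility is 1.0; the weights let platform operators shift emphasis among objectives without changing the learning machinery.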
The results demonstrate that GEAR significantly outperforms static policies, independent learners, and state-of-the-art multi-agent baselines, including MADDPG and QMIX, on all key long-term metrics. Ablation studies further validate the critical contribution of the counterfactual mechanism to the algorithm's stability and performance. Moreover, GEAR exhibits notable strategic flexibility, intelligently adapting its resource allocation policy to dynamic shifts in platform objectives. This research provides both a novel theoretical framework and an effective technical methodology for developing the next generation of self-managing, ecosystem-aware intelligent recommender systems.
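The counterfactual credit assignment the abstract describes is in the spirit of COMA-style advantages: an agent's credit is its realized value minus a counterfactual baseline that marginalizes out only its own action. The sketch below is an assumption-laden illustration of that idea; the function and argument names are hypothetical and not drawn from the paper.

```python
# Illustrative COMA-style counterfactual advantage for one scenario-agent.
# All names here are hypothetical; this is a sketch of the general mechanism,
# not GEAR's actual implementation.
def counterfactual_advantage(q_own_actions: list, taken_action: int,
                             policy: list) -> float:
    """Marginal-contribution signal for a single agent.

    q_own_actions[a] is the centralized critic's estimate Q(s, (a, a_-i))
    with the other agents' actions held fixed; policy[a] is the agent's
    current probability of taking action a. Subtracting the counterfactual
    baseline (the expectation over this agent's own actions) isolates the
    agent's marginal contribution to the global utility.
    """
    baseline = sum(p * q for p, q in zip(policy, q_own_actions))
    return q_own_actions[taken_action] - baseline
```

Because the baseline varies only this agent's action while freezing the others, a positive advantage indicates the chosen action outperformed the agent's average behavior in that joint context, which is what makes per-agent credit assignment stable in the CTDE setting.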
DOI: https://doi.org/10.31449/inf.v49i31.10220
This work is licensed under a Creative Commons Attribution 3.0 License.