Dynamic Role-Aware Multi-Agent Reinforcement Learning for Multi-Objective Resource Allocation in R&D Institutions

Shishi You

Abstract


Resource allocation in new R&D institutions faces challenges such as diverging interests among multiple agents and dynamic resource supply and demand, and traditional methods struggle to adapt to these complex scenarios. This paper proposes a dynamic role-aware Multi-Agent Reinforcement Learning (MARL) collaborative decision-making algorithm for resource optimization. The algorithm constructs a resource system model encompassing human, material, financial, and information resources, and designs three innovative modules: dynamic role mapping, multi-objective hierarchical rewards, and real-time conflict resolution. Specifically, the MARL model adopts an improved Proximal Policy Optimization (PPO) framework integrated with an attention mechanism to prioritize key resource-task pairs (e.g., matching high-priority tasks with scarce GPU servers) and leverages a federated learning communication framework that reduces data transmission by 30% while ensuring information security. Dynamic role mapping adjusts agent roles (resource management, task execution, benefit coordination) in real time based on resource supply-demand deviations (e.g., switching resource-management agents to auxiliary task execution during GPU surges) and task priorities. Multi-objective hierarchical rewards optimize benefits at the local (single-agent task completion), collaborative (multi-agent coordination), and global (system-wide utilization and cost) levels. Real-time conflict resolution settles resource competition through a game-theoretic (Nash) equilibrium, completing reallocation within 10 seconds to avoid task delays. An experimental platform is built with Python 3.9 and PyTorch 2.0. Using operational data (2022–2024) from a provincial R&D institution (50 tasks, 20 resource types, 15 agents) and a synthetic dataset (generated from statistical distributions for generalizability), the algorithm is compared with Linear Programming (LP), Deep Q-Network (DQN), Multi-Agent Deep Deterministic Policy Gradient (MADDPG), QMIX, and FedMARL. Results show that the proposed algorithm achieves a resource utilization of 94.2% ± 0.5% (95% confidence interval: 93.7–94.7%) in the static scenario, a 15.7 percentage point improvement over LP, and an average task completion time of 28.5 ± 1.2 days, a 36.9% reduction compared to LP. In dynamic scenarios with resource fluctuations exceeding ±10%, the average performance fluctuation is only 3.2%, a 74.4% reduction compared to LP. Ablation experiments show that removing dynamic role mapping reduces resource utilization by 6.2%, validating the module's effectiveness. The algorithm thus provides technical support for improving resource allocation efficiency in new R&D institutions.
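To make the attention-based prioritization of resource-task pairs concrete, the following is a minimal PyTorch sketch of dot-product attention scores normalized per task. The feature dimensions, the temperature parameter, and the function name are illustrative assumptions; the paper's actual attention module inside the improved PPO framework is not specified here.

import torch
import torch.nn.functional as F

def pair_attention(task_feats: torch.Tensor,
                   resource_feats: torch.Tensor,
                   temperature: float = 1.0) -> torch.Tensor:
    # task_feats: (n_tasks, d), resource_feats: (n_resources, d).
    # Score every task-resource pair by scaled dot product, then
    # normalize per task so that, e.g., a high-priority task places
    # most of its weight on a scarce GPU server.
    scores = task_feats @ resource_feats.T / temperature
    return F.softmax(scores, dim=-1)  # (n_tasks, n_resources) weights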
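The dynamic role-mapping rule can be sketched as a simple switching function. The three roles come from the abstract; the switching thresholds (a ±10% supply-demand deviation, a 0.8 priority cutoff) and the signature are assumptions for illustration, not values reported by the paper.

from enum import Enum

class Role(Enum):
    RESOURCE_MANAGEMENT = "resource_management"
    TASK_EXECUTION = "task_execution"
    BENEFIT_COORDINATION = "benefit_coordination"

def map_role(current_role: Role, supply: float, demand: float,
             task_priority: float, deviation_threshold: float = 0.10,
             priority_threshold: float = 0.8) -> Role:
    # Relative supply-demand deviation for one resource type.
    deviation = (supply - demand) / max(demand, 1e-9)
    # Surplus (e.g., a GPU surge): a resource-management agent switches
    # to auxiliary task execution, as described in the abstract.
    if current_role is Role.RESOURCE_MANAGEMENT and deviation > deviation_threshold:
        return Role.TASK_EXECUTION
    # Scarcity while a high-priority task is pending: escalate to
    # benefit coordination so competing agents can negotiate.
    if deviation < -deviation_threshold and task_priority >= priority_threshold:
        return Role.BENEFIT_COORDINATION
    return current_role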
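The multi-objective hierarchical reward can likewise be sketched as a weighted combination of the three levels named in the abstract. The linear form and the weights below are assumptions for illustration, since the paper's exact reward shaping is not given here.

def hierarchical_reward(local_completion: float, collab_score: float,
                        utilization: float, cost: float,
                        w_local: float = 0.4, w_collab: float = 0.3,
                        w_global: float = 0.3) -> float:
    # Local level: single-agent task completion; collaborative level:
    # multi-agent coordination quality; global level: system-wide
    # utilization net of normalized cost. All inputs assumed in [0, 1].
    global_term = utilization - cost
    return (w_local * local_completion
            + w_collab * collab_score
            + w_global * global_term)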
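For the real-time conflict-resolution module, the abstract only states that competition is settled at a Nash equilibrium within 10 seconds. A minimal best-response sketch under an assumed logarithmic-utility game with a shared congestion penalty (the payoff form, step size, and iteration count are all hypothetical) could look like:

import numpy as np

def resolve_conflict(priorities: np.ndarray, capacity: float = 1.0,
                     n_iter: int = 200, lr: float = 0.1) -> np.ndarray:
    # Each agent i requests a share x[i] of one contested resource.
    # Assumed payoff: priorities[i] * log(1 + x[i]) minus a shared
    # quadratic congestion penalty 0.5 * max(sum(x) - capacity, 0)**2.
    x = np.full(len(priorities), capacity / len(priorities))
    for _ in range(n_iter):
        overload = max(x.sum() - capacity, 0.0)
        # Simultaneous gradient-ascent best response for every agent:
        # marginal utility minus the congestion "price".
        x = np.clip(x + lr * (priorities / (1.0 + x) - overload), 0.0, capacity)
    if x.sum() > capacity:  # project back onto the capacity budget
        x *= capacity / x.sum()
    return x

# Example: three agents with descending task priorities.
print(resolve_conflict(np.array([0.9, 0.5, 0.2])))

Iterating simultaneous best responses of this kind converges to an approximate Nash equilibrium of the assumed game; higher-priority agents end up with larger shares of the contested resource.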



DOI: https://doi.org/10.31449/inf.v49i23.11452

This work is licensed under a Creative Commons Attribution 3.0 License.