Research on Automatic Sharding and Load Balancing of NoSQL Databases Based on Twin Delayed Deep Deterministic Policy Gradient (TD3)

Guojun Wang, Jian Yin

Abstract


With the rapid development of cloud computing and big data technologies, NoSQL databases face severe sharding and load-balancing challenges under dynamic workloads with massive data volumes. Traditional strategies rely on static rules or threshold mechanisms and struggle to adapt to sudden traffic fluctuations and skewed data distributions, resulting in frequent hotspot shards, increased cross-node query latency, and uneven resource utilization. This study proposes a dynamic optimization framework based on deep reinforcement learning. The framework collects multi-dimensional indicators such as cluster node load, network latency, and query patterns in real time (covering nine core metrics, including per-shard data volume, node CPU/memory utilization, disk I/O, and query latency), constructs the state space through Min-Max normalization and weighted fusion to characterize the global state of the system, and designs a composite reward function comprising a throughput reward, a response-time reward, and a migration-cost penalty to achieve a balanced multi-objective optimization. In experiments on a Cassandra cluster (an open-source distributed database), the YCSB benchmark is used to simulate mixed-workload and burst-traffic scenarios. Compared with traditional consistent hashing and weighted round-robin strategies, the method reduces the incidence of hotspot shards by 42% (from 23.7% to 13.8%) and average query latency by 35% (from 152 ms to 99 ms), and in a 10-node cluster it reduces the data migration volume by 28% relative to a threshold-triggered mechanism. By introducing the twin delayed deep deterministic policy gradient (TD3) algorithm, the agent effectively avoids local optima when dynamically adjusting shard boundaries and request routing, and in a 24-hour traffic-fluctuation test the standard deviation of system throughput is 61% lower than that of traditional methods (112 ops vs. 289 ops). After 500,000 training steps, the algorithm converges 2.3 times faster than conventional DQN and achieves a 19% higher long-term return. Experimental results show that the reinforcement-learning-driven strategy significantly improves cluster resource utilization and service quality, offering a new technical path toward autonomous database management in complex dynamic environments.
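To make the state construction and reward design described above concrete, the following is a minimal Python sketch. The metric names, fusion weights, and reward coefficients are illustrative assumptions; the abstract names nine core metrics but does not specify them all, and none of the numeric values below come from the paper.

```python
# Illustrative sketch (not the authors' code): builds a state vector from
# nine cluster metrics via Min-Max normalization and weighted fusion, and
# computes a composite reward of the form described in the abstract.
import numpy as np

METRICS = [  # hypothetical list standing in for the paper's nine core metrics
    "shard_data_volume", "cpu_util", "mem_util", "disk_io",
    "query_latency", "throughput", "network_latency",
    "read_write_ratio", "request_rate",
]
# Hypothetical fusion weights, one per metric; the paper's values are not given.
WEIGHTS = np.array([0.15, 0.15, 0.10, 0.10, 0.15, 0.10, 0.10, 0.05, 0.10])

def min_max(x, lo, hi):
    """Min-Max normalization to [0, 1], guarding against a degenerate range."""
    return (x - lo) / (hi - lo) if hi > lo else 0.0

def make_state(raw, bounds):
    """raw: dict metric -> observed value; bounds: dict metric -> (min, max)."""
    norm = np.array([min_max(raw[m], *bounds[m]) for m in METRICS])
    return WEIGHTS * norm  # weighted fusion into the agent's state vector

def composite_reward(throughput, latency, migrated,
                     w_tp=1.0, w_rt=1.0, w_mig=0.5):
    """Throughput reward + response-time reward - migration-cost penalty.
    Inputs are assumed pre-normalized to comparable scales."""
    return w_tp * throughput - w_rt * latency - w_mig * migrated
```

Normalizing each metric before fusion keeps quantities with very different units (bytes, milliseconds, operations per second) on a common scale, so the weights alone determine each metric's influence on the state.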
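The TD3 mechanisms the abstract credits with avoiding local optima (twin critics, target-policy smoothing, and delayed actor updates) can be summarized in a generic update step. The sketch below is a standard TD3 update in PyTorch under assumed state/action dimensions and hyperparameters, not the authors' implementation.

```python
# Generic TD3 update step (PyTorch): twin critics, target-policy smoothing,
# and delayed actor updates. Dimensions and hyperparameters are hypothetical.
import torch
import torch.nn as nn

STATE_DIM, ACTION_DIM = 9, 4  # e.g. shard-boundary / routing adjustments

def mlp(in_dim, out_dim):
    return nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU(),
                         nn.Linear(256, out_dim))

actor, actor_tgt = mlp(STATE_DIM, ACTION_DIM), mlp(STATE_DIM, ACTION_DIM)
q1, q2 = mlp(STATE_DIM + ACTION_DIM, 1), mlp(STATE_DIM + ACTION_DIM, 1)
q1_tgt, q2_tgt = mlp(STATE_DIM + ACTION_DIM, 1), mlp(STATE_DIM + ACTION_DIM, 1)
for tgt, src in [(actor_tgt, actor), (q1_tgt, q1), (q2_tgt, q2)]:
    tgt.load_state_dict(src.state_dict())

actor_opt = torch.optim.Adam(actor.parameters(), lr=3e-4)
critic_opt = torch.optim.Adam([*q1.parameters(), *q2.parameters()], lr=3e-4)

def td3_update(batch, step, gamma=0.99, tau=0.005, policy_delay=2,
               noise_std=0.2, noise_clip=0.5):
    s, a, r, s2, done = batch  # tensors sampled from a replay buffer
    with torch.no_grad():
        # Target-policy smoothing: clipped noise on the target action.
        noise = (torch.randn_like(a) * noise_std).clamp(-noise_clip, noise_clip)
        a2 = (torch.tanh(actor_tgt(s2)) + noise).clamp(-1.0, 1.0)
        # Clipped double-Q: take the minimum of the twin target critics.
        q_min = torch.min(q1_tgt(torch.cat([s2, a2], 1)),
                          q2_tgt(torch.cat([s2, a2], 1)))
        y = r + gamma * (1 - done) * q_min
    sa = torch.cat([s, a], 1)
    critic_loss = ((q1(sa) - y) ** 2).mean() + ((q2(sa) - y) ** 2).mean()
    critic_opt.zero_grad(); critic_loss.backward(); critic_opt.step()
    if step % policy_delay == 0:  # delayed policy and target updates
        actor_loss = -q1(torch.cat([s, torch.tanh(actor(s))], 1)).mean()
        actor_opt.zero_grad(); actor_loss.backward(); actor_opt.step()
        for tgt, src in [(actor_tgt, actor), (q1_tgt, q1), (q2_tgt, q2)]:
            for p_t, p in zip(tgt.parameters(), src.parameters()):
                p_t.data.mul_(1 - tau).add_(tau * p.data)
```

Taking the minimum of the twin critics curbs the Q-value overestimation that destabilizes single-critic methods, and the delayed, smoothed updates reduce variance in the policy gradient; these properties are consistent with the stability and convergence gains over DQN reported in the abstract.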




DOI: https://doi.org/10.31449/inf.v49i28.10320

This work is licensed under a Creative Commons Attribution 3.0 License.