Smart Task Scheduling for Cloud-Based Big Data Systems

Nagham Ajeel Sultan, Wael Hadeed, Dhuha Abdullah

Abstract


This paper presents a hybrid task scheduling framework for cloud-based big data systems with three main objectives: improving system performance, reducing cost, and increasing energy efficiency. The proposed system combines a rule-based decision engine with a Long Short-Term Memory (LSTM)-based resource prediction model, enabling real-time job assignment based on task urgency, data locality, and system state. The framework is built on top of Apache YARN, so it supports both batch jobs (via Hadoop/Spark) and streaming tasks (via Kafka/Flink). We ran the experiments on a 50-node cluster (n2-standard-16 instances, each with 16 vCPUs and 64 GB RAM), using real workloads consisting of 100 GB–1 TB batch jobs and streams of 1K–5K events/sec. Evaluation metrics include job completion time, throughput, cost per TB processed, and energy consumption (Joules/TB). The results show a 32–50% improvement in performance, up to 54% cost savings when using spot instances, and a 25% reduction in energy consumption compared to baseline schedulers, namely the default YARN, Kubernetes, and Spark schedulers.
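The abstract does not spell out the decision rules, so the following Python sketch only illustrates, under stated assumptions, how a rule-based engine might combine task urgency, data locality, and predicted system state when placing a task. All names, weights, and thresholds (`URGENT_DEADLINE_S`, `MAX_UTIL`, `W_LOCALITY`, `W_HEADROOM`, `W_SPOT`) are hypothetical. `predicted_util` is a plain number standing in for the output of the LSTM resource predictor, not an actual model. The spot-instance rule is also an assumption, loosely motivated by the reported cost savings with spot instances.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Task:
    task_id: str
    deadline_s: float          # seconds until deadline; smaller means more urgent
    input_blocks: set[str]     # IDs of the data blocks the task reads
    is_streaming: bool = False


@dataclass
class Node:
    name: str
    local_blocks: set[str]     # data blocks stored on this node
    predicted_util: float      # forecast utilization in [0, 1]; placeholder for the LSTM output
    is_spot: bool = False


# Hypothetical weights and thresholds (not taken from the paper).
W_LOCALITY = 0.5
W_HEADROOM = 0.4
W_SPOT = 0.1
URGENT_DEADLINE_S = 60.0
MAX_UTIL = 0.9


def locality(task: Task, node: Node) -> float:
    """Fraction of the task's input blocks already stored on the node."""
    if not task.input_blocks:
        return 1.0
    return len(task.input_blocks & node.local_blocks) / len(task.input_blocks)


def score(task: Task, node: Node) -> float:
    urgent = task.is_streaming or task.deadline_s <= URGENT_DEADLINE_S
    headroom = 1.0 - node.predicted_util
    s = W_LOCALITY * locality(task, node) + W_HEADROOM * headroom
    # Assumed rule: urgent tasks avoid preemptible spot nodes,
    # while non-urgent tasks prefer them to lower cost.
    if node.is_spot:
        s += -W_SPOT if urgent else W_SPOT
    return s


def assign(task: Task, nodes: list[Node]) -> str | None:
    """Return the best node name, skipping nodes forecast to be overloaded."""
    candidates = [n for n in nodes if n.predicted_util < MAX_UTIL]
    if not candidates:
        return None
    return max(candidates, key=lambda n: score(task, n)).name


nodes = [
    Node("n1", {"b1", "b2"}, predicted_util=0.3),
    Node("n2", {"b1"}, predicted_util=0.2, is_spot=True),
    Node("n3", set(), predicted_util=0.95),
]
batch = Task("etl", deadline_s=3600, input_blocks={"b1"})
stream = Task("clicks", deadline_s=5, input_blocks={"b1"}, is_streaming=True)

print(assign(batch, nodes))   # n2
print(assign(stream, nodes))  # n1
```

In this toy run, the non-urgent batch task goes to the cheaper spot node, while the streaming task is steered to an on-demand node. Node n3 is never considered because its forecast utilization exceeds the cap.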



DOI: https://doi.org/10.31449/inf.v49i28.10530

This work is licensed under a Creative Commons Attribution 3.0 License.